{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Quick Start\n\nEnd-to-end walkthrough: initialize, query, download, and load\ndata.\n\nThis example touches every step of a typical LAION-fMRI workflow:\n\n1. Initialize a local data directory\n2. Query what is available in the dataset\n3. Download data for a single subject\n4. Load and inspect the data\n\nFor deeper dives into each step, see the focused examples on\n:doc:`initialization <plot_02_initialization>`,\n:doc:`querying <plot_03_querying>`, and\n:doc:`loading <plot_04_loading>`.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Initialize the data directory\n\nExamples 1, 2, and 4 share one data directory so the licenses\naccepted here, and the data downloaded below, are reused by the\nother examples without re-prompting or re-downloading.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import os\n\nfrom laion_fmri.config import dataset_initialize, get_data_dir\n\ndata_dir = os.environ.get(\n    \"LAION_FMRI_EXAMPLE_DATA_DIR\",\n    os.path.join(os.getcwd(), \"laion_fmri_quickstart\"),\n)\nos.makedirs(data_dir, exist_ok=True)\ndataset_initialize(data_dir)\nprint(f\"Data directory: {get_data_dir()}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Query the dataset\n\nThe bucket is public, so no AWS credentials are needed.\nDiscovery functions read directly from the S3 bucket, so you\ncan see what is available before downloading anything.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from laion_fmri.discovery import describe, get_subjects\n\nprint(f\"Available subjects: {get_subjects()}\")\ndescribe()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Download one subject -- but just one session, in parallel\n\nA full subject is several tens of GB. ``download`` accepts BIDS\nentities (``ses``, ``task``, ``space``, ``desc``, ``stat``,\n``suffix``, ``extension``) as filters, so you can grab just the\nslice you need.\n\n**About** ``ses``: when set to a session ID, only that session's\nfiles are pulled -- subject-level aggregate maps (the per-subject\nnoise-ceiling variants, the mean-R^2 summaries, etc.) are\n*excluded*. To pull only those aggregates, use the special value\n``ses=\"averages\"``; combine with a list to pull both. The brain\nmask is the one exception -- it's automatically included with any\n``ses`` filter, since the loader needs it to mask voxels.\n\n``n_jobs`` runs that many ``aws s3 cp`` workers in parallel. The\ncall is also idempotent -- re-running after an interrupted\ntransfer skips files that already match the bucket size, so you\nonly fetch what's missing.\n\nThe neuroimaging data and the stimuli are covered by two separate\nlicenses. On the first download you will be prompted **twice** --\nonce for each -- and you must type ``I AGREE`` each time:\n\n1. **Neuroimaging data** (CC0 1.0) -- unrestricted use.\n2. **Stimuli** (closed, research-only) -- no redistribution, no\n   commercial or AI/ML-training use.\n\nThe acceptances are persisted, so the prompts only appear on the\nfirst download into a given data directory.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from laion_fmri.download import download\n\nsubject_id = \"sub-01\"\nsession_id = \"ses-01\"\nprint(f\"Downloading {subject_id} / {session_id}\")\n# Most workflows only need the files the loaders read directly --\n# trial info, statmaps, and ROI masks. Asking for the suffix\n# subset below keeps a session pull around a few hundred MB\n# instead of the multi-GB you'd get pulling everything; drop\n# ``suffix`` if you also want the raw GLMsingle model dump or\n# the JSON sidecars.\ndownload(\n    subject=subject_id,\n    ses=session_id,\n    suffix=[\"statmap\", \"trials\", \"mask\"],\n    include_stimuli=True,\n    n_jobs=4,\n)"
      ]
    },
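    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The session download above deliberately excludes the per-subject\naggregate maps. The next cell is a minimal, optional sketch of how\nto fetch them via the special ``ses=\"averages\"`` value (or a list\nsuch as ``ses=[session_id, \"averages\"]`` to pull both at once).\nThe ``suffix`` filter assumes the aggregates are stored as\n``statmap`` files, as the mean-R^2 filename mentioned in the next\nsection suggests; because the call is idempotent, re-running it\nonly transfers files that are missing locally.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Optional: also fetch the subject-level aggregate maps (noise-ceiling\n# variants, mean-R^2 summaries). ``ses=\"averages\"`` targets only those\n# aggregates; a list such as ``ses=[session_id, \"averages\"]`` pulls the\n# session files and the aggregates in one call. The download is\n# idempotent, so files already on disk are skipped.\ndownload(\n    subject=subject_id,\n    ses=\"averages\",\n    suffix=[\"statmap\"],\n    n_jobs=4,\n)"
      ]
    },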
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Load the subject\n\nOnce data is on disk, load a :class:`~laion_fmri.subject.Subject`\nand inspect its sessions and available ROIs. The brain mask is\nderived on the fly from the subject-level mean-R^2 file\n(``..._stat-rsquare_desc-R2mean_statmap.nii.gz``) -- voxels with\nany non-zero GLMsingle fit are considered \"in brain\".\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from laion_fmri.subject import load_subject\n\nsub = load_subject(subject_id)\nprint(f\"Subject:   {sub.subject_id}\")\nprint(f\"Sessions:  {sub.get_sessions()}\")\nprint(f\"Voxels:    {sub.get_n_voxels()}\")\nprint(f\"ROIs:      {sub.get_available_rois()}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Single-trial betas\n\n``get_betas`` returns ``(n_trials, n_voxels)`` within the brain\nmask. **For real subjects the brain-mask voxel count is ~270k**;\nmultiplied by ~1000 trials per session, that's ~1 GB per call.\nIn practice you should always pass an ``roi=`` filter to keep the\narray small (face-area ROI, e.g. ~1000 voxels, drops the call to\na few MB).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "session = \"ses-01\"\n\n# Without ROI: heavy but works.\nbetas_all = sub.get_betas(session=session)\nprint(f\"{session} betas (full mask): {betas_all.shape}\")\n\n# Recommended: use an ROI filter.\nrois_face = sub.get_available_rois(category=\"face\")\nif rois_face:\n    betas_face = sub.get_betas(session=session, roi=\"face\")\n    print(f\"{session} betas (face ROIs): {betas_face.shape}\")"
      ]
    },
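    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a rough check on the memory estimate above, the next cell\nprints the in-memory size of the arrays just loaded (NumPy's\n``nbytes``); the size scales as n_trials x n_voxels x bytes per\nvalue (4 bytes for float32 data).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# In-memory sizes of the arrays loaded above. For a real subject\n# (~270k voxels, ~1000 trials) the full-mask array lands around 1 GB;\n# the ROI-filtered array stays in the low MB range.\nprint(f\"Full-mask betas: {betas_all.nbytes / 1e6:.1f} MB\")\nif rois_face:\n    print(f\"Face-ROI betas:  {betas_face.nbytes / 1e6:.1f} MB\")"
      ]
    },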
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Save a derived map back to NIfTI\n\n``get_betas`` returns a 1-D ``(n_trials, n_voxels)`` array\nmasked to brain-mask voxels. To round-trip any per-voxel\nstatistic you compute back into a 3-D NIfTI on disk (so\nexternal tools can read it), pass the array to\n``Subject.to_nifti``: it scatters the values into a\n``(X, Y, Z)`` volume and writes the file.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "mean_betas = betas_all.mean(axis=0)  # (n_voxels,)\nmean_path = f\"/tmp/{subject_id}_{session}_trial_mean.nii.gz\"\nsub.to_nifti(mean_betas, mean_path)\nprint(f\"Saved {mean_path}\")"
      ]
    },
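    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To confirm the round trip, read the file back with ``nibabel``\n(only its standard ``load`` API is needed; the same library is\nused again for the ROI panel below) -- the 1-D per-voxel vector\ncomes back as a 3-D volume.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import nibabel as nib\n\n# Read the written file back: the 1-D per-voxel vector has been\n# scattered into a full 3-D volume on disk.\nimg = nib.load(mean_path)\nprint(f\"Round-tripped volume shape: {img.shape}\")\nprint(f\"Per-voxel values written:   {mean_betas.size}\")"
      ]
    },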
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Visualize the first three trials\n\nSingle-trial betas are inherently noisier than the contrast\nmaps you may be used to: each panel below captures one\nstimulus presentation rather than an average over many, so\ndon't expect crisp activation patterns. The view is most\nuseful as a quick sanity check -- amplitudes in a reasonable\nrange, signal concentrated in cortex rather than at edges or\nin white matter, and the three trials looking distinct from\neach other rather than suspiciously similar.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import warnings\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nfrom matplotlib.colors import Normalize\nfrom nilearn import plotting\n\nfrom laion_fmri._paths import r2mean_path\n\n# Nilearn warns about NaN / inf voxels from GLMsingle non-fits;\n# they're outside the brain mask and don't affect the rendering.\nwarnings.filterwarnings(\n    \"ignore\",\n    message=\"Non-finite values detected\",\n    category=UserWarning,\n)\n\nbg_img = str(r2mean_path(get_data_dir(), subject_id))\nstat_cmap = sns.diverging_palette(220, 20, as_cmap=True)\n\ntrial_paths = []\nfor i in range(3):\n    p = f\"/tmp/{subject_id}_{session}_trial{i}.nii.gz\"\n    sub.to_nifti(betas_all[i], p)\n    trial_paths.append(p)\n\nvmax = float(np.percentile(np.abs(betas_all[:3]), 99))\n\nfig = plt.figure(figsize=(16, 4.3))\ngs = fig.add_gridspec(2, 1, height_ratios=[1, 0.05], hspace=0.1)\nstrip_gs = gs[0].subgridspec(1, 3, wspace=0.05)\naxes = [fig.add_subplot(strip_gs[0, i]) for i in range(3)]\ncbar_ax = fig.add_subplot(gs[1])\n\nfor i, ax in enumerate(axes):\n    plotting.plot_stat_map(\n        trial_paths[i], bg_img=bg_img, axes=ax,\n        display_mode=\"z\", cut_coords=[-17],\n        cmap=stat_cmap, vmax=vmax, colorbar=False,\n        black_bg=False, threshold=0.5,\n    )\n    ax.set_title(f\"Trial {i}\")\n\nsm = plt.cm.ScalarMappable(\n    cmap=stat_cmap, norm=Normalize(vmin=-vmax, vmax=vmax),\n)\nfig.colorbar(\n    sm, cax=cbar_ax, orientation=\"horizontal\",\n    label=f\"{session} trial \u03b2\",\n)\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Three category-selective ROIs\n\nFace, body, and place ROIs sit in fairly stereotyped parts\nof ventral temporal cortex, but each subject's exact\nlocalization differs. Glance at this panel to confirm the\nmasks landed where you'd expect -- FFA1 in fusiform gyrus,\nEBA in lateral occipitotemporal cortex, PPA in\nparahippocampal cortex -- before relying on them to filter\ndownstream analyses.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import nibabel as nib\nfrom nilearn.plotting import find_xyz_cut_coords\n\npalette = sns.color_palette(\"colorblind\")\nroi_specs = [\n    (\"FFA1\", palette[0]),\n    (\"EBA\",  palette[1]),\n    (\"PPA\",  palette[2]),\n]\n\nfig, axes = plt.subplots(1, 3, figsize=(16, 3.7))\nfor ax, (roi, color) in zip(axes, roi_specs):\n    roi_path = f\"/tmp/{subject_id}_roi_{roi}.nii.gz\"\n    sub.to_nifti(\n        sub.get_roi_mask(roi).astype(\"float32\"), roi_path,\n    )\n    _, _, z = find_xyz_cut_coords(nib.load(roi_path))\n    display = plotting.plot_anat(\n        bg_img, axes=ax,\n        display_mode=\"z\", cut_coords=[z],\n        black_bg=False, threshold=0.1, colorbar=False,\n    )\n    display.add_contours(\n        roi_path, levels=[0.5],\n        colors=[color], linewidths=1.5,\n    )\n    ax.set_title(roi)\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Per-session noise ceiling\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "nc = sub.get_noise_ceiling(session=session)\nprint(\n    f\"NC: shape={nc.shape}, \"\n    f\"range=[{nc.min():.3f}, {nc.max():.3f}]\"\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Visualize the noise-ceiling map\n\nThe noise ceiling tells you which voxels are reliably driven\nby the stimulus set: high values mark voxels where repeated\npresentations produce consistent responses, low values mark\nnoise. Looking at this map before any decoding or RSA work\nhelps you decide whether to threshold by NC, restrict to\nhigh-NC voxels, or stay with ROI-based analyses.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "mako_cmap = sns.color_palette(\"mako\", as_cmap=True)\n\nnc_path = f\"/tmp/{subject_id}_{session}_nc.nii.gz\"\nsub.to_nifti(nc, nc_path)\n\nnc_vmax = float(nc.max())\ncuts = [-17, -5, 8]\n\nfig = plt.figure(figsize=(16, 4.5))\ngs = fig.add_gridspec(2, 1, height_ratios=[1, 0.05], hspace=0.1)\nstrip_gs = gs[0].subgridspec(1, 3, wspace=0.05)\naxes = [fig.add_subplot(strip_gs[0, i]) for i in range(3)]\ncbar_ax = fig.add_subplot(gs[1])\n\nfor ax, z in zip(axes, cuts):\n    plotting.plot_stat_map(\n        nc_path, bg_img=bg_img, axes=ax,\n        display_mode=\"z\", cut_coords=[z],\n        cmap=mako_cmap, vmax=nc_vmax, colorbar=False,\n        black_bg=False, threshold=0.1,\n    )\n\nsm = plt.cm.ScalarMappable(\n    cmap=mako_cmap, norm=Normalize(vmin=0, vmax=nc_vmax),\n)\nfig.colorbar(\n    sm, cax=cbar_ax, orientation=\"horizontal\",\n    label=f\"{session} noise ceiling\",\n)\nplt.show()"
      ]
    },
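    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A small voxel-selection sketch: keep only the columns of the beta\nmatrix whose noise ceiling exceeds an illustrative cutoff. The\ncutoff value is arbitrary, and the sketch assumes ``nc`` and the\ncolumns of ``betas_all`` index the same brain-mask voxels, which\nis how both arrays are produced above.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Illustrative cutoff -- tune it for your own analysis. Assumes ``nc``\n# and the columns of ``betas_all`` index the same brain-mask voxels.\nnc_cutoff = 0.2\nreliable = nc > nc_cutoff\nprint(f\"{int(reliable.sum())} / {nc.size} voxels with NC > {nc_cutoff}\")\n\nbetas_reliable = betas_all[:, reliable]\nprint(f\"Reliable-voxel betas: {betas_reliable.shape}\")"
      ]
    },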
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Stimulus images (when uploaded)\n\nStimuli are forward-compatible: the API is in place but the\nimages themselves arrive in the bucket later. Until then, the\ncall below will raise ``StimuliNotDownloadedError`` -- that's\nthe intended signal.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# images = sub.get_images()\n# print(f\"Images: {len(images)}\")"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.12.13"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}