{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Loading Data\n\nLoad single-trial betas, noise-ceiling maps, ROI masks, and stimulus\nimages.\n\nEvery accessor maps to one file in the bucket: it returns the raw\ncontents of the file you pick. Combining sessions, averaging\nacross trials, or rebinning is the caller's responsibility.\n\nThe \"brain mask\" is **derived on the fly** from the subject-level\nmean-R^2 map (``..._stat-rsquare_desc-R2mean_statmap.nii.gz``):\nvoxels with any non-zero GLMsingle fit are considered \"in brain\".\nThe bucket does not ship a separate brain-mask file.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>Run :doc:`plot_01 <plot_01_quickstart>` first so at least one\n   subject is downloaded into the shared quickstart directory. A\n   single session of full-brain betas is ``~1000 trials \u00d7 ~270k\n   voxels \u00d7 4 bytes \u2248 1 GB``; pass an ``roi=`` filter to keep\n   per-call memory in the tens of MB.</p></div>\n"
      ]
    },
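    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A quick numpy sketch of that on-the-fly mask: any voxel with a\nnon-zero value in the mean-R^2 volume counts as in-brain. The toy\narray below stands in for the real\n``..._stat-rsquare_desc-R2mean_statmap.nii.gz`` (which you would\nload with ``nibabel``); only the thresholding logic is the point.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\n# Toy stand-in for the subject-level mean-R^2 volume; in practice\n# load the statmap with nibabel and take its data array.\nr2 = np.zeros((4, 4, 4), dtype=\"float32\")\nr2[1:3, 1:3, 1:3] = 0.2  # voxels GLMsingle actually fit\nbrain_mask = r2 != 0     # \"in brain\" = any non-zero fit\nprint(f\"In-brain voxels: {brain_mask.sum()}\")  # 8 in this toy volume"
      ]
    },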
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Bind the quickstart's data directory\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import os\n\nfrom laion_fmri.config import dataset_initialize, get_data_dir\n\ndata_dir = os.environ.get(\n    \"LAION_FMRI_EXAMPLE_DATA_DIR\",\n    os.path.join(os.getcwd(), \"laion_fmri_quickstart\"),\n)\nos.makedirs(data_dir, exist_ok=True)\ndataset_initialize(data_dir)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Load a subject and pick a session\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from laion_fmri.subject import load_subject\n\nsubject_id = \"sub-01\"\nsession = \"ses-01\"\nroi = \"FFA1\"\n\nsub = load_subject(subject_id)\nprint(f\"Subject: {subject_id} | session: {session}\")\nprint(f\"Voxels in brain mask: {sub.get_n_voxels()}\")\n\navailable_rois = sub.get_available_rois()\nprint(f\"Available ROIs: {available_rois}\")\nprint(f\"Primary ROI: {roi}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Single-trial betas for one session\n\nReturns ``(n_trials, n_voxels)``. **Always pass an ROI filter\nunless you really want the full brain-masked array** -- the\n``roi=`` form drops memory by 1-2 orders of magnitude.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "if roi is not None:\n    betas_roi = sub.get_betas(session=session, roi=roi)\n    print(f\"{roi} ROI:           {betas_roi.shape}\")\n\nbetas_nc = sub.get_betas(session=session, nc_threshold=0.2)\nprint(f\"NC > 0.2:            {betas_nc.shape}\")\n\nif roi is not None:\n    betas_both = sub.get_betas(\n        session=session, roi=roi, nc_threshold=0.3,\n    )\n    print(f\"ROI + NC > 0.3:      {betas_both.shape}\")"
      ]
    },
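    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The memory note from the intro is plain arithmetic; the sketch\nbelow uses those round numbers (~1000 trials, ~270k brain voxels,\nfloat32) plus a hypothetical ~2k-voxel ROI to show why the\n``roi=`` filter matters.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "n_trials, n_brain_voxels = 1000, 270_000\nroi_voxels = 2_000  # hypothetical ROI size; real ROIs vary\n\nfull_gb = n_trials * n_brain_voxels * 4 / 1e9\nroi_mb = n_trials * roi_voxels * 4 / 1e6\nprint(f\"full brain: {full_gb:.2f} GB | ROI: {roi_mb:.0f} MB\")"
      ]
    },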
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Visualize the first trial with ROI contour overlays\n\nOverlaying the canonical face / body / place ROIs on a\nsingle trial's response lets you check whether your regions\nof interest sit where the activity actually is. Single\ntrials are noisy and stimulus-dependent, so don't expect\nevery ROI to show a hotspot for every trial -- you should\nat least see plausible signal somewhere near the contours\nrather than uniform noise. The betas are rendered in\ngreyscale so the colored contours stay visually dominant.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import warnings\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nfrom matplotlib.colors import Normalize\nfrom matplotlib.lines import Line2D\nfrom nilearn import plotting\n\nfrom laion_fmri._paths import r2mean_path\n\n# Nilearn warns about NaN / inf voxels from GLMsingle non-fits;\n# they're outside the brain mask and don't affect the rendering.\nwarnings.filterwarnings(\n    \"ignore\",\n    message=\"Non-finite values detected\",\n    category=UserWarning,\n)\n\nbg_img = str(r2mean_path(get_data_dir(), subject_id))\nstat_cmap = \"gray\"\noverlay_rois = (\"FFA1\", \"OFA\", \"PPA\", \"EBA\", \"FBA\", \"MT\")\nroi_colors = dict(\n    zip(overlay_rois, sns.color_palette(\"colorblind\")),\n)\n\nfirst_beta = sub.get_betas(session=session)[0]\nbeta_path = f\"/tmp/{subject_id}_{session}_trial0_full.nii.gz\"\nsub.to_nifti(first_beta, beta_path)\n\nroi_paths = {}\nfor r in overlay_rois:\n    p = f\"/tmp/{subject_id}_roi_{r}.nii.gz\"\n    sub.to_nifti(sub.get_roi_mask(r).astype(\"float32\"), p)\n    roi_paths[r] = p\n\nvmax = float(np.percentile(np.abs(first_beta), 99))\ncuts = [-17, -5, 8]\n\nfig = plt.figure(figsize=(16, 4.7))\ngs = fig.add_gridspec(\n    3, 1, height_ratios=[0.10, 1, 0.05], hspace=0.1,\n)\nlegend_ax = fig.add_subplot(gs[0])\nlegend_ax.axis(\"off\")\nstrip_gs = gs[1].subgridspec(1, 3, wspace=0.05)\naxes = [fig.add_subplot(strip_gs[0, i]) for i in range(3)]\ncbar_ax = fig.add_subplot(gs[2])\n\nfor ax, z in zip(axes, cuts):\n    display = plotting.plot_stat_map(\n        beta_path, bg_img=bg_img, axes=ax,\n        display_mode=\"z\", cut_coords=[z],\n        cmap=stat_cmap, vmax=vmax, colorbar=False,\n        black_bg=False, threshold=0.5,\n    )\n    # ``roi_name``, not ``roi``: reusing ``roi`` as the loop\n    # variable would clobber the primary-ROI name that the later\n    # cells depend on.\n    for roi_name, color in roi_colors.items():\n        display.add_contours(\n            roi_paths[roi_name], levels=[0.5],\n            colors=[color], linewidths=1.5,\n        )\n\nproxies = [\n    Line2D([0], [0], color=c, lw=2.0, label=r)\n    for r, c in roi_colors.items()\n]\nlegend_ax.legend(\n    handles=proxies, loc=\"center\", ncol=len(overlay_rois),\n    frameon=False, handlelength=1.2,\n)\n\nsm = plt.cm.ScalarMappable(\n    cmap=stat_cmap, norm=Normalize(vmin=-vmax, vmax=vmax),\n)\nfig.colorbar(\n    sm, cax=cbar_ax, orientation=\"horizontal\",\n    label=f\"{session} trial 0 \u03b2\",\n)\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Single-trial filtering by stimulus type\n\nRestrict to trials whose stimulus belongs to the shared / unique\nsubset. This relies on the dataset-level stimulus metadata, which\nthe bucket does not yet expose, so the cell below skips itself\nwhen the metadata is missing.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "if roi is not None and sub.has_stimuli():\n    betas_shared = sub.get_betas(\n        session=session, roi=roi, stimuli=\"shared\",\n    )\n    print(f\"Shared trials:       {betas_shared.shape}\")\nelse:\n    print(\n        \"Skipped: stimulus subset filter needs stimuli/stimuli.tsv.\"\n    )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Custom voxel mask\n\nCombine the ROI mask and the noise-ceiling map yourself, then\npass the result back in via ``mask=``.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "if roi is not None:\n    roi_mask = sub.get_roi_mask(roi)\n    nc = sub.get_noise_ceiling(session=session)\n    custom_mask = roi_mask & (nc > 0.25)\n    print(f\"Custom mask voxels: {custom_mask.sum()}\")\n\n    betas_custom = sub.get_betas(session=session, mask=custom_mask)\n    print(f\"Custom betas:       {betas_custom.shape}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## ROI masks (multi-level query)\n\n``get_roi_mask`` accepts a specific ROI name, a category, or\n``\"all\"``. Pass a list to combine several at once -- overlapping\nvoxels appear only once in the resulting mask.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "if available_rois:\n    if roi is not None:\n        single = sub.get_roi_mask(roi)\n        print(f\"  {roi}: {single.sum()} voxels\")\n    categories = sub.get_available_categories()\n    if categories:\n        first_cat = categories[0]\n        cat_mask = sub.get_roi_mask(first_cat)\n        print(f\"  {first_cat} (category): {cat_mask.sum()} voxels\")\n    union = sub.get_roi_mask(\"all\")\n    print(f\"  all: {union.sum()} voxels\")"
      ]
    },
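    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The list form unions masks, so overlapping voxels are counted\nonce. The toy numpy sketch below shows the equivalent manual\nunion; per the description above, passing ``[\"FFA1\", \"OFA\"]`` to\n``get_roi_mask`` should match OR-ing the individual masks\nyourself.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\n# Toy boolean masks standing in for two ROIs with one shared voxel.\nffa = np.array([True, True, False, False])\nofa = np.array([False, True, True, False])\nunion = ffa | ofa   # what the list form is described to do\nprint(union.sum())  # 3 -- the overlapping voxel appears once"
      ]
    },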
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Visualize every face-category ROI\n\nInspecting every face-category ROI in one row surfaces\nthings you'd miss looking at them one at a time: which ROIs are\npresent for this subject (some are absent for some\nsubjects), how the various face areas relate to each other\nspatially, and whether any region looks unexpectedly small\nor empty -- a sign that the localizer underperformed there\nand that the ROI may not be reliable for analysis.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import nibabel as nib\nfrom nilearn.plotting import find_xyz_cut_coords\n\nface_rois = sub.get_available_rois(category=\"face\")\nif not face_rois:\n    print(\"No face-category ROIs on disk for this subject.\")\nelse:\n    palette = sns.color_palette(\"colorblind\")\n    n = len(face_rois)\n    fig, axes = plt.subplots(1, n, figsize=(5.3 * n, 3.7))\n    axes = [axes] if n == 1 else list(axes)\n\n    # ``face_roi``, not ``roi``: reusing ``roi`` as the loop\n    # variable would clobber the primary-ROI name that the later\n    # cells depend on.\n    for ax, face_roi, color in zip(axes, face_rois, palette):\n        roi_path = f\"/tmp/{subject_id}_roi_{face_roi}.nii.gz\"\n        sub.to_nifti(\n            sub.get_roi_mask(face_roi).astype(\"float32\"), roi_path,\n        )\n        _, _, z = find_xyz_cut_coords(nib.load(roi_path))\n        display = plotting.plot_anat(\n            bg_img, axes=ax,\n            display_mode=\"z\", cut_coords=[z],\n            black_bg=False, threshold=0.1, colorbar=False,\n        )\n        display.add_contours(\n            roi_path, levels=[0.5],\n            colors=[color], linewidths=1.5,\n        )\n        ax.set_title(face_roi)\n    plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Noise ceiling\n\nUse the per-session map when you're analysing data within one\nsession (e.g. selecting reliable voxels for a single-session\ndecoder). For cross-session work, pick one of the subject-level\naggregates -- ``Noiseceiling4rep`` and ``Noiseceiling12rep`` are\ncomputed only over stimuli that have at least 4 / 12 repetitions\nin the dataset; ``NoiseceilingAllrep`` uses every repetition.\nMore repetitions tighten the estimate but include fewer stimuli,\nso the right variant depends on whether you'd rather have a\nstable ceiling or full coverage.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "nc_session = sub.get_noise_ceiling(session=session)\nprint(\n    \"Per-session NC: \"\n    f\"shape={nc_session.shape}, \"\n    f\"range=[{nc_session.min():.3f}, {nc_session.max():.3f}]\"\n)\n\n# Switch to a subject-level aggregate by passing ``desc=`` instead\n# of ``session=``. The line below is commented out to avoid an\n# extra download for the example, but the call is identical:\n#\n#     nc_subj = sub.get_noise_ceiling(desc=\"Noiseceiling12rep\")"
      ]
    },
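    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Choosing an ``nc_threshold`` is a coverage vs. reliability\ntrade-off; a quick sweep over candidate thresholds makes that\nconcrete. The vector below is toy data standing in for a real\n``sub.get_noise_ceiling(...)`` result.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\n# Toy per-voxel noise-ceiling values; negative estimates do occur\n# for pure-noise voxels.\nnc = np.array([-0.02, 0.05, 0.18, 0.31, 0.42])\nfor thr in (0.1, 0.2, 0.3):\n    print(f\"NC > {thr}: {(nc > thr).sum()} voxels kept\")"
      ]
    },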
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Trial info and stimulus metadata\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "trial_info = sub.get_trial_info(session=session)\nprint(f\"Trials in {session}: {len(trial_info)}\")\nprint(trial_info.head())\n\nif sub.has_stimuli():\n    stim_meta = sub.get_stimulus_metadata()\n    print(f\"Stimulus metadata rows: {len(stim_meta)}\")\nelse:\n    print(\"Stimulus metadata not yet uploaded to the bucket.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Stimulus images\n\nSkipped automatically when the bucket's ``stimuli/`` prefix is\nnot yet populated.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "if sub.has_stimuli():\n    images = sub.get_images()\n    print(f\"Images:          {len(images)} PIL items\")\n\n    single_img = sub.get_image(idx=0)\n    print(f\"First image:     {single_img.size}\")\nelse:\n    print(\"No stimulus images on disk yet.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Brain-space mapping: save derived results as NIfTI\n\n``Subject.to_nifti`` is the inverse of \"load + brain-mask\": it\nscatters a 1-D per-voxel array (length = ``n_brain_voxels``)\nback into a 3-D ``(X, Y, Z)`` NIfTI on the subject's image\ngrid, with zeros outside the brain mask. That makes any\nper-voxel statistic you computed in masked space round-trip\nback to disk for downstream tools (``fslview``, ``nilearn``,\n``mricron``, ...).\n\nExample: trial-mean betas as a 3-D map.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "mean_betas = sub.get_betas(session=session).mean(axis=0)\nprint(f\"per-voxel mean shape: {mean_betas.shape}\")\n\nmean_path = f\"/tmp/{subject_id}_{session}_mean_betas.nii.gz\"\nsub.to_nifti(mean_betas, mean_path)\nprint(f\"Saved {mean_path}\")\n\n# ``to_nifti`` also knows about ROI / mask filters, so an\n# ROI-restricted result lands in the right voxels:\nffa1 = sub.get_betas(session=session, roi=\"FFA1\").mean(axis=0)\nsub.to_nifti(\n    ffa1, f\"/tmp/{subject_id}_{session}_FFA1_mean.nii.gz\", roi=\"FFA1\",\n)\n\n# If you also need the (i, j, k) location of each voxel --\n# for example to build a custom voxel selection by spatial\n# proximity, or to overlay results outside ``to_nifti``'s\n# round-trip -- ``get_voxel_coordinates`` returns them in the\n# same order as the 1-D arrays from ``get_betas`` and\n# ``get_noise_ceiling``, so they line up index-for-index.\ncoords = sub.get_voxel_coordinates()\nprint(f\"Voxel coordinates: {coords.shape}\")"
      ]
    },
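    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The scatter that ``to_nifti`` performs can be sketched with plain\nnumpy boolean indexing (assuming the library's 1-D voxel order\nmatches numpy's C-order traversal of the mask): values land at\nthe mask's ``True`` locations and zeros everywhere else.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\n# A 1-D per-voxel stat scattered back into a 3-D grid.\nmask = np.zeros((2, 2, 2), dtype=bool)\nmask[0, 0, 0] = mask[1, 1, 1] = True\nstat = np.array([0.7, -0.3], dtype=\"float32\")  # length == mask.sum()\nvol = np.zeros(mask.shape, dtype=\"float32\")\nvol[mask] = stat  # C-order: (0, 0, 0) gets 0.7, (1, 1, 1) gets -0.3\nprint(vol[0, 0, 0], vol[1, 1, 1])"
      ]
    },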
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Multi-subject group loading\n\n``Group`` holds several ``Subject`` instances and exposes\ncross-subject loaders that delegate to each one.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from laion_fmri._paths import glmsingle_subject_dir\nfrom laion_fmri.group import load_subjects\n\n# Group loading reads each subject's local files. List the\n# subjects you want explicitly; we keep the on-disk filter so the\n# example doesn't blow up if only some have been downloaded.\ngroup_subjects = [\"sub-01\", \"sub-03\"]\non_disk = [\n    s for s in group_subjects\n    if glmsingle_subject_dir(get_data_dir(), s).is_dir()\n]\ngroup = load_subjects(on_disk)\nprint(f\"Group size: {len(group)}\")\n\n# Shared-stimulus betas need stimulus metadata.\nif roi is not None and sub.has_stimuli():\n    shared = group.get_shared_betas(session=session, roi=roi)\n    for sub_id, arr in shared.items():\n        print(f\"  {sub_id}: {arr.shape}\")\nelse:\n    print(\n        \"Skipped: shared-stimulus betas need stimuli/stimuli.tsv.\"\n    )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## PyTorch dataset integration\n\nWhen you want to feed betas straight into a PyTorch training\nloop, ``to_torch_dataset`` wraps one session as a\n``torch.utils.data.Dataset`` whose items pair each trial's betas\nwith the matching stimulus image (plus a few bookkeeping fields).\nThe PyTorch dependencies are optional -- install them with the\n``[torch]`` extra if you want this integration:\n\n```bash\nuv pip install \"laion-fmri[torch]\"\n```\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# The PyTorch dataset pairs each beta with a stimulus image, so it\n# requires the stimuli/ prefix to be populated.\nif sub.has_stimuli():\n    from torch.utils.data import DataLoader\n\n    dataset = sub.to_torch_dataset(session=session, roi=roi)\n    print(f\"Dataset length: {len(dataset)}\")\n\n    sample = dataset[0]\n    print(f\"betas: {sample['betas'].shape}\")\n    print(f\"image: {sample['image'].shape}\")\n\n    loader = DataLoader(dataset, batch_size=16, shuffle=True)\n    for batch in loader:\n        print(f\"Batch betas: {batch['betas'].shape}\")\n        print(f\"Batch image: {batch['image'].shape}\")\n        break\nelse:\n    print(\n        \"PyTorch dataset needs stimulus images; skipping until \"\n        \"the bucket's stimuli/ is populated.\"\n    )"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.12.3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}