Option C — full GLMM refit at HEALPix nside=128. This notebook is
a thin orchestration wrapper around the four cleaning + indexing
scripts in healpix_port/, fitting the Soroye-2020 mechanism on the
HEALPix nside=128 NESTED substrate (~46 km cells, ~16x finer in
area than the reference repo’s nside=64 fit).
Per DOMAIN.md, HEALPix indexing is always NESTED (hierarchical
bit-shift refinement: parent = pix >> 2, children = (pix << 2) | k).
We use healpix-geo, not healpy, because healpy’s
cosmology-first lon convention (0-360) and absent CRS handling
accumulate small biases that are unacceptable for biodiversity-precision
climate-impact work over Iberia.
Cross-substrate frame: this repo is the nside=128 sibling of
annefou/weatherxbiodiversity-projection (commit b7cdd47), which
runs the same pipeline at nside=64. The headline question for THIS
repo is whether the substrate-robustness verdict (sc_TEI_delta within
+-30% of weatherxbio v0.2.1’s CEA value of +0.479) still holds at
this finer scale.
Scripts run in order:
01_clean_data_iberia_healpix.py— clean GBIF Iberia + Kerr 2015 species filter + HEALPix nside=128 cell assignment.02_presence_absence_healpix.py— per-(species x cell x period_season) presence/absence on the flat Iberia HEALPix cell list (440 cells fromdata/precomputed/iberia_pix_nside128_nested.npy).03_sampling_continent_healpix.py— per-cell sampling-effort raster (distinct LYIDs) + continent code (constant 2 / Europe).04_climate_tei_pei_healpix.py— bilinear-sample CRU TS 3.24.01 at HEALPix cell centres + compute TEI / PEI per species per cell per period. Per-species niche limits derived from nside=128 baseline-occupied cells (Option C — not species-level inheritance from nside=64).
import os
import subprocess
import sys
from pathlib import PathROOT = Path("..").resolve()
PORT = ROOT / "healpix_port"
OUT_DIR = PORT / "outputs_iberia"
env = {**os.environ}
def run(script: str) -> None:
print(f"\n=== {script} ===", flush=True)
subprocess.run(
[sys.executable, script],
cwd=PORT,
env=env,
check=True,
)Run the four upstream cleaning + indexing scripts (HEALPix substrate)¶
run("01_clean_data_iberia_healpix.py")
run("02_presence_absence_healpix.py")
run("03_sampling_continent_healpix.py")
run("04_climate_tei_pei_healpix.py")Summary of intermediate artefacts produced¶
expected = [
OUT_DIR / "bombus_clean_healpix.csv",
OUT_DIR / "presence_absence_healpix.nc",
OUT_DIR / "sampling_continent_healpix.nc",
OUT_DIR / "climate_tei_pei_healpix.nc",
]
print("\nIntermediate artefacts (HEALPix nside=128 NESTED):")
for p in expected:
if p.exists():
size = p.stat().st_size
print(f" ok {p.relative_to(ROOT)} ({size:,} bytes)")
else:
print(f" MISS {p.relative_to(ROOT)}")import pandas as pd # noqa: E402
import xarray as xr # noqa: E402
clean_csv = OUT_DIR / "bombus_clean_healpix.csv"
if clean_csv.exists():
df = pd.read_csv(clean_csv)
print(f"\nbombus_clean_healpix.csv -> {len(df):,} rows, "
f"{df['species'].nunique()} species")
print(f" HEALPix cells with at least one occurrence: "
f"{df['cell_id_hp'].nunique()}")
print(f" periods present: {sorted(df['period'].dropna().unique().tolist())}")
print(f" seasons present: {sorted(df['season'].dropna().unique().tolist())}")
pa_nc = OUT_DIR / "presence_absence_healpix.nc"
if pa_nc.exists():
pa = xr.open_dataset(pa_nc)
depth = pa.attrs['dggs_grid_refinement_level']
print(
f"\nIberia HEALPix substrate: {int(pa.sizes['cells'])} cells "
f"(nside={2**depth}, depth={depth}, "
f"scheme={pa.attrs['dggs_grid_indexing_scheme']!r})"
)