# Real CMO microscope calibration on Pycaso data

## A rayfield-based case study with legacy ChArUco images

### What this page is about, in one paragraph

Most cameras have **one optical center** — a single point through which all
light rays appear to pass.  Stereo microscopes of the **Common Main Objective**
(CMO) family do not.  They use a single large lens shared between two
off-axis sub-apertures, so each channel's chief rays appear to originate from a
**different point**, a few millimetres apart.  OpenCV's standard stereo
calibration assumes one optical center per camera; it fails on this
architecture.  This page documents a complete calibration of a real CMO
microscope using a different approach: **measure the rays first, identify the
optics afterwards**.

![CMO architecture diagram](assets/diagrams/cmo_physical.png)

*The CMO architecture: two off-axis sub-pupils share one main objective.
The chief rays from each channel converge toward the object plane through
different effective origins — not through a single common center.*

### What we measured

On 10 stereo pairs of ChArUco images from the
[Pycaso](https://github.com/LaboratoireMecaniqueLille/Pycaso) open dataset, the
StereoComplex pipeline produces:

| Quantity | Value | What it tells us |
|---|---|---|
| **Compact physical CMO model** (26 params, with SE(3) arm) | **1.06 px** (P50=0.87, P95=1.84) | Best usable physical model under the 1.5 px operational BIC guard |
| Flexible Zernike rayfield (57 params, non-parametric) | 0.47 px (P50=0.34, P95=0.86) | Approximate noise floor of corner detection |
| OpenCV standard stereo calibration | **> 300 px** | Standard pinhole stereo fails on this architecture |
| Naive perspective CMO model (19 params) | ~86 px | Wrong family — direction field is telecentric, not perspective |
| Stereo baseline $b$ | **24.9 mm** | Distance between the two effective sub-pupils |
| Working distance $WD$ | **64.7 mm** | Object plane distance |
| Objective focal length $f_{\text{obj}}$ | **62.2 mm** | Read from the rayfield geometry |
| Stereo convergence angle $\theta$ | **22.6°** | Inter-channel angular separation |

The **headline result** is the 26-parameter physical model: it reaches
1.06 px on a 2048×2048 sensor (P50 = 0.87 px, P95 = 1.84 px), within
2.3× of the non-parametric noise floor, while using fewer than half the
parameters and remaining fully interpretable in terms of sub-pupils,
focal length, telecentricity, and an SE(3) arm correction per channel.
The geometric descriptors ($b$, $WD$, $f_{\text{obj}}$, $\theta$) are
not outputs of a physical-model fit: they are **read directly** from the
measured Zernike rayfield (centre-pixel ray and pose estimates), giving
physical-scale quantities that can be compared with microscope geometry
or manufacturer specifications.

### What this case study claims, and what it does not

**Claims, with evidence:**

- StereoComplex calibrates a real CMO microscope where standard OpenCV
  stereo calibration fails (1.06 px vs > 300 px).
- A compact, interpretable physical model with 26 parameters reaches
  pixel residuals within 2.3× of a 57-parameter non-parametric reference,
  with a decisive BIC margin over all alternative model families
  (ΔBIC > 40 000 vs pinhole, Brown, and parallel-plate).
- The measured rayfield exposes physical geometry that OpenCV cannot
  (effective sub-pupils, working distance, baseline, convergence angle).
- A minimal perspective CMO model fails to explain the field across the
  full FOV (3× discrepancy in $d_y$ range), pointing to a telecentric
  optical architecture that naive perspective cannot capture.
- The rayfield works as a *diagnostic tool*: residual analysis on the
  Zernike basis identifies which degrees of freedom the physical model
  is missing (Step 8), guiding the SE(3) arm correction.

**Does not claim:**

- Absolute metrological accuracy validated against an independent 3D
  reference.  All numbers are *internal* to the rayfield representation.
- That the 26-parameter model captures all aberrations.  The remaining
  ~0.6 px gap above the Zernike noise floor reflects distributed
  low-amplitude effects (field curvature, astigmatism) that a compact
  physical model cannot represent.
- That OpenCV cannot be tuned to handle CMO — only that the standard
  central stereo calibration, with the configuration tested, does not.

> **The executable protocol** is
> [Notebook 09](../examples/notebooks/09_pycaso_real_data.py).
> Run it with `python examples/notebooks/09_pycaso_real_data.py` to
> reproduce all numerical values in this page.

## Claims and evidence

| Claim | Evidence | Status |
|---|---|---|
| Pycaso dataset can be processed as legacy ChArUco | Detection with `DICT_6X6_250` + `setLegacyPattern(True)` | Supported |
| Hessian completion fills all 165 corners | $\|\det H\|$ + Otsu + barycentre | Supported |
| Double TPS eliminates the pose/rayfield gauge | Z₀ drift drops from 8.5° to 0.023° | **Key result** |
| Zernike rayfield reaches subpixel calibration | 0.47 px local pixel-equivalent RMS | Supported |
| Physical descriptors are read directly from $(O, d)$ | $b, WD, f_{\text{obj}}, \theta$ without model fit | Diagnostic |
| $d_y(u,v)$ reveals telecentricity | 3× range difference vs perspective | Diagnostic |
| Residual modal analysis identifies missing DOF | $\Delta d$ and $\Delta m$ are 96–98 % $Z_0^0$ (global, not spatial) | **Diagnostic method** |
| SE(3) arm alignment resolves the global residual | 14.6 → 1.06 px (14× improvement) | **Key result** |
| BIC model selection: ray-space identifies family, operational BIC selects usable model | Ray-space BIC confirms telecentric family; operational BIC (with 1.5 px guard) selects 26p as best usable | **Key result** |
| The rayfield is a general diagnostic instrument | Observe → diagnose → fix → verify loop | **General strategy** |

## What this case study does **not** evaluate

- It does **not** validate absolute metrological accuracy on an independent 3‑D object.
- It does **not** estimate a full uncertainty budget.
- It does **not** prove that the SE(3) arm transforms correspond to specific
  physical misalignments (they are an effective parameterisation).
- It does **not** test generalisation to other microscopes or datasets.

## The dataset

| Property | Value |
|---|---|
| Sensor | 2048 × 2048 px |
| Board | Legacy ChArUco, 16 × 12 squares, 0.3 mm |
| Dictionary | DICT_6X6_250, `setLegacyPattern(True)` |
| Frames | 10 stereo pairs |
| Z range | 2.65 – 3.35 mm (Δ = 0.70 mm) |

The 10 frames span a narrow depth range (0.70 mm) typical of
high-magnification microscopy.  With 165 corners per frame, we have
3300 ray observations (165 × 10 × 2 channels) — well-conditioned for
the 57-parameter Zernike fit.  Ten frames is sufficient for this
dataset; the calibration remains stable with as few as 6 frames.

The dataset is **not vendored** in the StereoComplex repository.  Clone
[Pycaso](https://github.com/LaboratoireMecaniqueLille/Pycaso) at
`examples/pycaso_data`.

## Pipeline

```text
ChArUco legacy detection (DICT_6X6_250, setLegacyPattern)
       ↓
Hessian corner completion (|det H| + Otsu + barycentre)  →  165/165 corners
       ↓
Ray2D TPS denoising on ArUco markers → predict 165 ChArUco
       ↓
TPS re-denoising on completed 165 corners (λ=3, Huber c=1.5)
       ↓
Constrained Zernike rayfield O(0)+d(2), shared R+XY, per-pose Z
       ↓
Stability test: ΔZ₀ < 0.1° between constrained and full-pose fits
       ↓
Read CMO descriptors from (O, d)
       ↓
Propose physical models → fit → residual analysis → iterate
```

Ray2D TPS is a **purely 2‑D regularisation step**.  It predicts missing
ChArUco grid corners and regularises noisy ones from their neighbours,
using a homography plus a thin-plate-spline residual field.  It imposes
no 3‑D camera model; its validity is checked afterwards through rayfield
gauge stability.

### The double TPS pass

The second TPS pass is critical for rayfield stability:

1. TPS on ArUco marker corners predicts all 165 ChArUco grid corners.
2. A second TPS pass uses the completed 165 corners themselves as control
   points with tighter smoothing (λ = 3, Huber c = 1.5).

Before double TPS, the constrained and full-pose Zernike fits produce
dramatically different rayfields (Z₀ drift = 8.5°, baseline 17 ↔ 28 mm).
After double TPS, the gauge ambiguity vanishes (Z₀ drift = 0.023°).
The Zernike rayfield becomes a **stable experimental oracle**.

The double TPS is a denoising regularizer whose validity is confirmed
not by the 2‑D residual alone, but by the disappearance of gauge drift
in the 3‑D Zernike fit.

### Error metric

> **The reported residual is not an OpenCV reprojection RMS.**

For each observed pixel, the fitted ray is intersected with the estimated
board plane.  The 3‑D distance to the corresponding board point is
converted to a **local pixel-equivalent residual**:

$$e_{\text{px}} \approx \frac{e_{\text{mm}}}{|t|} f_x.$$

This is a local first-order approximation, not an image-plane reprojection
residual from a projective camera model.
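
The conversion can be sketched in a few lines.  This is a minimal illustration, not the StereoComplex implementation: the function names are hypothetical, $|t|$ is assumed to be the distance from the camera origin to the board, and $f_x = 25600$ is the fixed pinhole reference quoted under Limitations.

```python
import math

def intersect_ray_plane(origin, direction, plane_point, plane_normal):
    """Intersect the line origin + s * direction with the board plane."""
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the board plane")
    s = sum((p - o) * n
            for p, o, n in zip(plane_point, origin, plane_normal)) / denom
    return [o + s * d for o, d in zip(origin, direction)]

def pixel_equivalent(e_mm, board_distance_mm, fx_px):
    """First-order conversion e_px ~ e_mm / |t| * f_x."""
    return e_mm / board_distance_mm * fx_px

# Illustrative values only: a ray along +Z, a board plane at Z = 64.7 mm,
# and a board corner lying 2 micrometres away from the intersection point.
hit = intersect_ray_plane([0.0, 0.0, 0.0], [0.0, 0.0, 1.0],
                          [0.0, 0.0, 64.7], [0.0, 0.0, 1.0])
e_mm = math.dist(hit, [0.002, 0.0, 64.7])
e_px = pixel_equivalent(e_mm, 64.7, 25600.0)
print(f"{e_mm * 1000:.1f} um -> {e_px:.2f} px")
```

With these numbers one pixel corresponds to roughly 2.5 µm on the board, so a 2 µm miss maps to a little under one pixel.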

## Step-by-step: from rayfield to physical model

### Step 1 — The Zernike rayfield as observable

The Zernike rayfield $\mathcal{R}(u,v) = (O(u,v), d(u,v))$ maps each
pixel to a 3‑D line.  We fit O(0) + d(2): rigid sub-pupil per channel
(origin order 0), spatially-varying direction correction (direction
order 2), with constrained poses (shared rotation + XY, per-pose Z).
This gives 57 parameters total.  The fit reaches **0.47 px** local
pixel-equivalent RMS.

From the centre-pixel ray $(O, d)$ we **read physical descriptors
directly** — no model fit required:

| Descriptor | Symbol | How to read it | Value |
|---|---|---|---|
| Stereo baseline | $b$ | $\|O_R - O_L\|$ | **24.9 mm** |
| Sub-pupil depth | $z_p$ | $(|O_{L,z}| + |O_{R,z}|)/2$ | **2.5 mm** |
| Working distance | $WD$ | Mean of pose Z estimates | **64.7 mm** |
| Objective focal length | $f_{\text{obj}}$ | $WD - z_p$ | **62.2 mm** |
| Convergence angle | $\theta$ | $\arccos(d_L \cdot d_R)$ | **22.6°** |

The sub-pupil origins $O_L$ and $O_R$ are coordinates in millimetres,
expressed in the camera frame: the left sub-pupil sits 12.7 mm to the
left of the optical centre, 0.1 mm above, and 2.7 mm forward of the
principal plane.  The baseline $b = \|O_R - O_L\| = 24.9$ mm is a
physical length you could verify at the microscope mount.

These are **not fitted physical CMO parameters** — they are rayfield
readouts under a constrained Zernike gauge.
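
The readout formulas in the table translate directly into code.  The sketch below is illustrative: `read_descriptors` is a hypothetical helper, the left origin comes from the text above, and the right origin, the ray directions and the pose Z values are made-up numbers chosen to land near the reported descriptors.

```python
import math

def read_descriptors(o_left, o_right, d_left, d_right, pose_z):
    """Read b, z_p, WD, f_obj (mm) and theta (degrees) from centre-pixel rays."""
    baseline = math.dist(o_left, o_right)                  # ||O_R - O_L||
    z_p = (abs(o_left[2]) + abs(o_right[2])) / 2.0         # sub-pupil depth
    wd = sum(pose_z) / len(pose_z)                         # mean pose Z
    f_obj = wd - z_p
    dot = sum(a * b for a, b in zip(d_left, d_right))
    norm = math.hypot(*d_left) * math.hypot(*d_right)
    cosang = max(-1.0, min(1.0, dot / norm))               # guard acos domain
    theta = math.degrees(math.acos(cosang))
    return {"b": baseline, "z_p": z_p, "WD": wd, "f_obj": f_obj, "theta": theta}

half = math.radians(11.3)                  # hypothetical half convergence angle
desc = read_descriptors(
    o_left=(-12.7, 0.1, 2.7),              # from the text above
    o_right=(12.2, 0.1, 2.3),              # hypothetical
    d_left=(math.sin(half), 0.0, math.cos(half)),
    d_right=(-math.sin(half), 0.0, math.cos(half)),
    pose_z=[64.4, 64.7, 65.0],             # hypothetical pose Z estimates (mm)
)
print({k: round(v, 2) for k, v in desc.items()})
```

Nothing here is optimised: every descriptor is arithmetic on quantities the rayfield fit already produced.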

### Step 2 — Perspective CMO: the baseline hypothesis

The simplest CMO model assumes each channel is a perspective camera
viewing the object through a decentered sub-pupil.  Rays originate from
$S_c = (\pm b/2,\; 0,\; WD - f_{\text{obj}})$ and fan out to the sensor,
predicting $d_y(u,v) \propto (v - c_y)$.

**What we observe.**  The Zernike $d_y$ field is **nearly constant**
across the field (range = 0.079, mean = +0.059), while the perspective
CMO predicts a gradient from −0.116 to +0.116 (range = 0.232) — a
**3× range difference**.

**Diagnosis.**  The near-constant $d_y$ is the signature of **object-space
telecentricity**: the chief rays are almost parallel, not diverging from
a point.  No adjustment of principal point, distortion, or pitch can
fix a 3× structural mismatch — we need a different model family.

### Step 3 — Telecentric CMO: matching the observed structure

The rayfield tells us what the model should look like:

- **Origins** are well described by rigid sub-pupils.
- **Directions** are nearly constant, with weak affine variations — no
  perspective gradient.

This leads to `CMOTelecentricStereoModel`:

$$O_c = S_c = (\pm b/2,\; 0,\; WD - f_{\text{obj}})$$

$$d_c(u,v) = \operatorname{normalize}\left(d_{c,0} + s_x \tilde{u}\,
e_x + s_y \tilde{v}\, e_y + \text{cross} + \text{quadratic}\right)$$

The key difference: **the direction is not derived from a point
projection.**  Instead, $d(u,v)$ is directly parameterised as an affine
function of pixel position.  Adding pupil shear ($\rho_x, \rho_y$) —
an affine variation of the origin transverse to the direction — gives
the **14-parameter** variant.
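
A minimal sketch of the affine direction model follows.  It keeps only the constant and linear terms (no cross, quadratic or pupil-shear terms), takes $e_x$ and $e_y$ as the camera X and Y axes, and assumes $\tilde{u}, \tilde{v}$ are pixel coordinates normalised to $[-1, 1]$; the function name and the value of $s_y$ are hypothetical and do not describe `CMOTelecentricStereoModel` itself.

```python
import math

def telecentric_direction(u, v, d0, s_x, s_y, size_px=2048):
    """Unit direction d(u, v) = normalize(d0 + s_x * u~ * e_x + s_y * v~ * e_y)."""
    half = size_px / 2.0
    u_t, v_t = (u - half) / half, (v - half) / half    # assumed normalisation
    raw = (d0[0] + s_x * u_t, d0[1] + s_y * v_t, d0[2])
    n = math.hypot(*raw)
    return tuple(c / n for c in raw)

# Central direction tilted by a hypothetical 11.3 degrees about Y, with the
# mean d_y of +0.059 quoted in Step 2; s_y is an illustrative slope.
half_angle = math.radians(11.3)
d0 = (math.sin(half_angle), 0.059, math.cos(half_angle))
top = telecentric_direction(1024, 0, d0, s_x=0.0, s_y=0.0395)
bottom = telecentric_direction(1024, 2048, d0, s_x=0.0, s_y=0.0395)
dy_range = bottom[1] - top[1]
print(f"d_y range across the field: {dy_range:.3f}")
```

Because the direction is parameterised directly, the $d_y$ range is set by $s_y$ alone and is not tied to the position of a projection centre, which is what lets this family reproduce a nearly constant $d_y$.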

**Result:**

| Metric | Perspective CMO | Telecentric + shear |
|---|---|---|
| Ray RMS (two-plane) | 3.48 mm | **0.12 mm** (29× better) |
| Pixel RMS | 86 px | **14.6 px** (5.9× better) |
| Pixel P50 | — | 13.2 px |
| Pixel P95 | — | 22.4 px |
| Parameters | 19 | 14 |

The 14-parameter telecentric model captures the dominant geometry with
fewer parameters and far better fidelity — because the model family
matches the observed structure.

### Step 4 — Residual analysis: what is the model still missing?

The telecentric model reaches 0.12 mm ray RMS but plateaus at ~14.6 px
pixel RMS.  To see what it is missing, we compute its residual against
the Zernike oracle:

- **Direction residual:** $\Delta d = d_{\text{Zernike}} - d_{\text{CMO}}$
- **Moment residual:** $\Delta m = m_{\text{Zernike}} - m_{\text{CMO}}$,
  where $m = O \times d$ (the Plücker moment).

Projecting on Zernike modes up to order 4:

| Mode | Δd (L) | Δd (R) | Δm (L) | Δm (R) | Interpretation |
|---|---:|---:|---:|---:|---|
| $Z_0^0$ (piston) | **97 %** | **96 %** | **98 %** | **98 %** | **Global offset** |
| $Z_1^1$ (tilt) | 2 % | 3 % | 2 % | 2 % | Negligible |
| All $n \ge 2$ | < 0.5 % | < 1 % | < 0.1 % | < 0.1 % | Negligible |

**Both Δd and Δm are dominated by $Z_0^0$, a constant mode.**  A
$Z_0$-dominated residual is a **global line-bundle offset**, not a
spatial field distortion.  This is the signature expected if each of the
two optical arms carries a small rigid misalignment relative to the ideal
CMO skeleton.
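
The piston share of a residual field can be estimated without a full Zernike fit.  The sketch below uses a simplification: the energy captured by a constant mode is taken as the squared norm of the mean residual divided by the mean squared norm, which matches a least-squares Zernike projection only when the higher modes are orthogonal to the constant on the sample points.  Helper names and sample values are hypothetical.

```python
def cross(a, b):
    """Cross product, used for the Pluecker moment m = O x d."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def piston_fraction(residuals):
    """Fraction of residual energy explained by a constant (piston) mode."""
    n = len(residuals)
    mean = [sum(r[k] for r in residuals) / n for k in range(3)]
    piston_energy = sum(c * c for c in mean)
    total_energy = sum(sum(c * c for c in r) for r in residuals) / n
    return piston_energy / total_energy

# Moment of one line (origin in mm, unit direction), for illustration.
moment = cross((-12.7, 0.1, 2.7), (0.0, 0.0, 1.0))

# Synthetic direction residual: a global offset plus a small spatial ripple.
offset = (0.004, -0.002, 0.0)
ripple = [(5e-4, 0.0, 0.0), (-5e-4, 0.0, 0.0), (0.0, 5e-4, 0.0), (0.0, -5e-4, 0.0)]
delta_d = [tuple(o + r for o, r in zip(offset, rip)) for rip in ripple]
fraction = piston_fraction(delta_d)
print(f"piston share of the residual energy: {fraction:.1%}")
```

The same function applies to $\Delta m$ once the moments of both models have been computed with `cross`.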

### Step 5 — Testing alternative hypotheses

Before committing to arm alignment, we test two alternatives:

**Hypothesis A — Image-space pre-warp.**  Add a polynomial
$\xi = W(u,v)$ before the direction model.

| Model | Params | Ray RMS | Pixel RMS | P50 | P95 |
|---|---|---|---|---|---|
| Telecentric L0 | 14 | 0.118 mm | 14.6 px | 13.2 px | 22.4 px |
| + affine warp | 20 | 0.115 mm | 16.0 px (worse) | 14.3 px | 25.0 px |
| + quadratic warp | 26 | 0.115 mm | 16.5 px (worse) | 15.1 px | 25.0 px |

The pre-warp degrades pixel RMS — consistent with the $Z_0$ diagnostic
(a warp would produce spatial, not global, changes).

**Hypothesis B — Spatially varying origin.**  Fit affine and quadratic
transverse origin fields with direction fixed.

| Origin model | Ray RMS | vs constant |
|---|---|---|
| O0 (constant) | 0.117 mm | baseline |
| O1 (affine) | 0.107 mm | 8 % reduction |
| O2 (quadratic) | 0.107 mm | no further gain |

Only 8 % improvement — the residual is not spatial.

### Step 6 — SE(3) arm alignment: the breakthrough

The $Z_0$-dominated residual points to a **global** misalignment.  We
add a per-channel rigid transform to the telecentric model's Plücker
lines:

$$d' = R_c \, d_{\text{tel}}, \qquad O' = R_c \, O_{\text{tel}} + t_c$$

where $(R_c, t_c)$ is a small rotation and translation for each channel
(12 additional parameters, 26 total).
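
Applying the arm transform to a single line takes one rotation and one translation.  The following sketch uses a Rodrigues rotation about a unit axis; the function names, the rotation axis, the translation and the input line are all illustrative assumptions, with only the 2.5° magnitude taken from the left-arm result reported in this step.

```python
import math

def rotation_matrix(axis, angle_deg):
    """Rodrigues rotation matrix for a unit axis and an angle in degrees."""
    x, y, z = axis
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    k = 1.0 - c
    return [
        [c + x * x * k,     x * y * k - z * s, x * z * k + y * s],
        [y * x * k + z * s, c + y * y * k,     y * z * k - x * s],
        [z * x * k - y * s, z * y * k + x * s, c + z * z * k],
    ]

def matvec(rot, v):
    return [sum(rot[i][j] * v[j] for j in range(3)) for i in range(3)]

def apply_arm(rot, t, origin, direction):
    """Return (O', d') with d' = R d and O' = R O + t."""
    d_new = matvec(rot, direction)
    o_new = [a + b for a, b in zip(matvec(rot, origin), t)]
    return o_new, d_new

rot_left = rotation_matrix((0.0, 1.0, 0.0), 2.5)   # hypothetical axis
t_left = [0.1, 0.0, 0.0]                           # hypothetical, in mm
o_new, d_new = apply_arm(rot_left, t_left, [-12.45, 0.0, 2.5], [0.0, 0.0, 1.0])
print([round(c, 4) for c in o_new], [round(c, 4) for c in d_new])
```

The transform acts on the whole line bundle of a channel at once, which is why it can absorb a residual that is constant over the field.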

Fitting jointly against the Zernike rayfield:

| Metric | Telecentric L0 | **Telecentric + SE(3)** | Zernike ref |
|---|---|---|---|
| Parameters | 14 | **26** | 57 |
| Ray RMS (mm) | 0.118 | **0.0021** | 0.0007 |
| Direction RMS (°) | 0.27 | **0.003** | 0 |
| Moment RMS (mm) | 0.32 | **0.001** | 0 |
| **Pixel RMS (px)** | 14.6 | **1.06** | 0.47 |
| **Pixel P50 (px)** | 13.2 | **0.87** | 0.34 |
| **Pixel P95 (px)** | 22.4 | **1.84** | 0.86 |


The SE(3) arm alignment reduces pixel RMS by **14×** (14.6 → 1.06 px).
Rotations are stable across runs (~2.5° left, ~3.7° right); translations
are sub-mm but trade off with telecentric base parameters.

> **Important caveat — what the 0.0007 mm Zernike RMS really means.**
>
> The Zernike model's two-plane RMS of 0.0007 mm is a **self-evaluation
> residual**: it measures how accurately a 57-parameter Zernike model
> reconstructs its own ray-field on the support points it was fitted to.
> It is not an absolute physical accuracy.  By construction, a model with
> enough degrees of freedom will reproduce itself nearly perfectly.
>
> The Telecentric models (14 to 26 parameters) are evaluated **against the
> Zernike rayfield**, using it as a reference.  Their two-plane RMS of
> 0.002–0.118 mm therefore reflects two things: (i) the structural
> mismatch between a compact physical model and the flexible Zernike
> representation, and (ii) the noise that Zernike absorbed but that no
> physical model should reproduce.
>
> For an apples-to-apples comparison, the **pixel RMS column** is the right
> reference: it measures each model against the same observable — the
> ChArUco corner detections.  There:
>
> - Zernike (57 params, fitted to corners) achieves 0.47 px RMS,
>   approximately the noise floor of ChArUco corner detection.
> - The Telecentric model with 14 parameters, fitted to the Zernike
>   rayfield and evaluated on the same corners, achieves ~14.6 px RMS.
> - The SE(3)-aligned Telecentric model (26 params) achieves 1.06 px RMS
>   (P50 = 0.87 px, P95 = 1.84 px) — a 14× improvement over the base
>   telecentric, and within 2.3× of the Zernike reference.
> - The perspective CMO (19 params) achieves 86 px RMS — structurally
>   inadequate for this microscope architecture.
>
> The Telecentric + SE(3) model is therefore not a *replacement* for
> Zernike when minimal pixel residual is the goal.  It is a *compact
> physical explanation* of the dominant CMO geometry, designed using the
> Zernike rayfield as a diagnostic tool.  The remaining pixel gap (1.06 px
> vs 0.47 px) reflects distributed low-amplitude aberrations that a
> 26-parameter physical model cannot capture — field curvature,
> astigmatism, and other real microscope optics.

### Step 6b — Ablation: which SE(3) parameters are essential?

| Variant | Params | Ray RMS | Px RMS | P50 | P95 |
|---|---|---|---|---|---|
| Telecentric (baseline) | 14 | 0.048 mm | 14.6 px | 13.2 px | 22.4 px |
| + Rotation only L/R | 20 | 0.014 mm | 3.74 px | 2.68 px | 7.13 px |
| + Translation only L/R | 20 | 0.010 mm | 2.44 px | 2.06 px | 4.15 px |
| **+ Full SE(3) L/R** | **26** | **0.0021 mm** | **1.06 px** | **0.87 px** | **1.84 px** |
| + Shared rotation | 23 | 0.0083 mm | not evaluated | — | — |
| + Shared translation | 23 | 0.0041 mm | not evaluated | — | — |
| + Differential only | 20 | 0.0070 mm | 4.04 px | 2.85 px | 7.74 px |

Pixel RMS was not evaluated for the shared-rotation and shared-translation
variants: their ray RMS is already 92 % to 289 % higher than that of the
full SE(3) model, so they were ruled out in ray space and their pixel
error is expected to exceed that of the full 26p model.

**Both rotation and translation are essential, and each arm needs its
own degrees of freedom: 26 parameters is the smallest validated compact
model among the tested parameterisations.**

### Step 7 — Autopsy of the 26p model and BIC model selection

The 26p model achieves 1.06 px with excellent L/R symmetry (1.10 vs
1.01 px).  Residual direction RMS is 0.003°, moment RMS is 0.0006 mm —
the SE(3) has eliminated the Z₀ piston.

**Formal BIC model selection** on the Pycaso Zernike rayfield.
The ray-space BIC identifies the correct optical family; the
**operational BIC** adds a reprojection guard: models exceeding 1.5 px
incur a hard penalty ($+10^6 + N \log(e_{\text{px}}^2 / 1.5^2)$),
enforcing a usability constraint that the ray-space BIC alone does not
capture.

| Model | Params | RMS (mm) | $BIC_{ray}$ | Pixel RMS | $BIC_{usable}$ | Status |
| --- | --- | --- | --- | --- | --- | --- |
| cmo_telecentric_shear | 14 | 0.111 | −36 129 | 14.6 px | +978 890 | REJECTED |
| cmo_telecentric | 12 | 0.146 | −33 201 | 27.7 px | +986 044 | REJECTED |
| **CMO + SE(3) 26p** | **26** | **0.002** | **−32 433** | **1.06 px** | **−32 433** | **BEST USABLE** |
| Zernike O(0)+d(2) | 57 | -- | reference | 0.47 px | reference | best flexible |

The ray-space BIC confirms that the CMO telecentric family is correct
(> 40 000 points over pinhole, Brown-Conrady, parallel-plate).  But the
14p base model is **unusable** at 14.6 px — the operational BIC
correctly rejects it.  The SE(3)-aligned 26p model is the first compact
physical model to pass the 1.5 px usability threshold, making it the
best **usable** physical model.  The shear variant (14 params)
is preferred over no-shear (12 params), confirming that pupil shear
captures meaningful structure (ΔBIC ≈ 2 900).
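
The guard is simple enough to check by hand.  The sketch below assumes a natural logarithm and $N = 3300$ (the number of ray observations in the dataset); neither is stated explicitly on this page, but with those two assumptions the formula reproduces the $BIC_{usable}$ values of both rejected rows.  The function name and the choice to treat exactly 1.5 px as passing are hypothetical.

```python
import math

def operational_bic(bic_ray, pixel_rms, n_obs=3300, guard_px=1.5, hard_penalty=1e6):
    """Ray-space BIC plus a hard penalty for models above the pixel guard."""
    if pixel_rms <= guard_px:
        return bic_ray                      # usable model: BIC unchanged
    return bic_ray + hard_penalty + n_obs * math.log(pixel_rms**2 / guard_px**2)

rows = {
    "cmo_telecentric_shear": (-36129, 14.6),
    "cmo_telecentric": (-33201, 27.7),
    "CMO + SE(3) 26p": (-32433, 1.06),
}
for name, (bic_ray, px) in rows.items():
    print(f"{name:24s} {operational_bic(bic_ray, px):12.0f}")
```

The logarithmic term keeps the ranking among rejected models meaningful, while the constant $10^6$ guarantees that any model under the guard outranks any model above it.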

**What remains after 26p.**  The residual is distributed across Zernike
orders 1–3 with no single dominant block.  PCA on the two-plane residual
reveals effective rank ≈ 4 (96 % variance in 2 modes), but the modes
are **spatially varying** — no global correction helps.  A rank‑2
per-pixel correction would theoretically reach ~0.22 px, but requires
spatial parameterisation (i.e., Zernike flexibility).

**Final model hierarchy:**

| Model | Params | Pixel RMS | P50 | Nature |
|---|---|---|---|---|
| Perspective CMO | 19 | ~86 px | — | Baseline (inadequate) |
| Telecentric L0 | 14 | ~14.6 px | 13.2 px | Correct family, missing DOF |
| **CMO + SE(3)** | **26** | **1.06 px** | **0.87 px** | **Compact physical model** |
| CMO + SE(3) + corner BA | 26 | ~0.98 px | ~0.80 px | Refined (negligible gain) |
| Zernike O(0)+d(2) | 57 | 0.47 px | 0.34 px | Flexible subpixel reference |

### Step 8 — Why the residual analysis was decisive

```text
Residual Δd, Δm projected on Zernike modes
       │
       ├── Z0-dominated (96-98%) → GLOBAL misalignment
       │       │
       │       ├── Pre-warp image? → NO (degrades)
       │       ├── Variable origin? → NO (8% gain)
       │       └── SE(3) arm alignment? → YES (14× improvement)
       │
       └── Higher modes dominant → SPATIAL distortion
               └── (Not what we observe)
```

Without the rayfield, we would be guessing.  The 2‑D reprojection error
tells you *that* the model is wrong, but not *how*.  The Zernike
projection of Δd and Δm tells you exactly what kind of degree of freedom
is missing.

### Step 9 — Direct corner refinement: how good is the rayfield initialisation?

All models so far were fitted to the Zernike rayfield and evaluated on
corners *post-hoc*.  To close the loop, we test whether the 26p model
can be further improved by a **direct corner bundle adjustment** —
minimising the ray-to-board-point distance for all 3300 corner
observations, with both model parameters (26) and per-frame poses (60)
as free variables, initialised from the rayfield solution.

| Stage | Pixel RMS | P50 | P95 |
|---|---|---|---|
| 26p rayfield fit (init) | 1.06 px | 0.87 px | 1.84 px |
| + pose-only BA (420 iters) | ~1.00 px | ~0.82 px | ~1.75 px |
| + joint model+pose BA | ~0.98 px | ~0.80 px | ~1.70 px |

The corner BA improves the pixel RMS by only **~7 %** (1.06 → 0.98 px)
after hundreds of iterations.  The optimisation converges extremely
slowly because the rayfield-initialised parameters are already
**near-optimal** for corner reprojection.

**This is a strong validation of the entire approach.**  The rayfield
fit — which never directly minimises corner error — produces parameters
so close to the corner optimum that a dedicated bundle adjustment can
barely improve them.  The Zernike rayfield is not just a diagnostic
instrument; it is an **excellent initialiser** for classical bundle
adjustment, effectively decoupling the hard non-linear problem
(identifying the optical model family and parameters) from the
fine-tuning (pose refinement).

The subpixel reference remains the Zernike rayfield at 0.47 px.  The
compact 26p model reaches its practical limit at ~1 px — a 2.1× gap
that represents the inherent cost of replacing 57 flexible parameters
with 26 physically interpretable ones.

## The Ray2D → Ray3D feedback loop

The double TPS pass was essential: before it, the Zernike rayfield was
gauge-unstable (Z₀ drift = 8.5°).  After it, the gauge ambiguity
vanishes (Z₀ drift = 0.023°) and the Zernike rayfield becomes a **stable
experimental oracle**.

This feedback loop — Ray2D → Ray3D → diagnose → fix Ray2D → verify with
Ray3D — is a **general strategy** for any stereo calibration pipeline:

```text
   Ray2D: corner detection + completion + TPS denoising
                          ↓
   Ray3D: Zernike rayfield — the experimental oracle
                          ↓
   Read descriptors from (O, d) — baseline, WD, f_obj, θ
                          ↓
   Propose physical model → residual vs Zernike
                          ↓
   Z0-dominated residual? → missing global DOF (SE(3) arms)
   Spatial residual?      → missing field structure (Zernike)
                          ↓
   Add DOF → refit → evaluate → iterate
                          ↓
   Final model: 1.06 px reprojection (P50 = 0.87 px)
```

**Why this is not possible with standard calibration.**  The 2‑D
reprojection error is **blind** to the pose/rayfield gauge — a full-pose
fit can absorb corner noise into rayfield distortions without increasing
pixel RMS.  You can have "good" 2‑D residuals and a physically unstable
rayfield at the same time.  Only the rayfield reveals the problem, and
only the rayfield tells you *which* degree of freedom is missing.

**This is a general strategy, not specific to CMO microscopes.**  Any
stereo calibration pipeline that fits a pixel-to-ray mapping can use the
same test: fit with constrained poses, fit with free poses, compare the
rayfields.  If they differ substantially, your corners are not clean
enough for physically interpretable calibration.
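
One way to automate this comparison is sketched below.  It assumes the Z₀ drift can be approximated by the angle between the mean ray directions of the two fits, which is an interpretation and not a definition given on this page; the 0.1° tolerance is the one from the pipeline diagram, and all function names are hypothetical.

```python
import math

def mean_direction(directions):
    """Normalised mean of a list of unit direction vectors."""
    total = [sum(d[k] for d in directions) for k in range(3)]
    n = math.hypot(*total)
    return [c / n for c in total]

def angle_deg(a, b):
    """Angle between two vectors, using atan2 for small-angle accuracy."""
    cx = (a[1] * b[2] - a[2] * b[1],
          a[2] * b[0] - a[0] * b[2],
          a[0] * b[1] - a[1] * b[0])
    dot = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.atan2(math.hypot(*cx), dot))

def gauge_is_stable(dirs_constrained, dirs_free, tol_deg=0.1):
    """Return (drift in degrees, True if the drift is below the tolerance)."""
    drift = angle_deg(mean_direction(dirs_constrained), mean_direction(dirs_free))
    return drift, drift < tol_deg

def tilted(deg):
    """Single-ray direction field tilted about Y, for illustration only."""
    a = math.radians(deg)
    return [(math.sin(a), 0.0, math.cos(a))]

drift_before, ok_before = gauge_is_stable([(0.0, 0.0, 1.0)], tilted(8.5))
drift_after, ok_after = gauge_is_stable([(0.0, 0.0, 1.0)], tilted(0.023))
print(f"before: {drift_before:.3f} deg, stable={ok_before}")
print(f"after:  {drift_after:.3f} deg, stable={ok_after}")
```

In practice the two direction fields would come from the constrained-pose and free-pose fits evaluated on the same pixel grid.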

## Limitations

1. **Gauge dependence.**  The Zernike origin $O(u,v)$ is defined up to a
   displacement along the ray direction.  The transverse gauge
   $O(u,v) \cdot d(u,v) = 0$ is enforced.

2. **Constrained poses.**  The shared-rotation + per-pose-Z assumption is
   physically motivated but unverified.

3. **Fixed K.**  The Zernike BA uses a fixed pinhole reference
   ($f_x = 25600$, principal point at image centre).

4. **No independent 3‑D ground truth.**  Residuals are computed on the
   same board points used for calibration.

5. **Single dataset.**  These results are for one specific Pycaso
   microscope and one calibration target.

6. **SE(3) translation parameters are not uniquely identifiable.**  The
   rotation angles (~2.5°, ~3.7°) are stable across optimisation runs,
   but the translation components vary — they trade off with the
   telecentric base parameters (WD, $f_{\text{obj}}$, $b$, principal
   point).  The SE(3) rotation is the robust diagnostic.

7. **The ray-space BIC evaluates models in ray space, not pixel space.**
   The two-plane residual metric amplifies angular errors by $\Delta Z$,
   which may penalise models differently than direct corner reprojection.
   The operational BIC of Step 7 only adds a threshold penalty at 1.5 px.
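
The transverse gauge of item 1 amounts to sliding each origin along its own ray until it is perpendicular to the direction.  A minimal sketch with hypothetical names and illustrative values:

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def to_transverse_gauge(origin, direction):
    """Move the origin along the ray so that O . d = 0."""
    n = math.hypot(*direction)
    d_hat = [c / n for c in direction]
    along = sum(o * c for o, c in zip(origin, d_hat))   # component along the ray
    return [o - along * c for o, c in zip(origin, d_hat)]

origin = [-12.7, 0.1, 2.7]          # mm, illustrative
direction = [0.196, 0.0, 0.981]     # illustrative, not exactly unit length
gauged = to_transverse_gauge(origin, direction)

# The line itself is unchanged: its Pluecker moment O x d is the same.
m_before = cross(origin, direction)
m_after = cross(gauged, direction)
print([round(c, 4) for c in gauged])
```

Since the moment is invariant under this shift, any quantity computed from $(d, m)$ is independent of the gauge choice.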

## Methodology recap: the generalisable workflow

This case study is an instance of a **general method** that can be
applied to any non-standard optical instrument:

1. **Measure the rayfield first** with a flexible non-parametric basis
   (Zernike polynomials).  Do not assume a camera model upfront.
2. **Read physical descriptors directly** from the measured $(O, d)$
   field — baseline, working distance, convergence angle — before
   fitting any model.
3. **Hypothesise a compact physical model** from the observed structure
   ($d_y$ constancy → telecentric; Z₀-dominated residual → global
   arm misalignment).
4. **Validate by BIC** against the rayfield reference.  The ray-space
   BIC identifies the correct optical family; the operational BIC
   (with pixel reprojection guard) selects the usable model.
5. **Iterate via residual analysis.**  Project $\Delta d$ and
   $\Delta m$ onto Zernike modes — if the residual is $Z_0$-dominated,
   the missing DOF is global; if higher modes dominate, the missing
   DOF is a spatial field structure.

This feedback loop — Ray2D preprocessing → Ray3D measurement →
diagnose residual → improve model → verify — is the core contribution
of the StereoComplex framework, independent of the CMO architecture.

## Stabilising the direct BA with a Schur-complement prior

### The problem: pose–intrinsic coupling

Once a physical CMO model is identified from the rayfield, it can be used
as an initialiser for a **direct bundle adjustment** — jointly optimising
optical parameters and board poses against the ChArUco corner residuals.
This direct BA step typically reduces the reprojection RMS, but it comes
with a risk: **some optical directions are poorly observable once the
poses are free to adjust.**

The Fisher information matrix of the BA residual, partitioned into an
optical block $\mathcal{I}_{\theta\theta}$ and a pose block
$\mathcal{I}_{\eta\eta}$, reveals the coupling:

\[
\mathcal{I} =
\begin{bmatrix}
\mathcal{I}_{\theta\theta} & \mathcal{I}_{\theta\eta} \\
\mathcal{I}_{\eta\theta} & \mathcal{I}_{\eta\eta}
\end{bmatrix}.
\]

The **Schur complement** of the pose block,

\[
S_\theta = \mathcal{I}_{\theta\theta}
- \mathcal{I}_{\theta\eta}\,\mathcal{I}_{\eta\eta}^{-1}\,\mathcal{I}_{\eta\theta},
\]

measures the *effective* information on the optical parameters after
marginalising the poses.  Eigenvectors of $S_\theta$ with very small
eigenvalues — the **weak modes** — are directions in optical parameter
space that a change in the board poses can almost perfectly mimic.  An
unregularised BA can drift along these modes, reducing the pixel RMS
while destroying the physical interpretability of the parameters
($b, WD, f_{\text{obj}}, \theta_{\text{conv}}, R_L, R_R$).
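
A toy example makes the weak mode visible.  The sketch below builds a Fisher matrix from a small hand-made Jacobian in which the second optical parameter is almost indistinguishable from the single pose parameter; the Jacobian and the function name are illustrative assumptions, not the StereoComplex BA.

```python
import numpy as np

def schur_complement(fisher, n_theta):
    """S_theta = I_tt - I_te I_ee^{-1} I_et, with the optical block first."""
    i_tt = fisher[:n_theta, :n_theta]
    i_te = fisher[:n_theta, n_theta:]
    i_ee = fisher[n_theta:, n_theta:]
    return i_tt - i_te @ np.linalg.solve(i_ee, i_te.T)

# Columns: optical parameter 1, optical parameter 2, pose parameter.
# The last two columns are nearly identical, so a pose change can mimic
# a change in optical parameter 2.
jac = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.01],
])
fisher = jac.T @ jac
s_theta = schur_complement(fisher, n_theta=2)
eigvals, eigvecs = np.linalg.eigh(s_theta)   # eigenvalues in ascending order
weak_mode = eigvecs[:, 0]
print(eigvals, weak_mode)
```

Before marginalisation both optical parameters carry the same information (diagonal entries of 2); after it, the second one retains almost none, and that direction is the weak mode.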

### The prior

The rayfield estimate $\theta_0$ provides more than an initialisation:
it defines an **observability-aware prior**.  We diagonalise $S_0 =
S_\theta(\theta_0, \eta_0)$ and construct per-mode weights

\[
w_i = \left(
\frac{\lambda_{\max}}{\lambda_i + \varepsilon\,\lambda_{\max}}
\right)^p,
\]

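where $\lambda_i$ and $v_i$ are the eigenvalues and eigenvectors of
$S_0$, $\lambda_{\max}$ is the largest eigenvalue, and $\varepsilon$ is a
small positive constant that caps the weight of near-null modes at
$\varepsilon^{-p}$.  The exponent $p=1$ gives moderate penalisation and
$p=2$ a more aggressive one.  The regularisation added to the BA cost is: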

\[
\mathcal{L}_{\text{Schur}} =
\alpha \sum_i w_i \left(
v_i^T D_\theta^{-1}(\theta - \theta_0)
\right)^2,
\]

with $D_\theta$ a diagonal matrix of per-parameter scales (degrees for
rotations, millimetres for translations, pixels for the principal point).
The prior penalises **only** the weakly observable modes, leaving the
well-observed directions free to improve the fit.
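
The weights and the penalty can be sketched in plain Python.  The value of $\varepsilon$, the eigenpairs and the unit scales below are illustrative assumptions, and the function names are hypothetical.

```python
def schur_weights(eigenvalues, eps=1e-6, p=1):
    """Per-mode weights w_i = (lam_max / (lam_i + eps * lam_max)) ** p."""
    lam_max = max(eigenvalues)
    return [(lam_max / (lam + eps * lam_max)) ** p for lam in eigenvalues]

def schur_penalty(theta, theta0, eigvecs, weights, scales, alpha):
    """alpha * sum_i w_i * (v_i^T D^{-1} (theta - theta0)) ** 2."""
    delta = [(a - b) / s for a, b, s in zip(theta, theta0, scales)]
    total = 0.0
    for v, w in zip(eigvecs, weights):
        projection = sum(vi * di for vi, di in zip(v, delta))
        total += w * projection ** 2
    return alpha * total

# Two modes: one weak (tiny eigenvalue) and one well observed.
eigenvalues = [4.95e-5, 2.0]
eigvecs = [[0.0, 1.0], [1.0, 0.0]]      # rows are the eigenvectors v_i
weights = schur_weights(eigenvalues)
theta0 = [0.0, 0.0]
weak_cost = schur_penalty([0.0, 0.01], theta0, eigvecs, weights, [1.0, 1.0], alpha=1e-3)
strong_cost = schur_penalty([0.01, 0.0], theta0, eigvecs, weights, [1.0, 1.0], alpha=1e-3)
print(f"weak-mode drift cost:   {weak_cost:.2e}")
print(f"strong-mode drift cost: {strong_cost:.2e}")
```

A well-observed mode receives a weight close to 1, so for small $\alpha$ its penalty is negligible next to the data term, while the same displacement along the weak mode costs several orders of magnitude more.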

### Validation on the 2-cent coin specimen

A dense stereo reconstruction of the Pycaso 2-cent euro coin (DIS optical
flow, 1.94 M correspondences over a 1448 × 1448 px ROI) tests whether
the regularised BA preserves or degrades the geometric reconstruction.
Five optical models are compared:

![5-variant specimen reconstruction](assets/pycaso_real_data/schur_ba/specimen_comparison_all_variants.png)

*Surface relief (Z minus local mean plane) and ray-pair gap distributions
for the Zernike rayfield, the CMO rayfield initialisation, the
unregularised BA, and two regularised variants (isotropic Tikhonov and
Schur-complement prior).  The shared colour scale makes surface roughness
directly comparable across models.*

| Model | Z MAD | Median ray gap | Magnification vs 18.75 mm coin |
|---|---:|---:|---:|
| Zernike rayfield (57 p) | 0.194 mm | 0.0011 mm | 0.1968 |
| CMO 26 p (rayfield init) | 0.073 mm | **0.0224 mm** | 0.1904 |
| CMO 26 p — BA unregularised | 0.030 mm | 0.0011 mm | 0.1931 |
| CMO 26 p — BA + isotropic prior ($\alpha{=}10^{-2}$) | 0.027 mm | 0.0011 mm | 0.1930 |
| CMO 26 p — BA + **Schur prior** ($\alpha{=}10^{-3}$) | **0.027 mm** | 0.0011 mm | 0.1930 |

The rayfield initialisation has a median ray gap 20× larger than every
BA variant.  The Y-axis correction (see
`src/stereocomplex/core/conventions.py`) reveals that the initial
model's triangulation quality was artificially inflated by the old
coordinate convention.  All BA variants recover tight ray intersections
(median gap ~1 µm).

The unregularised BA reduces surface roughness from 0.073 mm to
0.030 mm (a factor of 2.4).  The regularised variants improve this
slightly further, to 0.027 mm, showing that the prior does not degrade
the fit.

### Schur vs isotropic sweep

![Schur complement spectrum](assets/pycaso_real_data/schur_ba/schur_spectrum.png)

A sweep over the prior strength $\alpha$ reveals the difference between
the isotropic (Tikhonov) and Schur-based priors:

| $\alpha$ | Isotropic RMS (px) | Isotropic weak drift | Schur RMS (px) | Schur weak drift |
|---:|---:|---:|---:|---:|
| $10^{-4}$ | 0.239 (✗) | 0.529 | **0.277** (✓) | **0.0033** |
| $10^{-3}$ | 0.245 | 0.248 | **0.278** | **0.0007** |
| $10^{-2}$ | 0.257 | 0.082 | **0.279** | **0.0001** |
| $10^{-1}$ | 0.270 | 0.017 | **0.279** | **0.0000** |
| $10^{0}$  | 0.278 | 0.003 | **0.283** | **0.0000** |
| $10^{1}$  | 0.286 | 0.010 | **0.323** | **0.0000** |

The isotropic prior faces a trade-off: a small $\alpha$ leaves the weak
modes uncontrolled ($\text{drift}_{\text{weak}} = 0.53$ at
$\alpha{=}10^{-4}$), while a large $\alpha$ degrades the fit (RMS
rises to 0.286 px at $\alpha{=}10^{1}$).  The Schur prior **breaks this
trade-off**: even at $\alpha{=}10^{-4}$ it suppresses 99.4% of the
weak-mode drift (0.0033, against 0.529 for the isotropic prior at the
same $\alpha$) while keeping the RMS within 0.039 px of the unregularised
baseline.  At $\alpha{=}10^{-3}$ the weak-mode drift falls below
$10^{-3}$ and the RMS penalty is still only 0.039 px.
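The percentages quoted above can be checked against the sweep table with a few lines of arithmetic.  This is only a consistency check under stated assumptions: the unregularised RMS is not listed in the table, so the isotropic value at $\alpha{=}10^{-4}$ (0.239 px) is used here as a stand-in, and the drift suppression is read as Schur versus isotropic at the same $\alpha$.

```python
# Values copied from the alpha-sweep table.
iso_drift_1e4 = 0.529      # isotropic weak drift at alpha = 1e-4
schur_drift_1e4 = 0.0033   # Schur weak drift at alpha = 1e-4
baseline_rms = 0.239       # ASSUMPTION: isotropic RMS at 1e-4 as proxy for the unregularised RMS
schur_rms = {1e-4: 0.277, 1e-3: 0.278}

# Fraction of the weak-mode drift removed by the Schur prior.
suppression = 1.0 - schur_drift_1e4 / iso_drift_1e4

# RMS cost of the Schur prior relative to the (proxy) baseline, in px.
penalty = {alpha: rms - baseline_rms for alpha, rms in schur_rms.items()}

print(f"drift suppression: {suppression:.1%}")          # 99.4%
for alpha, p in penalty.items():
    print(f"alpha={alpha:g}: RMS penalty {p:.3f} px")    # 0.038 px, then 0.039 px
```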

### Interpretation

The Schur prior is more than an algorithmic refinement — it formalises a
**double role** for the rayfield estimate:

1. **Initialiser** — $\theta_0$ places the direct BA in the correct
   convergence basin, avoiding the local minima that trap a pinhole or
   perspective-CMO initialisation (see the direct-vs-rayfield comparison
   in notebook 08).
2. **Observability prior** — the Schur eigenmodes of the Fisher matrix
   at $\theta_0$ tell the optimiser *which directions it may trust*.
   The prior blocks compensation between poses and intrinsics without
   penalising the genuinely observable optical degrees of freedom.
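The second role can be illustrated with a small linear-algebra sketch.  It marginalises the poses out of a Gauss-Newton Fisher matrix via a Schur complement, takes the eigenmodes of the result, and builds a quadratic prior that acts only on the weak ones.  Everything here is an assumption made for illustration: the function names, the relative threshold `rel_tol`, the scaling of the prior by the largest eigenvalue, and the definition of the weak-mode drift as the norm of the displacement projected onto the weak modes.  The project's own definitions may differ.

```python
import numpy as np

def schur_weak_mode_prior(J_opt, J_pose, alpha, rel_tol=1e-6):
    """Quadratic prior restricted to weakly observable optical modes.

    J_opt:  (M, n_opt)  residual Jacobian w.r.t. optical parameters at theta_0
    J_pose: (M, n_pose) residual Jacobian w.r.t. pose parameters
    Returns (P, eigenvalues, V_weak); the penalty is dtheta @ P @ dtheta.
    Hypothetical helper, not the StereoComplex API.
    """
    H_oo = J_opt.T @ J_opt
    H_op = J_opt.T @ J_pose
    H_pp = J_pose.T @ J_pose
    # Schur complement: optical information left once poses are marginalised.
    S = H_oo - H_op @ np.linalg.pinv(H_pp) @ H_op.T
    S = 0.5 * (S + S.T)                    # enforce symmetry against round-off
    lam, V = np.linalg.eigh(S)             # eigenvalues in ascending order
    weak = lam < rel_tol * lam[-1]         # assumed threshold for "weak"
    V_w = V[:, weak]
    P = alpha * lam[-1] * (V_w @ V_w.T)    # zero along well-observed modes
    return P, lam, V_w

def weak_drift(theta, theta_0, V_w):
    """Norm of the displacement from theta_0 projected on the weak modes."""
    return np.linalg.norm(V_w.T @ (theta - theta_0))

# Toy problem: the second optical parameter is fully compensated by a pose.
rng = np.random.default_rng(0)
J_pose = rng.normal(size=(50, 3))
J_opt = np.column_stack([rng.normal(size=50), J_pose[:, 0]])
P, lam, V_w = schur_weak_mode_prior(J_opt, J_pose, alpha=1e-3)
theta_0 = np.zeros(2)
print(weak_drift(theta_0 + np.array([3.0, 0.0]), theta_0, V_w))  # near 0: observable move
print(weak_drift(theta_0 + np.array([0.0, 2.0]), theta_0, V_w))  # near 2: weak-mode move
```

In the toy problem the prior leaves the first parameter untouched and penalises only the second, which is the behaviour described in the list above: compensation between poses and optics is blocked while observable directions stay free.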

The 5-variant specimen reconstruction confirms that this strategy works
on real hardware: the Schur-regularised BA matches the smoothest surface
reconstruction (Z MAD 0.027 mm, tied with the isotropic prior) and the
tightest ray intersections (median gap ~1 µm) while keeping the physical
descriptors stable, all from only 10 ChArUco stereo pairs.

## Saved artefacts

```text
docs/assets/pycaso_real_data/
    detection_summary.json                 ← per-frame ChArUco counts
    summary.json                           ← calibration RMS, CMO descriptors
    model_comparison.json                  ← Zernike vs telecentric vs perspective
    zernike_pose_variants.json             ← full Zernike coeffs for both pose models
    zernike_conditioning_diagnostic.json   ← design matrix, modal Δd, sensitivity
    zernike_gauge_regularization_sweep.json ← regularization sweep
    moment_residual_diagnostic.json        ← Δm modal decomposition + O1/O2 fits
    arm_alignment_diagnostic.json          ← SE(3) arm alignment sweep
    aligned_cmo_fit.json                   ← final joint fit (telecentric + SE(3))
    se3_ablation.json                      ← SE(3) parameter ablation study
    autopsy_20p.json                       ← 20p model autopsy (negative control)
    autopsy_26p.json                       ← 26p model autopsy + compression
    pca_residual_26p.json                  ← PCA low-rank residual analysis
    warped_model_comparison.json           ← pre-warp L1/L2 evaluation
    bic_model_selection.json               ← BIC model selection on Pycaso data
    pareto_gauge_regularization.png        ← Pareto frontier plot
    schur_ba/
        schur_ba_diagnostic.json           ← Schur spectrum + coupling norm
        schur_spectrum.png                 ← normalised eigen-spectrum
        optical_ba_unregularized.json      ← direct (unregularised) BA result
        optical_ba_isotropic_prior_sweep.json  ← α-sweep, Tikhonov baseline
        optical_ba_isotropic_1e-2.json     ← best isotropic BA
        optical_ba_schur_prior_sweep.json  ← α-sweep, Schur prior
        optical_ba_schur_1e-3.json         ← best Schur-regularised BA
        specimen_comparison_all_variants.png   ← 5-variant coin reconstruction
```

To regenerate all results from raw images:

```bash
PYTHONPATH=src python examples/notebooks/09_pycaso_real_data.py
```

**To reproduce the model fitting, BIC, and SE(3) diagnostics without the
Pycaso raw images**, restart from `intermediate_state.npz`.  This file
contains the already-detected, Hessian-completed, TPS-denoised corner
positions, the fitted Zernike rayfield, and the initial 26p model
parameters — everything needed to run Steps 4–9 (model fitting, BIC
selection, SE(3) alignment, ablation, and corner refinement) without
access to the original TIFF/PNG images.
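Before restarting from the saved state, it can help to list what the archive actually contains.  The sketch below uses only `numpy.load` and makes no assumption about the array names stored inside; the file location (`intermediate_state.npz` in the current directory) and the helper name `describe_npz` are assumptions.

```python
import numpy as np
from pathlib import Path

def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every entry of an .npz archive.

    allow_pickle=False is the safe default; entries stored as object arrays
    would raise a ValueError and need to be loaded deliberately.
    """
    with np.load(Path(path), allow_pickle=False) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }

state_path = Path("intermediate_state.npz")  # ASSUMED location; adjust as needed
if state_path.exists():
    for name, (shape, dtype) in describe_npz(state_path).items():
        print(f"{name:40s} {shape!s:20s} {dtype}")
```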

## See also

- :doc:`IDENTIFY_MY_OPTICS` — how to read physical descriptors from a rayfield
- :doc:`CMO_PHYSICAL_MODEL` — the shared-rig CMO model definition
- :doc:`DIRECT_VS_RAYFIELD_INVERSION` — why measure a rayfield before fitting optics
- :doc:`NOTEBOOKS` — all walkthrough notebooks
- :doc:`SCHUR_REGULARIZED_BA` (planned) — detailed theory behind the Schur-complement prior
- ``src/stereocomplex/core/conventions.py`` — coordinate-frame convention layer (OpenCV vs physical Y-up)
- [Notebook 09](../examples/notebooks/09_pycaso_real_data.py) — executable protocol