ChArUco: 2D identification strategy (baseline)
Goal: obtain ChArUco 2D corner positions that are as stable as possible (sub-pixel) in order to quantify the impact of blur, compression, and aberrations, and to prepare calibration/reconstruction stages.
The project deliberately separates:
a geometric prior (planar board + ArUco/ChArUco correspondences);
an image observation (blur, compression, contrast, etc.);
methods that rely on a parametric model (pinhole + distortion) or a non-parametric model (smoothed field).
Error measurement
On synthetic datasets, the error is computed against the ground truth stored in gt_charuco_corners.npz:
matching by
corner_id(stable ID);per-view metrics (left/right): RMS, p50, p95, max, bias dx/dy.
Command:
.venv/bin/python -m stereocomplex.cli eval-charuco-detection dataset/v0 --method <METHOD>
Pixel-center convention (important)
The project uses a “pixel centers at integer coordinates” convention (see docs/CONVENTIONS.md).
OpenCV often reports corners in a convention shifted by 0.5 px; the evaluation code compensates for that shift for --method charuco.
Available methods (CLI --method)
1) charuco (direct OpenCV)
OpenCV ArUco pipeline → ChArUco interpolation (inner chessboard corners).
Pro: simple, no camera model assumption.
Limitation: accuracy is often limited (sensitivity to blur/compression + conventions + internal heuristics).
2) homography (2nd-pass planar geometry)
Detects ArUco corners, estimates a global homography (RANSAC), then projects all ChArUco corners.
Works well when the image is well explained by a “simple” planar projective mapping.
Limitation: degrades in the presence of out-of-model distortions (e.g. strong radial distortion).
3) pnp (2nd-pass parametric K + distortion)
Uses
meta.json(pitch/crop/resize +f_um) to buildKand distortion coefficients, then:runs
solvePnPRansacon ArUco 3D→2D corners,uses
projectPointsto predict ChArUco corners.
Pro: robust when the optics can be modeled as pinhole + (Brown) distortion.
Limitation: not applicable / biased for non-pinhole systems (e.g. non-central microscope/CMO models).
Important note (focal length)
In the current synthetic dataset, f_um is known because it is generated and stored in meta.json (sim_params.f_um).
Therefore, method pnp uses it as a known parameter to isolate the “point identification” effect.
In real data, f_um (and more generally K and distortion) are not known a priori:
either they are estimated by a classical multi-view calibration (e.g. Zhang) before running
pnp,or they are part of an auto-calibration problem (latent variables to estimate),
or one avoids the pinhole assumption and uses a non-parametric method (e.g.
rayfield).
4) rayfield (2nd-pass non-parametric “smoothed field” on the board plane)
Goal: replace a pinhole model by a weaker assumption: the mapping from the board plane to the image is low-frequency.
Implementation (plane-only):
global homography
H(RANSAC) as a stable baseline;residual field
r(x,y)estimated on a grid (bilinear), regularized with a smoothing term (Laplacian) and robust to outliers (Huber);prediction:
u(x,y) = H(x,y) + r(x,y).
Pros:
does not depend on a pinhole optical model;
captures slow variations (complex aberrations) while remaining stable.
Limitation:
this “ray-field” is restricted to the plane (a 2D warp); for a full 3D per-pixel ray field, calibration across multiple poses/planes is required.
4b) rayfield_tps and rayfield_tps_robust (recommended in this repo)
The current default used in the examples/paper is rayfield_tps_robust:
base homography
H,TPS residual field,
robustification by IRLS (Huber).
Compared to the grid backend, TPS is usually more stable when the residual field is only observed sparsely (AruCo corners).
5) kfield (a “local K” field approximated by smoothed affines)
This method was an intermediate step: the idea is to replace a global K by a spatially varying field, under a low-frequency assumption.
Note: in the current code, kfield does not interpolate a pinhole matrix \(K\) in the strict sense. Instead, it builds
a smoothed field of local affine (first-order) models obtained by linearizing the plane→image mapping.
Linearization (Jacobian)
Consider an unknown (potentially complex) mapping between the board plane and the image:
u = u(x,y)v = v(x,y)
Around a reference point \((x_q, y_q)\), we can write a first-order Taylor expansion:
u(x,y) ≈ u_q + (∂u/∂x)_q · (x-x_q) + (∂u/∂y)_q · (y-y_q)v(x,y) ≈ v_q + (∂v/∂x)_q · (x-x_q) + (∂v/∂y)_q · (y-y_q)
The local Jacobian (the linear part) is:
J(x_q,y_q) = [[∂u/∂x, ∂u/∂y],
[∂v/∂x, ∂v/∂y]] (évalué en (x_q,y_q))
The kfield idea is to estimate this local Jacobian (and offset) from the ArUco correspondences available in the image,
then smooth/interpolate it to obtain a low-frequency approximation.
Construction (what the code does)
choose an anchor grid in board coordinates \((x,y)\);
at each anchor, fit a local affine model by weighted least squares (nearest ArUco neighbors):
\[u(x,y)=a_0 + a_1 x + a_2 y,\quad v(x,y)=b_0 + b_1 x + b_2 y\]where
a1,a2,b1,b2estimate the local Jacobian \((\partial u/\partial x, \partial u/\partial y, \partial v/\partial x, \partial v/\partial y)\).smooth each parameter \((a_0,a_1,a_2,b_0,b_1,b_2)\) on the grid (Gaussian);
for a query point \((x,y)\), bilinearly interpolate these parameters and apply the affine mapping.
Why this is not sufficient:
a local affine model does not capture projective effects (and even less distortion) over the full board;
directly interpolating a matrix \(K\) is not “geometrically stable” (constraints on \(f_x,f_y\), etc.).
In practice, rayfield (homography + smoothed residual field) matches the “low-frequency” intuition while remaining numerically stable.
Assumptions per method (what it “assumes”)
Summary of dependencies (as of the current code):
charuco: does not requireK/distortion, but depends on OpenCV heuristics.homography: does not requireK/distortion; assumes a global homography explains the board image well.tps: does not requireK/distortion; assumes a smooth 2D warp (thin-plate spline) and can extrapolate unstably if underconstrained.pnp: requires an optical model (pinhole + distortion) and its parameters (or a prior step estimating them).rayfield: does not requireK/distortion; assumes a low-frequency planar warp and uses only correspondences (Aruco) + regularization.rayfield_tps: arayfieldvariant where the residual is reconstructed by regularized TPS (instead of a bilinear grid + Laplacian).rayfield_tps_robust: TPS residual + robust loss (recommended default).
Photometric refinements (CLI --refine)
Refinements based on structure tensor/gradients exist (tensor, lines, lsq, noble), but on the current datasets they often moved corners toward a photometric optimum that does not match the GT geometric center.
They should be considered as ablations/experiments rather than the recommended method.
Current recommendation
If the optics are well approximated by pinhole + distortion: prefer
pnp.If the optics are complex/non-central: prefer
rayfield_tps_robust(low-frequency assumption) and increase regularization if needed.
Paper comparison (reproducible script)
The manuscript includes an automatically generated table (methods vs errors). To regenerate it:
.venv/bin/python paper/experiments/compare_charuco_methods.py dataset/v0_png --splits train
bash paper/build_pdflatex.sh
Worked example (raw OpenCV vs ray-field + plots)
See docs/RAYFIELD_WORKED_EXAMPLE.md (includes a detailed explanation of why a global homography + a smoothed residual field can correct part of the aberrations/distortions on the board plane).