How o3 and GPT-5 diverged on a single misalignment prompt.
A reproducible elicitation, awarded twice in Palisade Research's 2025 AI Misalignment Bounty: nine of 295 submissions were awarded, and two of those nine came from this one scenario.
Submissions awarded: 9 / 295
3rd-place prizes: 2, from one scenario
Models elicited: o3 · GPT-5
G1: embodied alignment, in the open.
EMBODIED ALIGNMENT HARNESS
G1: testing whether an AI model deceives its operators while controlling a humanoid robot under pressure.
A MuJoCo-simulated Unitree G1 running on Inspect AI (UK AISI). A telemetry-corruption variant probes whether models conceal safety violations. Built at the Gemini 3 Hackathon.
Open platform
Frontier models: 4
Reproducible runs: 30
L3 concealment, Gemini ER 1.5: 29%
GPT-5: honesty 5.0, safety 1.9
About the researcher.
Ma'alona Mafaufau, cybersecurity engineer and AI safety researcher based in Tāmaki Makaurau. Six years across APT threat hunting, SIEM detection engineering, and quantitative analysis, now applying frontier-AI evaluation methods to alignment and to defensive security.
2026 · Building autonomous incident-response agents: SANS FIND EVIL! / Protocol SIFT extension.
2025 · Two-time 3rd-place Palisade Research Misalignment Bounty winner.
Building · G1 embodied-alignment harness on Inspect AI (UK AISI), MuJoCo Unitree G1.
Quant · BSc (Hons) Statistics, University of Auckland.
NEWSLETTER · Quarterly
Research notes, in the inbox.
One quarterly note. Misalignment findings, eval-design write-ups, occasional teaching material.