Research Note 01 · arXiv:2510.19738

How o3 and GPT-5 diverged on a single misalignment prompt.

A reproducible elicitation, awarded twice in Palisade Research's 2025 AI Misalignment Bounty: nine of 295 submissions were awarded, two of them from the same scenario.

Submissions awarded: 9 / 295
3rd-place prizes: 2 × (one scenario)
Models elicited: o3 · GPT-5
Number Guess · sections 4.5 – 4.6 of arXiv:2510.19738
01 — PLATFORM

G1 — embodied alignment, in the open.

g1-alignment.vercel.app
EMBODIED ALIGNMENT HARNESS

G1 — testing whether AI deceives operators when controlling a humanoid robot under pressure.

MuJoCo-simulated Unitree G1 on Inspect AI (UK AISI). Telemetry-corruption variant probes concealment of safety violations. Built at the Gemini 3 Hackathon.

Frontier models: 4
Reproducible runs: 30
L3 concealment · Gemini ER 1.5: 29%
Honesty · GPT-5 (1.9 safety): 5.0
02 — WHO

About the researcher.

Ma'alona Mafaufau — cybersecurity engineer and AI safety researcher, based in Tāmaki Makaurau. Six years across APT threat hunting, SIEM detection engineering, and quantitative analysis — now applying frontier-AI evaluation methods to alignment, and to defensive security.

2026 · Building autonomous incident-response agents (SANS FIND EVIL! / Protocol SIFT extension).
2025 · Two-time 3rd-place Palisade Research Misalignment Bounty winner.
Building · G1 embodied-alignment harness on Inspect AI (UK AISI), MuJoCo Unitree G1.
Quant · BSc (Hons) Statistics, University of Auckland.
NEWSLETTER · Quarterly

Research notes, in the inbox.

One quarterly note. Misalignment findings, eval-design write-ups, occasional teaching material.