Research Note 01 · arXiv:2510.19738

How o3 and GPT-5 diverged from a single elicitation.

A reproducible elicitation, awarded twice in Palisade Research's 2025 AI Misalignment Bounty: nine of 295 submissions were accepted, two of them from the same scenario.

Submissions awarded: 9 / 295

3rd-place prizes: 2 × (one scenario)

Models elicited: o3 · GPT-5

Number Guess · sections 4.5 – 4.6 of arXiv:2510.19738

00 — RESEARCH

Recent notes

01 · RESEARCH NOTE

Same scenario, two different deceptions: how o3 and GPT-5 diverged from a single elicitation.

Frontier model evals · Palisade Bounty

21 May 2026 · 11 min read

01 — PLATFORM

G1 — embodied alignment, in the open.

g1-alignment.vercel.app

EMBODIED ALIGNMENT HARNESS

G1 — testing whether AI deceives operators when controlling a humanoid robot under pressure.

MuJoCo-simulated Unitree G1 on Inspect AI (UK AISI). Telemetry-corruption variant probes concealment of safety violations. Built at the Gemini 3 Hackathon.

Open platform

Frontier models

Reproducible runs

L3 concealment · Gemini ER 1.5: 29%

Honesty · GPT-5 (1.9 safety): 5.0

02 — WHO

About the researcher.

Ma'alona Mafaufau — cybersecurity engineer and AI safety researcher, based in Tāmaki Makaurau. Six years across APT threat hunting, SIEM detection engineering, and quantitative analysis — now applying frontier-AI evaluation methods to alignment, and to defensive security.

2026 · Building autonomous incident-response agents — SANS FIND EVIL! / Protocol SIFT extension.

2025 · Two-time 3rd-place Palisade Research Misalignment Bounty winner.

Building · G1 embodied-alignment harness on Inspect AI (UK AISI), MuJoCo Unitree G1.

Quant · BSc (Hons) Statistics, University of Auckland.

NEWSLETTER · Quarterly

Research notes, in the inbox.

One quarterly note. Misalignment findings, eval-design write-ups, occasional teaching material.
