Nuwa 女娲
Frontier AI Safety Lab

Transparent, third-party, open infrastructure for frontier AI safety evaluation and governance.

Nuwa studies frontier AI risks, including agent safety, autonomy risks, deception, scheming, and loss of control. Supported by Whitzard, Nuwa develops open evaluation frameworks, benchmarks, technical notes, and governance evidence for safe and controllable AI.

Repair

Identify and fix safety gaps

Boundary

Define safe operational limits

Prudence

Cautious evaluation of risks

Guided

Evidence-based governance

Transparent Third-Party Evaluation

Nuwa's approach to independent frontier AI safety evaluation and governance evidence.

Transparent

Public research notes, open evaluation assumptions, and clear evidence standards.

Third-Party

Independent safety evaluation and governance evidence for frontier AI systems.

Open Infrastructure

Benchmarks, tools, datasets, and frameworks that support broader AI safety research.

Governance-Oriented

Technical evidence that can support responsible AI governance and public accountability.

Research Themes

Core themes driving our frontier AI safety research.

Autonomy Risks

Self-replication, self-proliferation, and autonomous resource acquisition in frontier AI.

Deception

AI deception in human-AI interaction, developer-facing deception, and trust dynamics.

Scheming

Evaluation faking, situational awareness, and observer effects in safety evaluations.

Loss-of-Control

Conditions under which AI systems may evade oversight or resist shutdown.

Agent Safety

Behavioral safety, thought correction, and guardrails for AI agent systems.

Evaluation

Executable risk-evaluation environments, benchmarks, and evaluation integrity.

Governance Evidence

Technical evidence supporting responsible AI governance and public accountability.

Featured Research

arXiv preprint · 2024

Frontier AI systems have surpassed the self-replicating red line

Xudong Pan, Jiarun Dai, Yihe Fan, Min Yang

Evaluates whether frontier AI systems can autonomously self-replicate and reports successful self-replication in controlled trials.

Frontier AI risk · self-replication · loss of control

arXiv preprint; work in progress · 2025

Large language model-powered AI systems achieve self-replication with no human intervention

Xudong Pan, Jiarun Dai, Yihe Fan, Minyuan Luo, Changyi Li, Min Yang

Extends self-replication evaluation across 32 AI systems and reports autonomous replication, self-exfiltration, adaptation, and shutdown-survival behaviors.

Frontier AI risk · self-replication · self-exfiltration · shutdown resistance

Nüwa Project preprint · 2026

One Step from Silicon Life: Autonomous AI Agents Capable of Uncontrolled Self-Proliferation

Geng Hong, Xudong Pan, Jiarun Dai, Jiaqi Luo, Wuyuao Mai, Min Yang

Demonstrates autonomous agents acquiring external computational resources and propagating across remote devices under controlled, simulated real-world conditions.

Frontier AI risk · self-proliferation · autonomous resource acquisition

arXiv preprint · 2025

Shell or Nothing: Real-World Benchmarks and Memory-Activated Agents for Automated Penetration Testing

Wuyuao Mai, Geng Hong, Qi Liu, Jinsong Chen, Jiarun Dai, Xudong Pan, Yuan Zhang, Min Yang

Introduces TermiBench and TermiAgent to evaluate and improve autonomous penetration-testing agents in realistic shell-acquisition settings.

AI system security · autonomous penetration testing · cyber agents

Publications & Papers

arXiv preprint · 2026

CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly

Yihe Fan, Changyi Li, Lichen Xu, Xudong Pan, Jiarun Dai, Geng Hong, Min Yang

Proposes a cybersecurity agent framework that iteratively revises its own scaffold from failed attempts to adapt across targets and failure modes.

AI system security · cybersecurity agents · self-evolving agents

Follow Nuwa's research updates on frontier AI risk, agent safety, and controllable AI.

We collaborate on frontier AI safety evaluation, agent safety research, and runtime defense.