Nuwa 女娲
Frontier AI Safety Lab

Transparent, third-party, open infrastructure for frontier AI safety evaluation and governance.

Nuwa studies frontier AI risks, agent safety, autonomy risks, deception, scheming, and loss-of-control. Supported by Whitzard, Nuwa develops open evaluation frameworks, benchmarks, technical notes, and governance evidence for safe and controllable AI.



Repair

Identify and fix safety gaps

Boundary

Define safe operational limits

Prudence

Cautious evaluation of risks

Guided

Evidence-based governance

Autonomy Risks

Self-replication & resource acquisition

Agent Safety

Behavioral safety & runtime defense

Evaluation

Benchmarks & risk assessment

Transparent Third-Party Evaluation

Nuwa's approach to independent frontier AI safety evaluation and governance evidence.

Transparent

Public research notes, open evaluation assumptions, and clear evidence standards.

Third-Party

Independent safety evaluation and governance evidence for frontier AI systems.

Open Infrastructure

Benchmarks, tools, datasets, and frameworks that support broader AI safety research.

Governance-Oriented

Technical evidence that can support responsible AI governance and public accountability.

Research Themes

Core themes driving our frontier AI safety research.

Autonomy Risks

Self-replication, self-proliferation, and autonomous resource acquisition in frontier AI.

Deception

AI deception in human-AI interaction, developer-facing deception, and trust dynamics.

Scheming

Evaluation faking, situational awareness, and observer effects in safety evaluations.

Loss-of-Control

Conditions under which AI systems may evade oversight or resist shutdown.

Agent Safety

Behavioral safety, thought correction, and guardrails for AI agent systems.

Evaluation

Executable risk-evaluation environments, benchmarks, and evaluation integrity.

Governance Evidence

Technical evidence supporting responsible AI governance and public accountability.

Featured Research

arXiv preprint · 2024

Frontier AI systems have surpassed the self-replicating red line

Xudong Pan, Jiarun Dai, Yihe Fan, Min Yang

Evaluates whether frontier AI systems can autonomously self-replicate and reports successful self-replication in controlled trials.

Frontier AI risk · self-replication · loss of control


arXiv preprint; work in progress · 2025

Large language model-powered AI systems achieve self-replication with no human intervention

Xudong Pan, Jiarun Dai, Yihe Fan, Minyuan Luo, Changyi Li, Min Yang

Extends self-replication evaluation across 32 AI systems and reports autonomous replication, self-exfiltration, adaptation, and shutdown-survival behaviors.

Frontier AI risk · self-replication · self-exfiltration · shutdown resistance


Nüwa Project preprint · 2026

One Step from Silicon Life: Autonomous AI Agents Capable of Uncontrolled Self-Proliferation

Geng Hong, Xudong Pan, Jiarun Dai, Jiaqi Luo, Wuyuao Mai, Min Yang

Demonstrates autonomous agents acquiring external computational resources and propagating across remote devices under controlled, simulated real-world conditions.

Frontier AI risk · self-proliferation · autonomous resource acquisition


arXiv preprint · 2025

Shell or Nothing: Real-World Benchmarks and Memory-Activated Agents for Automated Penetration Testing

Wuyuao Mai, Geng Hong, Qi Liu, Jinsong Chen, Jiarun Dai, Xudong Pan, Yuan Zhang, Min Yang

Introduces TermiBench and TermiAgent to evaluate and improve autonomous penetration-testing agents in realistic shell-acquisition settings.

AI system security · autonomous penetration testing · cyber agents


News


Research · Jan 15, 2026 · Featured

The Science of Frontier AI Risk Evaluation

Nuwa's first public research essay on making frontier AI risk evaluation more scientific, evidence-based, and operational.

Source: Nuwa Substack

Latest Nuwa Brief

Nuwa Brief · External · 2026

The Science of Frontier AI Risk Evaluation

Nuwa essay on making frontier AI risk evaluation more scientific, evidence-based, and operational.

Publications & Papers


arXiv preprint · 2025