Research Scientist (Control)

We build products that monitor AI coding agents for safety and security failures.

London & San Francisco · Full-time · Posted 6mo ago · ai safety, security

Application deadline: We are actively interviewing and aim to fill this role as soon as we find a suitable candidate.

THE OPPORTUNITY

Join our new AGI safety product team and help transform AI control research into practical tools that directly reduce risks from AI. As a Research Scientist (Control), you’ll work closely with Marius (CEO, who currently leads the monitoring efforts), other control researchers, and product engineers.

We are currently building Watcher, a monitoring tool for coding agents. Our monitoring research agenda attempts to translate compute into safety at scale. You will join a small team, with significant ability to shape the team and its technology and to earn responsibility quickly.

You will like this opportunity if you're passionate about using empirical research to make AI systems safer in practice, enjoy the challenge of translating theoretical AI risks into concrete detection mechanisms, thrive on rapid iteration and learning from data, and want your research to directly impact real-world AI safety.

KEY RESPONSIBILITIES

TL;DR: You will design and implement control protocols (see e.g. [Greenblatt et al., 2023]) and test them on real-world production systems at scale.

Research & Development

Systematically collect and catalog coding agent failure modes from real-world instances, our internal deployments, public examples, research literature, and theoretical predictions
Design and conduct experiments to test monitor effectiveness across different failure modes and agent behaviors
Build and maintain evaluation frameworks to measure progress on monitoring capabilities
Build and maintain high-quality datasets for training and testing monitors
Iterate on monitoring approaches based on empirical results, balancing detection accuracy with computational efficiency
Stay current with research on AI safety, agent failures, and detection methodologies
Stay current with research into coding security and safety vulnerabilities

Monitor Design & Optimization

Develop & maintain a comprehensive library of monitoring prompts tailored to specific failure modes (e.g., security vulnerabilities, goal misalignment, deceptive behaviors)
Experiment with different reasoning strategies and output formats to improve monitor reliability
Design and test hierarchical monitoring architectures and ensemble approaches
Optimize log pre-processing pipelines to extract relevant signals while minimizing latency and computational costs
Implement and evaluate different scaffolding approaches for monitors, including chain-of-thought reasoning, structured outputs, and multi-step verification

Fine-tuning & Red-teaming

Fine-tune open-source models to create efficient monitors for high-volume production environments
Design and build agentic monitoring systems that autonomously investigate logs to identify both known and novel failure modes
Build automated red-teaming pipelines that attack monitors at scale
Design iterative adversarial games in which a red team continuously attacks the monitors and a blue team continuously defends them

Posted by Apolloresearch on their own careers page; you apply directly, with no recruiter in between.
