
RESEARCHER, POST-TRAINING

Makermaker.ai

San Francisco Full-time Posted 1mo ago

ABOUT THE COMPANY

We're building autonomous research agents for recursive self-improvement: multi-agent systems that propose, run, and analyze machine learning experiments. We're a small team working on-site in San Francisco.

ABOUT THE ROLE

You'll lead our work on model post-training: supervised fine-tuning, preference data, reinforcement learning from human and AI feedback, reward modeling, and the evaluation suites that tell us what's actually working. You'll own a research area that meaningfully shapes our model behavior and capability.

This is a hands-on senior research role. You'll set direction, run experiments, and ship into production. You'll partner with the data, infrastructure, and engineering teams to make the post-training pipeline reliable and fast: improvements there compound into every model we ship.

WHAT YOU'LL DO

  • Lead post-training research: SFT, RLHF/RLAIF, RLVR, DPO and successor methods, reward modeling, preference data design

  • Design and curate the data that goes into post-training (from sourcing, to filtering, to quality assessment)

  • Build and maintain the evaluation suites that measure what matters; resist Goodharting your own benchmarks

  • Run rigorous experiments (controls, ablations, statistical significance) and write up internal findings clearly

  • Work with the infrastructure team to scale data pipelines and training

  • Identify and characterize failure modes (reward hacking, distribution drift, eval saturation) and design experiments to address them

  • Stay current on the post-training literature; bring useful methods in, ignore the noise

WHAT WE'RE LOOKING FOR

  • Strong track record of post-training research (SFT, RL, reward modeling) at a frontier-model lab or equivalent

  • 5+ years of hands-on ML research experience

  • Comfort with large-scale data curation and preference-data pipelines

  • Experience designing evaluation suites for capabilities that aren't easily benchmarked

  • Fluent in PyTorch or equivalent; comfortable at the scale of distributed training

  • Strong statistical instincts: you'd notice a flawed comparison before someone else points it out

  • Strong written communication

NICE TO HAVE

  • PhD in ML, statistics, CS, or adjacent

  • Published research at NeurIPS, ICML, ICLR, COLM, RLC, or comparable venues

  • Experience with reward hacking detection, scaling reward models, or RLHF infrastructure

  • Synthetic data generation experience

  • Background in RL math (policy gradients, importance sampling, off-policy methods)

  • Open-source contributions to post-training infrastructure

THIS ROLE IS PROBABLY NOT FOR YOU IF

  • You're primarily interested in pretraining (that's a different role)

  • You'd rather invent novel methods in isolation than ship them into a model that real users run

  • You prefer stable benchmarks to evaluation work where the right answer isn't yet defined

Posted by Makermaker.ai on their own careers page; you apply directly, with no recruiter in between.
