Member of Technical Staff (Applied AI)
About Aptura
We build the evaluation datasets and RL environments that make AI reliable in domains where mistakes are expensive: finance, healthcare, and legal. Our team designs expert-curated training data, calibrated rubrics, and verifiable task environments for AI labs and startups pushing the frontier of what models can do in regulated industries.
We're a small, lean, London-based team that moves fast and takes the work seriously. Everyone contributes directly. Initiative is rewarded, and ownership is the default. If you want to shape how frontier AI learns to operate in the real world, we'd like to hear from you.
About the Role
As a Member of Technical Staff on our Applied AI team, you will build the tasks and environments that AI labs use to train and evaluate their agents in finance, healthcare, and legal.
Day to day, that looks like constructing RL environments around spreadsheets, documents, and professional workflows; writing verification logic and reward functions; and working with domain experts to scope what a correct answer actually looks like in an LBO model or a clinical note. Some days it's engineering; some days it's closer to research. The common thread is that you're producing the ground truth that frontier models get measured against.
What You'll Do
Build RL environments across finance, healthcare, and legal domains
Assist in designing tasks with golden answers, calibrated rubrics, and programmatic reward signals
Write verification logic and reward functions that can distinguish good model outputs from bad ones
Work directly with domain experts (investment analysts, physicians, attorneys) to translate complex professional workflows into structured tasks
Prototype new approaches to evaluation, verification, and synthetic data generation
Who We're Looking For
Practical experience building with LLMs: prompting, evaluation, and agentic harnesses. You've built things that actually run, not just notebooks.
High agency and technically sharp. You don't wait for permission, specs, or a roadmap. You see what needs doing, figure out how, and get it done.
Comfortable working across very different contexts. The job moves between engineering, evaluation design, and deep collaboration with domain experts, often in the same day.
You ship and iterate. We're a small team with no room for work that sits in review. Bias toward getting something working, learning from it, and improving it.
You own problems end to end, from scoping with a domain expert through to a working environment. If you prefer clearly partitioned tickets, this probably isn't the right fit.
Already using LLMs as part of how you build, not just as the thing you're building for.
Nice to Have
Domain knowledge in finance, healthcare, or legal
Familiarity with RL concepts, model training, and post-training workflows
Cloud infrastructure experience (AWS or GCP)
Previous startup experience, especially as an early engineer