Team Lead – Next-Gen Testing Platform (Distributed Systems & AI)
The Mission
At WEKA, we are not just testing distributed systems—we are redefining how they are validated at scale.
Our goal is to build an intelligent, autonomous validation platform that evolves alongside our product. A system that doesn’t just detect failures, but predicts them. That doesn’t just test performance, but continuously pushes the limits of what’s possible in large-scale AI and storage environments.
This is not a traditional leadership role. This is an opportunity to lead the creation of a new engineering discipline at the intersection of distributed systems, quality, and AI.
The Role
As a Team Lead, you are both the architect and the driver.
You will lead a small, elite team while remaining deeply hands-on, building the core infrastructure that will validate one of the most advanced distributed storage systems in the world.
You won’t inherit a system—you’ll define it.
From designing intelligent testing frameworks to shaping how AI is embedded into validation workflows, you will set the technical direction and raise the bar for quality across the organization.
What You'll Own:
- Technical Leadership at the Frontier: Define and drive the strategy for testing and quality infrastructure across WEKA — setting standards, shaping architecture, and influencing how every engineering team thinks about reliability.
- Next-Gen Framework Design: Architect and build a distributed testing platform that validates correctness, performance, and resilience at a scale most engineers never encounter.
- AI-Native Validation: Lead the adoption of AI-driven approaches — automated test generation, intelligent workload synthesis, and anomaly detection — to multiply the team's impact and stay ahead of the product's complexity.
- Chaos at Scale: Design end-to-end environments that simulate real-world failure conditions — extreme concurrency, fault injection, and stress scenarios that push the platform to its theoretical limits.
- Team Multiplier: Mentor and grow a team of sharp engineers, fostering a culture of technical rigor, ownership, and reliability obsession — while never losing your own hands-on edge.
What You Bring:
- 8+ years of hands-on experience building large-scale distributed systems in storage, networking, or cloud infrastructure, plus 1–3 years of experience leading or mentoring engineers.
- Deep expertise in system correctness, concurrency, and reliability, and the instincts to know where things break before they do.
- Strong coding skills in Python, Go, C++, or Rust.
- A proven ability to design frameworks that make other engineers faster and more confident.
- A forward-leaning mindset on AI/ML-driven testing and automation.
Why This Role Is Different
Most roles ask you to improve testing.
This role asks you to reinvent it.
You’ll operate at the intersection of infrastructure, scale, and intelligence—solving problems that don’t have existing playbooks.
If you’re excited about building systems that test systems, and about leading the people who build them, this is where it happens.