DevOps Engineer

About Bureau
Bureau is a unified risk decisioning platform for Compliance, Fraud, and Transaction risks. Our platform is a single decision-making engine, powered by a 1 billion+ identity knowledge graph. Over 150 banks, fintechs, retailers, and digital platforms worldwide use Bureau to verify identities faster and stop fraud earlier.
Bureau has raised $50M+ from renowned Silicon Valley and global investors, including Sorenson Capital and PayPal Ventures, and is expanding rapidly from APAC to the Americas, Europe, and beyond.
Why Bureau?
Bureau is building the infrastructure that makes digital identities and transactions safe and trustworthy for billions of people. The mission is big, the problems are complex, and the impact is real.
We hire people who want that level of responsibility. People who move fast, build systems from scratch, and care deeply about turning strategy into execution. If you want predictability or narrow scope, this isn't the place for you. If you want to shape how a scaling global company operates, keep reading.
About the Role - DevOps Engineer
As a DevOps Engineer, you will build and operate reliable, secure, and scalable cloud-native platforms. You will own CI/CD pipelines, Infrastructure as Code (IaC), and production operations for distributed systems that run across cloud and on-prem environments. A key focus for this role is data platform DevOps: hands-on depth with Kafka, ClickHouse, and Redis, and experience operating high-throughput, low-latency services with strong observability and incident discipline.
What you’ll be doing
Own and improve CI/CD for multiple services, enabling safe, fast, repeatable deployments across environments.
Build and operate Kubernetes-based platforms (cloud and on-prem), including cluster operations, upgrades, capacity planning, and reliability practices.
Implement and maintain Infrastructure as Code (Terraform-first) for cloud networking, compute, security, and platform services.
Drive automation across provisioning, deployments, scaling, failover, backups, and operational workflows.
Design production-grade patterns for high availability, performance, security, and scalability, including load balancing, autoscaling, and resilience.
Partner closely with Engineering, Security, and Product to streamline the full delivery lifecycle and reduce operational friction.
Prototype solutions and run PoCs for emerging tools, platform improvements, and operational accelerators.
Troubleshoot complex production issues using deep understanding of service topology, dependencies, and failure modes. Define mitigation strategies and communicate clearly.
Participate in on-call and incident response. Drive a culture of post-mortems, root cause analysis, and preventive engineering.
Support customer deployments where needed, including on-prem and regulated enterprise environments, ensuring repeatable and auditable delivery.
What You’ll Bring
3 to 5 years of hands-on experience in a DevOps, SRE, or Platform Engineering role.
Strong Kubernetes experience including cluster operations and production troubleshooting.
Cloud experience with one or more of AWS, Azure, GCP, or OCI, and the ability to adapt across clouds.
Strong IaC skills, especially Terraform, plus clear module and environment design practices.
Strong Linux fundamentals and proven experience debugging production environments.
Strong scripting in at least one of: Shell, Python, Go, Ruby.
CI/CD experience with GitHub Actions (or equivalent) and CD tooling like ArgoCD (or equivalent GitOps model).
Strong experience with observability stacks: logging, metrics, tracing (ELK, Prometheus, Grafana, OpenTelemetry, etc.).
Strong fundamentals in AWS networking concepts (VPCs, subnets, routing, NAT, security groups, NACLs, private connectivity, DNS).
Excellent communication and problem-solving skills, comfortable operating in a fast-paced environment.
Must-have: Data Platform DevOps Experience
You should have meaningful hands-on experience operating at least two of the following in production:
Kafka (clusters, topic design considerations, throughput, lag, partitions, retention, broker health, client tuning)
ClickHouse (ingestion patterns, merges, partitions, schema and performance tuning basics, storage planning)
Redis (clustering, persistence modes, memory sizing, latency, failover behavior)
Bonus if you have managed these alongside high-scale application workloads on Kubernetes.
Our Culture
We hire self-motivated people and get out of their way
We value performance, not hours worked
Speed, ownership, and impact matter most
Compensation
Competitive salary + potential equity
Health benefits, flexible PTO, learning budget

