AI Inference Engineer

San Francisco | February 22, 2026 | Full Time | Ashby

We are looking for an AI Inference Engineer to join our growing team. Our current stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

  • Develop APIs for AI inference that will be used by both internal and external customers

  • Benchmark and address bottlenecks throughout our inference stack

  • Improve the reliability and observability of our systems and respond to system outages

  • Explore novel research and implement LLM inference optimizations

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)

  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)

  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA

 


How to Get Hired at Perplexity

  • Perplexity is building the next generation of search — an AI answer engine that delivers direct, cited responses instead of traditional blue links
  • The company uses Ashby as its ATS, so submit clean, well-formatted resumes in PDF format with standard section headers
