LLM Serving Engineer (Cloud AI Engineering), Senior / Staff Engineer
Responsibilities:
- Build a scalable LLM inference platform using advanced inference techniques (e.g. disaggregated serving, KV-cache management, advanced parallelism, speculative decoding, model optimization, specialized kernels).
- Contribute to the development of LLM serving frameworks (e.g. vLLM, SGLang, TGI, Triton Inference Server, Dynamo, llm-d).
- Work closely with customers to drive solutions, collaborating with internal compiler, firmware, and platform teams.
- Work at the forefront of GenAI by understanding advanced algorithms (e.g. attention mechanisms, MoEs) and numerics to identify new optimization opportunities.
- Drive efficient serving through smart autoscaling, load balancing, and routing.
- Engage with open-source serving communities to evolve the frameworks.

Preferred Qualifications:
- Experience analyzing, profiling, and optimizing deep learning workloads.
- Open-source contributions to any GenAI package.
- Experience architecting and developing large-scale distributed systems.
- High-level kernel design experience (PyTorch, CUDA, Triton).
- Knowledge of torch.compile or TorchDynamo.
- PhD in Computer Science, Computer Engineering, or Machine Learning.

Minimum Qualifications:
- Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience. OR
- Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience. OR
- PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.