LLM Serving Engineer (Cloud AI Engineering), Senior / Staff Engineer
Responsibilities:
- Build a scalable LLM inference platform using advanced inference techniques (e.g. disaggregated serving, KV-cache management, advanced parallelism, speculative decoding, model optimization, specialized kernels).
- Contribute to the development of LLM serving frameworks (e.g. vLLM, SGLang, TGI, Triton Inference Server, Dynamo, llm-d).
- Work closely with customers to drive solutions, collaborating with internal compiler, firmware, and platform teams.
- Work at the forefront of GenAI by understanding advanced algorithms (e.g. attention mechanisms, MoEs) and numerics to identify new optimization opportunities.
- Drive efficient serving through smart autoscaling, load balancing, and routing.
- Engage with open-source serving communities to evolve the frameworks.

Preferred Qualifications:
- Experience analyzing, profiling, and optimizing deep learning workloads.
- Open-source contributions to any GenAI package.
- Experience architecting and developing large-scale distributed systems.
- High-level kernel design experience (PyTorch, CUDA, Triton).
- Knowledge of torch.compile or TorchDynamo.
- PhD in Computer Science, Computer Engineering, or Machine Learning.

Minimum Qualifications:
- Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience. OR
- Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience. OR
- PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.