System Performance Engineer (CPU, GPU, AI/ML) for Automotive platforms

Bangalore, Karnataka, India | March 7, 2026 | Eightfold AI
A deep understanding of CPU, GPU, and DDR architecture internals:

  • CPU: caches (L1, L2, L3), instruction pipelines, branch prediction, the memory hierarchy (registers, caches, and main memory), and multi-core and multi-threaded processing. A good understanding of how memory access, instruction dependencies, and contention for shared resources affect performance.
  • GPU: understanding of parallel computing concepts and knowledge of modern GPU architectures, including their memory hierarchies and performance bottlenecks.
  • DDR: understanding of the JEDEC LPDDR4, LPDDR5, and LPDDR6 specifications, including command and timing parameters (e.g., ???, ????, tRP), memory organization (rows, columns, banks), a basic view of training and initialization sequences, and how a memory controller works, including features such as the command queue, port arbitration, and various control schemes.

Further required skills:

  • Expertise in how operating systems manage processes, threads, memory, the MMU, and interrupt handling. This knowledge is crucial for understanding the kernel scheduler and system-level bottlenecks.
  • Good understanding of CPU benchmarks (such as Geekbench, SPECint, and CoreMark), GPU benchmarks (such as Manhattan 3.1, Aztec Ruins (Vulkan/OpenGL), Car Chase, and 3DMark), and DDR benchmarks (such as lat_mem_rd, STREAM, and bw_mem), and of how they exercise the underlying CPU/GPU/DDR architecture.
  • Proficiency in reading and, in some cases, writing assembly code to understand the precise instructions a program executes and to identify inefficiencies that compilers may miss. GPU-specific languages such as CUDA or OpenCL are important for understanding GPU workloads. A good understanding of C/C++/Python is an added advantage.
  • Experience with a variety of performance monitoring tools such as Intel VTune and Linux perf, and with utilities such as top, vmstat, iostat, and netstat for monitoring system resources like CPU, memory, and I/O.
  • Experience with software tools that monitor system hotspots and command bus utilization and identify memory traffic patterns is critical. This includes validating that the traffic generated by software is as expected.
  • Good understanding of memory allocation policies, prefetching, and caching to minimize latency and maximize bandwidth.
  • Understanding how an application accesses memory is vital: skills in profiling code to analyze memory access patterns and then optimizing the code for better data locality.
  • Strong statistical skills for analyzing large datasets from performance tests. This can involve creating histograms of transaction latencies and deriving performance metrics to understand a system's behavior.
  • Experience applying the skills listed above on ARM- or x86-based platforms, mobile/automotive operating systems, and/or performance profiling tools.
  • Experience in application or driver development on Linux/QNX and the ability to create or customize makefiles with various compiler options is a plus.

Additional skills in the following areas are preferred:

  • Knowledge of computer architecture, LP5 DDR, and bus/NoC profiling is a big plus.
  • Fundamentals of an operating system such as Linux, QNX, or a hypervisor, and experience working on automotive applications.
  • Experience in creating professional-quality reports and slides using MS Office or advanced visualization tools.
  • Experience in PoC development and competitive analysis.
  • Knowledge of the voltage/power/thermal domain is a plus.

Required: Bachelor's degree in Computer Engineering and/or Electrical Engineering

Key Skills: CPU Architecture, CPU Performance, Linux Kernel, MMU, MPAM, Prefetchers, Multicore, Benchmarks, DDR Performance, Operating Systems

  • Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 3+ years of Systems Engineering or related work experience, OR
  • Master's degree in Engineering, Information Systems, Computer Science, or related field and 2+ years of Systems Engineering or related work experience, OR
  • PhD in Engineering, Information Systems, Computer Science, or related field and 1+ year of Systems Engineering or related work experience.

Drive performance analysis on silicon using system and core (CPU, GPU, memory, and AI/ML) benchmarks such as Dhrystone, Geekbench, SPECint, and CNN/GenAI ML networks. Use performance tools to analyze load patterns across IPs and identify performance bottlenecks in the system. Analyze performance KPIs of SoC subsystems such as CPU, GPU, NSP, and memory, and correlate measured performance with projections. Evaluate and characterize performance at various junction temperatures and optimize operation at high ambient temperatures. Analyze and optimize the system performance parameters of SoC infrastructure such as the NoC and LP5 DDR. Collaborate with cross-functional global teams to plan and execute performance activities on chipsets and to make recommendations for next-generation chipsets.

How to Get Hired at Qualcomm

  • Qualcomm is a technology powerhouse with many open positions spanning engineering, business, and operations roles across global locations. Research the specific business unit and technology domain before applying.
  • The Eightfold AI-powered careers portal uses advanced matching algorithms, so a comprehensive, well-formatted profile with detailed skills and experience will maximize your visibility to recruiters.