NVIDIA AI Infrastructure Specialist (GDC)
We are looking for a skilled professional with 3 to 8 years of experience to join our team as an NVIDIA AI Infrastructure Specialist (GDC). The ideal candidate will have proven experience in deploying and managing Kubernetes clusters for AI/ML workloads.
Roles and Responsibilities
- Deploy and manage Kubernetes clusters for AI/ML workloads at scale.
- Apply expertise in infrastructure and resource management, including virtualization and automation tools such as VMware/ESXi, KVM, Ansible, and Redfish, together with a strong understanding of the Run:AI platform, including job scheduling, quota management, and GPU virtualization.
- Collaborate with cross-functional teams to design and implement AI infrastructure solutions.
- Develop and maintain technical documentation for AI infrastructure systems.
- Troubleshoot and resolve complex technical issues related to AI infrastructure.
- Ensure compliance with industry standards and best practices for AI infrastructure security and deployment.
Job Requirements
- Proven experience in deploying and managing Kubernetes clusters for AI/ML workloads.
- Experience with Azure Kubernetes Service, Red Hat OpenShift, MicroK8s, and Helm charts.
- Expertise in containerization, microservices, and cloud-native design principles.
- Proficiency in Python and C++, and optionally .NET/C# for enterprise integration.
- Familiarity with DGX systems, Jetson, and NVIDIA's AI Factory components.
- Knowledge of NIM, NeMo, TAO, Triton, and Nucleus servers.
- Understanding of Docker and Kubernetes.
- Ability to work collaboratively in a fast-paced environment.
- Strong problem-solving skills and attention to detail.