Senior Software Engineer, Libfabric & User-Space Networking (CXI)
We are focused on delivering innovative solutions that accelerate our customers' digital transformation, enabling them to tackle their complex, data-intensive workloads. By combining deep expertise with the development of the world's most cutting-edge, high-performance supercomputers, we are defining the next era of computing and delivering valuable insight and innovation. Join us and redefine what's next for you.
What you'll do:
Key Responsibilities
- Implement libfabric CXI providers (user-space and kernel-assisted paths):
  - Endpoint, MR, CQ, and queue models
  - SR-IOV-aware abstractions and resource sharing
- Develop and maintain libcxi and related user-space libraries:
  - Efficient interaction with the CXI User Driver
  - Retry handling, error propagation, and performance tuning
- Enable and optimize CXI support across ecosystem components:
  - MPI (OpenMPI, MPICH)
  - NCCL / RCCL / GPU-aware communication
  - SHMEM, storage, and AI frameworks
- Collaborate with kernel driver teams to co-design clean, scalable APIs
What you need to bring:
Required Qualifications
- 10+ years of experience in systems or user-space networking software
- Strong expertise in:
  - libfabric, RDMA concepts, or high-performance communication APIs
  - C/C++ systems programming
- Deep understanding of user/kernel interaction models and performance tradeoffs
- Experience debugging complex distributed and multi-node systems
Preferred Qualifications
- Experience with HPC or AI communication stacks (MPI, NCCL, SHMEM)
- GPU-aware networking and GPUDirect-style architectures
- Familiarity with virtualization or SR-IOV impacts on user-space libraries
Alternate / Equivalent Skill Set (Network Operating Systems Background)
Candidates with a strong Network Operating System (NOS) background may be considered, provided they demonstrate deep systems expertise and the ability to work close to hardware and performance-critical paths.
Relevant Experience Includes
- Hands-on development experience with carrier-grade or data-center NOS platforms, such as:
  - Cisco IOS-XR
  - Juniper Junos
  - Arista EOS
  - Or equivalent Linux-based network operating systems
- Strong understanding of Linux kernel and user-space interactions in networking stacks:
  - Netlink, netdev, sockets, offload paths
  - Kernel modules, drivers, or platform abstraction layers
- Experience working with high-performance data plane components, including:
  - Packet processing pipelines
  - Queueing, scheduling, QoS, and congestion management
  - Hardware offloads and ASIC-facing software layers
Additional Skills: