Principal Engineer - Python API Development

Jersey City, NJ April 27, 2026 Full Time

Job Description:

Note: Fidelity will not provide immigration sponsorship for this position.

The Role:

As a Principal Engineer on the Enterprise AI/ML Platform team, you will tackle the most complex technical challenges involved in delivering machine learning at enterprise scale. You will design, build, and evolve reliable, secure, and cost‑efficient platform capabilities—from model packaging and serving to observability and lifecycle management—working closely with multiple teams to ensure these capabilities are practical, robust, and widely usable in production.

You will take a hands‑on role across enterprise repositories, improving shared services, CI/CD workflows, and infrastructure patterns where they have the greatest impact. This includes deep technical investigation of performance and scalability issues, such as tracking down bottlenecks in web services, analyzing system and application metrics, and optimizing GPU utilization, throughput, and resource efficiency across ML workloads.

The Expertise & Skills You Bring

Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a closely related engineering discipline; 8+ years (typically 10+) building and operating production platforms and services at scale.
Deep software engineering expertise in Python and distributed systems, with a track record of building production‑grade services, libraries, and internal platforms. You model engineering excellence through clean designs, automated testing, and maintainable abstractions; Linux fluency and scripting are required.
Familiarity with Java or Groovy is a plus.
Knowledge of or experience with GenAI gateways or LiteLLM is a big plus.
Cloud platform leadership (AWS)—hands‑on with S3, Lambda, Batch, Step Functions, EventBridge, CloudWatch, and SNS/SQS—and experience shaping platform patterns that other teams adopt. Experience enabling managed ML services (e.g., SageMaker) as part of broader platform capabilities; exposure to Azure or GCP is beneficial.
DevOps and CI/CD at scale, owning standards for automated build/test/deploy (e.g., Jenkins, Git‑based workflows), containerization (Docker), release governance, and multi‑environment promotion for ML‑enabled workloads.
Infrastructure as Code (CloudFormation, Terraform/OpenTofu) and platform reliability engineering (SLOs/error budgets, capacity planning, cost observability, incident response, and post‑mortems) for ML serving and data/feature pipelines.
ML enablement in production: model packaging, deployment strategies (batch/online/streaming), inference routing, traffic management, performance tuning, observability, and controls for responsible use—without a research or modeling focus.
Cross‑org technical leadership: you mentor junior and senior engineers, are a backbone of code review across repos, and routinely consider impacts on upstream/downstream systems when proposing changes.
Set platform strategy and standards for ML packaging, deployment, serving, and observability—driving consistent adoption across squads and business units.
Partner with Data Scientists to package, scale, and operationalize models; define the APIs, guardrails, and automation that take work from experimentation to reliable production.
Enable secure, scalable access to traditional and generative models by collaborating with platform and application engineers to integrate through enterprise gateways and services.
Advance model/data observability—tooling for data and feature drift detection, prediction‑quality monitoring and uncertainty signals, and automated diagnostics/explainability.
Lead cross‑platform incident response and post‑mortems, drive systemic fixes, and evolve standards to prevent recurrence—across applications and the platform.
Uplevel engineering velocity by introducing reusable frameworks, paved paths, and CI/CD templates that simplify integration, reduce toil, and improve reliability at scale.
Reduce cost and complexity across the ML ecosystem through pragmatic technology choices, clear abstractions, and a long‑term platform roadmap.

The Team

The Enterprise Data Science Platform, part of the Fidelity Data Architecture team within the Enterprise Technology business unit, is responsible for delivering scalable AI/ML capabilities across the organization. The team designs and builds advanced cloud-based, open-source software platforms in close collaboration with Data Scientists, enabling the efficient packaging, deployment, and operation of AI/ML models at production scale.

In addition, the platform develops and maintains enterprise-grade gateways that allow teams across the company to securely discover, access, and consume AI/ML models. These gateways provide critical visibility into model usage and costs, while generating insights into model effectiveness, adoption patterns, and opportunities for continuous improvement.

The base salary range for this position is $107,000-$216,000 USD per year.

Placement in the range will vary based on job responsibilities and scope, geographic location, candidate’s relevant experience, and other factors.

Base salary is only part of the total compensation package. Depending on the position and eligibility requirements, the offer package may also include bonus or other variable compensation.

We offer a wide range of benefits to meet your evolving needs and help you live your best life at work and at home. These benefits include comprehensive health care coverage and emotional well-being support, market-leading retirement benefits, generous paid time off and parental leave, a charitable giving employee match program, and educational assistance, including student loan repayment, tuition reimbursement, and learning resources to develop your career. Note: the application window closes when the position is filled or unposted.

Please be advised that Fidelity’s business is governed by the provisions of the Securities Exchange Act of 1934, the Investment Advisers Act of 1940, the Investment Company Act of 1940, ERISA, numerous state laws governing securities, investment and retirement-related financial activities, and the rules and regulations of numerous self-regulatory organizations, including FINRA, among others. Those laws and regulations may restrict Fidelity from hiring and/or associating with individuals with certain criminal histories.

Most roles at Fidelity are Hybrid, requiring associates to work onsite every other week (all business days, M-F) in a Fidelity office. This does not apply to Remote or fully Onsite roles. Please consult with your recruiter for the specific expectations for this position.

Certifications:

Category:

Information Technology