Software Engineer III - DevOps
O.C. Tanner is the global leader in software and services that improve workplace culture through meaningful employee experiences. Our Culture Cloud is a suite of apps designed to enhance the employee experience with strategic recognition, service awards, wellbeing, leadership, and events that help people thrive at work. Our Culture by Design approach provides expert services to organizations looking to create great workplaces.
Our global team of 1,500 people hails from 58 countries and speaks 62 languages. As programmers, researchers, designers, client professionals, and craftspeople, we create the tech, tools, and awards that connect employees to purpose at thousands of companies. Join us as we help people all over the world thrive at work.
Job Summary:
As a Lead Platform Engineer, you will oversee the design, implementation, and optimization of our CI/CD pipelines and AWS cloud infrastructure. You will lead a team of Platform engineers, collaborate with cross-functional stakeholders, and ensure our systems are scalable, secure, and resilient. Your expertise in Kubernetes, Infrastructure as Code, and Observability will be critical in modernizing our technology stack and fostering a culture of automation and reliability.
This role includes strategic planning, technical leadership, and hands-on engineering. Participation in an on-call rotation and occasional support outside of business hours is expected.
Key Responsibilities:
Lead and mentor a team of Platform engineers, fostering growth and technical excellence.
Architect and manage cloud infrastructure as code using Terraform.
Lead and perform research to find solutions for complex business problems as they relate to infrastructure.
Oversee CI/CD pipeline development and optimization using GitHub Actions and Argo.
Drive automation initiatives using Golang, Helm, and Bash for infrastructure and monitoring.
Enhance system observability through logging, monitoring, and alerting solutions.
Collaborate with development, architecture, operations, and security teams to align infrastructure with business goals across time zones.
Ensure high availability, performance, and security of production systems.
Evaluate and integrate emerging Platform tools and practices to improve efficiency and reliability.
Lead incident response and root cause analysis for infrastructure-related issues.
Qualifications:
8+ years of experience in Platform or related infrastructure engineering, with at least 2 years in a technical leadership role.
Deep expertise with Infrastructure as Code tooling (Terraform), CI/CD pipelines, and strong knowledge of Argo, Helm, and GitHub Actions.
Proficiency in scripting and automation languages (e.g., Golang, Bash) and knowledge of cloud platforms, particularly AWS.
Strong communication, collaboration, and analytical skills. Ability to work in a team and manage multiple tasks simultaneously.
Expertise with observability tools and techniques (e.g., logging, metrics, monitoring, and alerting).
Expertise with compliance and risk management requirements (e.g., security, PII, SOC, ISO).
Excellent troubleshooting and debugging skills, with experience resolving complex infrastructure and application issues.
Ability to work independently with minimal supervision.
Experience developing system requirements, documentation, architecture diagrams, and implementation plans.
Bonus Skills:
AWS or DevOps-related certifications
Expertise with Cloud Optimization strategies
Experience working with multiple cloud providers