Senior Site Reliability Engineer
Business Unit
Technology Engineering Group (TEG) is responsible for supporting the company and its business groups on technology and operational platforms, as well as for the construction and operation of R&D management and data centers. TEG provides users with a full range of customer services. As the operator of the largest networking, devices, and data center infrastructure in Asia, TEG also leads the Tencent Technology Committee in strengthening infrastructure R&D through internal and distributed open source collaboration, constructing new platforms and supporting business innovation.
What the Role Entails
1. Responsible for the operation and maintenance of overseas model services at Hunyuan, ensuring stable, reliable, and efficient service operations;
2. Responsible for capacity management and planning, resource cost optimization, ensuring reasonable online service capacity and improving resource efficiency;
3. Responsible for continuous integration and delivery, and for efficient, automated operational optimization, enhancing service stability and research and development efficiency;
4. Participate in the design of online systems and various service architectures, providing professional solutions for stability and architecture improvement;
5. Analyze and deeply explore the shortcomings of existing systems, use data to find weak points, and drive the implementation of system optimizations and improvements;
6. Follow cutting-edge industry technology trends, and explore technologies and directions for automation and intelligence in the operation and maintenance of complex business systems.
Who We Look For
1. Bachelor's degree or above, with 2 or more years of experience in internet operations and maintenance;
2. Familiar with the Linux operating system, with solid system management and network knowledge;
3. Familiar with deploying, configuring, and tuning components such as Nginx, Redis, and MySQL;
4. Proficient in monitoring systems such as Zabbix, Prometheus, Grafana, real-time graspin