Data Scientist

Hyderabad, Gurugram | April 15, 2026 | Full Time

The Team: As a member of the EDO, Collection Platforms & AI Cognitive Engineering team, you will work on building GenAI-driven and ML-powered products and capabilities to power natural language understanding, data extraction, information retrieval, and data sourcing solutions for S&P Global. You will define AI strategy, mentor others, and drive production-ready AI products and pipelines while leading by example in a highly engaging work environment. You will work in a truly global team and be encouraged to take thoughtful risks and show self-initiative.

What’s in it for you:

• Be a part of a global company and build solutions at enterprise scale

• Lead and grow a highly skilled, hands-on technical team (including mentoring junior data scientists)

• Contribute to solving high-complexity, high-impact problems end-to-end

• Architect and oversee production-ready pipelines from ideation to deployment

Responsibilities:

• Define the AI roadmap, tooling choices, and best practices for model building, prompt engineering, fine-tuning, and vector retrieval systems

• Architect, develop and deploy large-scale ML and GenAI-powered products and pipelines

• Own all stages of the data science project lifecycle, including:

    • Identify and scope high-value data science and AI opportunities
    • Partner with business leaders, domain experts, and end-users to gather requirements and align on success metrics
    • Evaluate, interpret, and communicate results to executive stakeholders
    • Lead exploratory data analysis, proof-of-concepts, model benchmarking, and validation experiments for both ML and GenAI approaches
    • Establish and enforce coding standards, perform code reviews, and optimize data science workflows
    • Drive deployment, monitoring, and scaling strategies for models in production (including both ML and GenAI services)
    • Mentor and guide junior data scientists; foster a culture of continuous learning and innovation
    • Manage stakeholders across functions to ensure alignment and timely delivery

Technical Requirements:

• Hands-on experience with large language models (e.g., OpenAI, Anthropic, Llama), prompt engineering, fine-tuning/customization, and embedding-based retrieval

• Expert proficiency in Python (NumPy, Pandas, spaCy, scikit-learn, PyTorch/TensorFlow 2, Hugging Face Transformers)

• Deep understanding of ML & Deep Learning models, including architectures for NLP (e.g., transformers), GNNs, and multimodal systems

• Strong grasp of statistics, probability, and the mathematics underpinning modern AI

• Ability to survey and synthesize current AI/ML research, with a track record of applying new methods in production

• Proven experience on at least one end-to-end GenAI or advanced NLP project: custom NER, table extraction via LLMs, Q&A systems, summarization pipelines, OCR integrations, or GNN solutions

• Familiarity with orchestration and deployment tools: Docker, Airflow, Kubernetes, Redis, Flask/Django/FastAPI, PySpark, SQL, R-Shiny/Dash/Streamlit

• Openness to evaluating and adopting emerging technologies and programming languages as needed

Good to have:

• Master’s or Ph.D. in Computer Science, Statistics, Mathematics, or a related field (minimum Bachelor’s)

• 6+ years of relevant experience in Data Science/AI, with at least 2 years in a leadership or technical lead role

• Prior experience in the economics or financial industry, especially with market-intelligence or risk analytics products

• Public contributions or demos on GitHub, Kaggle, Stack Overflow, technical blogs, or publications