Data Scientist
The Team: As a member of the EDO, Collection Platforms & AI Cognitive Engineering team, you will
build GenAI-driven and ML-powered products and capabilities that power natural language
understanding, data extraction, information retrieval, and data sourcing solutions for S&P Global.
You will define AI strategy, mentor others, and drive production-ready AI products and pipelines
while leading by example in a highly engaging work environment. You will work in a truly global
team and be encouraged to take thoughtful risks and show initiative.
What’s in it for you:
• Be a part of a global company and build solutions at enterprise scale
• Lead and grow a highly skilled, hands-on technical team (including mentoring junior data
scientists)
• Contribute to solving high-complexity, high-impact problems end-to-end
• Architect and oversee production-ready pipelines from ideation to deployment
Responsibilities:
• Define the AI roadmap, tooling choices, and best practices for model building, prompt
engineering, fine-tuning, and vector retrieval systems
• Architect, develop, and deploy large-scale ML and GenAI-powered products and pipelines
• Own all stages of the data science project lifecycle, including:
- Identify and scope high-value data science and AI opportunities
- Partner with business leaders, domain experts, and end-users to gather
requirements and align on success metrics
- Evaluate, interpret, and communicate results to executive stakeholders
- Lead exploratory data analysis, proof-of-concepts, model benchmarking, and
validation experiments for both ML and GenAI approaches
- Establish and enforce coding standards, perform code reviews, and optimize data
science workflows
- Drive deployment, monitoring, and scaling strategies for models in production
(including both ML and GenAI services)
- Mentor and guide junior data scientists; foster a culture of continuous learning and
innovation
- Manage stakeholders across functions to ensure alignment and timely delivery
Technical Requirements:
• Hands-on experience with large language models (e.g., OpenAI, Anthropic, Llama), prompt
engineering, fine-tuning/customization, and embedding-based retrieval
• Expert proficiency in Python (NumPy, Pandas, spaCy, scikit-learn, PyTorch/TensorFlow 2,
Hugging Face Transformers)
• Deep understanding of ML & Deep Learning models, including architectures for NLP (e.g.,
transformers), GNNs, and multimodal systems
• Strong grasp of statistics, probability, and the mathematics underpinning modern AI
• Ability to survey and synthesize current AI/ML research, with a track record of applying new
methods in production
• Proven experience with at least one end-to-end GenAI or advanced NLP project: custom NER,
table extraction via LLMs, Q&A systems, summarization pipelines, OCR integrations, or GNN
solutions
• Familiarity with orchestration, deployment, and data tooling: Docker, Airflow, Kubernetes,
Redis, Flask/Django/FastAPI, PySpark, SQL, R Shiny/Dash/Streamlit
• Openness to evaluating and adopting emerging technologies and programming languages as
needed
Good to have:
• Master’s or Ph.D. in Computer Science, Statistics, Mathematics, or related field (minimum
Bachelor’s)
• 6+ years of relevant experience in Data Science/AI, with at least 2 years in a leadership or
technical lead role
• Prior experience in economics or the financial industry, especially with market-intelligence or
risk analytics products
• Public contributions or demos on GitHub, Kaggle, Stack Overflow, technical blogs, or
publications