Master's Internship on adapting automatic evaluation metrics for scientific document translation
Contract type: Fixed-term contract (CDD)
Required degree level: Bac + 4 or equivalent
Role: Research intern
Context and advantages of the position
This internship will be financed by the ANR project MaTOS (Machine Translation for Open Science), which aims to develop new methods for automatically translating and evaluating scientific documents. The project focuses on translation between English and French, for which resources are readily available and translations are of reasonable quality and coherence. The internship will be co-supervised by Rachel Bawden (Inria, ALMAnaCH project-team) and François Yvon (CNRS).
Assigned mission
Internship topic
Context
Clear progress has been made in machine translation (MT) in recent years, particularly with the use of neural architectures (Bahdanau et al., 2015; Vaswani et al., 2017) and, more recently, large language models (LLMs) (Vilar et al., 2023; Hendy et al., 2023). The use of LLMs in particular makes it easier to integrate extended linguistic context and additional resources such as terminologies (Oncevay et al., 2025), which are especially important for the translation of documents in specialised domains such as finance (Oncevay et al., 2025) and biomedical sciences (Rios, 2025).
Specialised document translation is still a major challenge in MT. However, evaluation of the task, which is necessary for tracking progress, arguably represents an even greater challenge. In these settings, quality assessment must take into account document-level properties of translation such as consistency and coherence (Abdul Rauf and Yvon, 2020; Peng et al., 2024; Dahan et al., 2024), and focus on aspects that are highly important for domain-specific translation, such as the correct use of domain terminology (Neves et al., 2024; Semenov et al., 2025). Nevertheless, it is still commonplace for generic metrics such as BLEU (Papineni et al., 2002), ChrF (Popović, 2015), and COMET (Rei et al., 2020) to be used for evaluation. Surface-form metrics such as BLEU and ChrF are well known to be limited by their reliance on the exact wording of reference translations (Callison-Burch et al., 2006). More recent learned metrics, while better correlated with human judgments in general settings (Freitag et al., 2024), can still struggle in specialised domains (Zouhar et al., 2024) and in the evaluation of longer text segments.
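To make the limitation of surface-form metrics concrete, here is a minimal sketch in Python. It is an illustration only: the function `char_ngram_f1` is a simplified character n-gram F-score written for this example, not the official ChrF implementation, and the three sentences are invented. It shows how a near-copy of the reference containing a serious meaning error can outscore a correct paraphrase.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count character n-grams of a lowercased, whitespace-normalised string."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def char_ngram_f1(hypothesis, reference, max_n=4):
    """Average F1 over n-gram orders 1..max_n (simplified; not official ChrF)."""
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if not hyp or not ref or overlap == 0:
            scores.append(0.0)
            continue
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)

# Invented example sentences (assumptions for illustration).
reference = "The patient was given a low dose of aspirin."
paraphrase = "A small amount of aspirin was administered to the patient."
near_copy_with_error = "The patient was given a high dose of aspirin."

score_paraphrase = char_ngram_f1(paraphrase, reference)
score_error = char_ngram_f1(near_copy_with_error, reference)
print(f"correct paraphrase:   {score_paraphrase:.3f}")
print(f"near-copy with error: {score_error:.3f}")
```

In this toy case the hypothesis that reverses the dosage scores higher than the faithful paraphrase, because the score depends only on string overlap with the reference.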
Aims
The aim of this internship is to investigate how neural metrics such as COMET can be adapted for document-level MT evaluation in specialised domains. A first task will be to compare different domain adaptation strategies for trained metrics, for example (i) using an in-domain pretrained model fine-tuned on general domain human judgments and (ii) using a generic pretrained model fine-tuned on domain-specific human judgments. A second task will be to explore fine-tuning approaches that increase sensitivity of the metric to document-level errors, in particular those related to consistency and coherence.
The internship will be carried out in the context of the MaTOS (Machine Translation for Open Science) ANR project, a project dedicated to machine translation of scholarly documents. As a first step, you will work with data in the biomedical domain, due to the availability of parallel evaluation data, system outputs and human judgments (Zouhar et al., 2024), with the possibility of extending the chosen approach to the two fields that are the focus of the MaTOS project: natural language processing (NLP) and earth and planetary sciences (EPS). The internship will take place at Inria, Paris and will be supervised by Rachel Bawden (Inria) and François Yvon (CNRS), with potential interaction with other project members.
References
Sadaf Abdul Rauf and François Yvon. 2020. Document level contexts for neural machine translation. Research Report 2020-003, LIMSI-CNRS.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the first International Conference on Learning Representations, San Diego, CA.
Chris Callison-Burch, Miles Osborne, and Philipp Koehn. 2006. Re-evaluating the Role of Bleu in Machine Translation Research. In 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy.
Nicolas Dahan, Rachel Bawden, and François Yvon. 2024. Survey of Automatic Metrics for Evaluating Machine Translation at the Document Level. Technical report, Inria Paris; Sorbonne Université.
Markus Freitag, Nitika Mathur, Daniel Deutsch, Chi-Kiu Lo, Eleftherios Avramidis, Ricardo Rei, Brian Thompson, Frederic Blain, Tom Kocmi, Jiayi Wang, David Ifeoluwa Adelani, Marianna Buchicchio, Chrysoula Zerva, and Alon Lavie. 2024. Are LLMs breaking MT metrics? Results of the WMT24 metrics shared task. In Proceedings of the Ninth Conference on Machine Translation, pages 47–81, Stroudsburg, PA, USA. Association for Computational Linguistics.
Amr Hendy, Mohamed Abdelrehim, Amr Sharaf, Vikas Raunak, Mohamed Gabr, Hitokazu Matsushita, Young Jin Kim, Mohamed Afify, and Hany Hassan Awadalla. 2023. How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation.
Mariana Neves, Cristian Grozea, Philippe Thomas, Roland Roller, Rachel Bawden, Aurélie Névéol, Steffen Castle, Vanessa Bonato, Giorgio Maria Di Nunzio, Federica Vezzani, Maika Vicente Navarro, Lana Yeganova, and Antonio Jimeno Yepes. 2024. Findings of the WMT 2024 biomedical translation shared task: Test sets on abstract level. In Proceedings of the Ninth Conference on Machine Translation, pages 124–138, Miami, Florida, USA. Association for Computational Linguistics.
Arturo Oncevay, Charese Smiley, and Xiaomo Liu. 2025. The impact of domain-specific terminology on machine translation for finance in European languages. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 2758–2775, Stroudsburg, PA, USA. Association for Computational Linguistics.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.
Ziqian Peng, Rachel Bawden, and François Yvon. 2024. Handling Very Long Contexts in Neural Machine Translation: a Survey. Technical Report Livrable D3-2.1, Projet ANR MaTOS.
Maja Popović. 2015. chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 392–395, Lisbon, Portugal. Association for Computational Linguistics.
Ricardo Rei, Craig Stewart, Ana C Farinha, and Alon Lavie. 2020. COMET: A Neural Framework for MT Evaluation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2685–2702, Online.
Miguel Rios. 2025. Instruction-tuned Large Language Models for Machine Translation in the Medical Domain. In Proceedings of Machine Translation Summit XX: Volume 1, pages 162–172.
Kirill Semenov, Xu Huang, Vilém Zouhar, Nathaniel Berger, Dawei Zhu, Arturo Oncevay, and Pinzhen Chen. 2025. Findings of the WMT25 terminology translation task: Terminology is useful especially for good MTs. In Proceedings of the Tenth Conference on Machine Translation, pages 554–576, Stroudsburg, PA, USA. Association for Computational Linguistics.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 6000–6010, Long Beach, CA, USA.
David Vilar, Markus Freitag, Colin Cherry, Jiaming Luo, Viresh Ratnakar, and George Foster. 2023. Prompting PaLM for translation: Assessing strategies and performance. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Stroudsburg, PA, USA. Association for Computational Linguistics.
Vilém Zouhar, Shuoyang Ding, Anna Currey, Tatyana Badeka, Jenyuan Wang, and Brian Thompson. 2024. Fine-tuned machine translation metrics struggle in unseen domains. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 488–500, Stroudsburg, PA, USA. Association for Computational Linguistics.
Main activities
The main activities of the internship will include:
- keeping up-to-date with related work on the topic
- weekly meetings to discuss reading and progress on the topic
- carrying out research on the topic outlined above, including the development of new ideas, positioning with respect to related work, and validation of the methodology via experiments and analysis
- the presentation of work both internally to colleagues and (depending on the progress carried out and the results obtained) externally in the form of a conference/journal/workshop paper
- interacting and exchanging with colleagues
Skills
Candidates should currently be enrolled in a Master 2 or equivalent (e.g. engineering school) in computer science (speciality artificial intelligence, machine learning or natural language processing).
They should have good programming skills (Python), experience with neural networks and an interest in natural language processing. A good written and spoken level of English is required, and knowledge of French is preferred.
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
General information
- Theme/Domain: Language, speech and audio
- City: Paris
- Inria centre: Centre Inria de Paris
- Desired start date: 2026-04-01
- Contract duration: 6 months
- Application deadline: 2026-03-26
Please note: Applications must be submitted online via the Inria website. Processing of applications sent through other channels is not guaranteed.
Application instructions
- A cover letter describing your motivation for the internship
- An up-to-date CV
- Grades obtained from the first year of your master's and any grades already obtained during the current academic year
Defence security:
This position may be assigned to a restricted-access zone (zone à régime restrictif, ZRR), as defined in decree no. 2011-1425 on the protection of the nation's scientific and technical potential (PPST). Authorisation to access such a zone is granted by the head of the institution, following a favourable ministerial opinion, as defined in the order of 3 July 2012 relating to the PPST. An unfavourable ministerial opinion for a position assigned to a ZRR would result in the cancellation of the recruitment.
Recruitment policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Contacts
- Inria team: ALMANACH
- Recruiter: Bawden Rachel / [email protected]
About Inria
Inria is the French national research institute for digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on a wide range of talent in more than forty different professions. 900 research and innovation support staff help scientific and entrepreneurial projects with real-world impact to emerge and grow. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society and the economy.