Master's Internship on adapting automatic evaluation metrics for scientific document translation

March 21, 2026 Custom Inria Recruitment Portal (Jobs.inria.fr)

The job description below is in English.

Contract type: Fixed-term contract

Required degree level: Bac + 4 or equivalent

Role: Research intern

Context and advantages of the position

This internship will be financed by the ANR project MaTOS (Machine Translation for Open Science), which aims to develop new methods for automatically translating and evaluating scientific documents. The project focuses on translation between English and French, for which resources are readily available and translations are of reasonable quality and coherence. The internship will be co-supervised by Rachel Bawden (Inria, ALMAnaCH project-team) and François Yvon (CNRS).

Assignment

Internship topic

Context

Clear progress has been made in machine translation (MT) in recent years, particularly with the use of neural architectures (Bahdanau et al., 2015; Vaswani et al., 2017) and more recently large language models (LLMs) (Vilar et al., 2023; Hendy et al., 2023). The use of LLMs in particular makes it easier to integrate extended linguistic context and additional resources such as terminologies (Oncevay et al., 2025), which are especially important for the translation of documents in specialised domains such as finance (Oncevay et al., 2025) and biomedical sciences (Rios, 2025).

Specialised document translation is still a major challenge in MT. However, evaluation of the task, which is necessary for tracking progress, arguably represents an even greater challenge. In these settings, the assessment of translation quality must take into account document-level properties such as consistency and coherence (Abdul Rauf and Yvon, 2020; Peng et al., 2024; Dahan et al., 2024), and concentrate on aspects that are highly important for domain-specific translation, such as the correct use of domain terminology (Neves et al., 2024; Semenov et al., 2025). Nevertheless, it is still commonplace for generic metrics such as BLEU (Papineni et al., 2002), ChrF (Popović, 2015), and COMET (Rei et al., 2020) to be used for evaluation. Surface-form metrics such as BLEU and ChrF are well known to be limited by their reliance on the exact wording of reference translations (Callison-Burch et al., 2006). More recent learned metrics, while better correlated with human judgments in general settings (Freitag et al., 2024), can still struggle in specialised domains (Zouhar et al., 2024) and in the evaluation of longer text segments.
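To make the limitation of surface-form metrics concrete, the following minimal sketch computes a simplified character n-gram F-score. It is an illustration only and not the official ChrF implementation, which uses different n-gram orders and weighting. The function names and the example sentences are hypothetical and were made up for this illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count the character n-grams of a string, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_char_fscore(hypothesis: str, reference: str, max_n: int = 3) -> float:
    """Average F1 over character n-gram orders 1..max_n.

    Simplified stand-in for a surface-form metric (assumption: plain F1,
    uniform average over orders), not the published ChrF formula.
    """
    scores = []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        # Clipped overlap: each n-gram counts at most as often as in both strings.
        overlap = sum((hyp & ref).values())
        if not hyp or not ref or overlap == 0:
            scores.append(0.0)
            continue
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)


# Hypothetical example: a meaning-preserving paraphrase versus an exact copy.
reference = "the patient was given aspirin"
exact_copy = "the patient was given aspirin"
paraphrase = "aspirin was administered to the sick person"

print(simple_char_fscore(exact_copy, reference))
print(simple_char_fscore(paraphrase, reference))
```

The exact copy receives the maximum score, while the paraphrase is penalised even though it conveys the same meaning, which is the kind of behaviour that motivates learned metrics.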

Aims

The aim of this internship is to investigate how neural metrics such as COMET can be adapted for document-level MT evaluation in specialised domains. A first task will be to compare different domain adaptation strategies for trained metrics, for example (i) using an in-domain pretrained model fine-tuned on general domain human judgments and (ii) using a generic pretrained model fine-tuned on domain-specific human judgments. A second task will be to explore fine-tuning approaches that increase sensitivity of the metric to document-level errors, in particular those related to consistency and coherence.

The internship will be carried out in the context of the MaTOS (Machine Translation for Open Science) ANR project, a project dedicated to machine translation of scholarly documents. As a first step, you will work with data in the biomedical domain, due to the availability of parallel evaluation data, system outputs and human judgments (Zouhar et al., 2024), but with the possibility of extending the chosen approach to the two fields that are the focus of the MaTOS project, natural language processing (NLP) and earth and planetary sciences (EPS). The internship will take place at Inria, Paris and will be supervised by Rachel Bawden (Inria) and François Yvon (CNRS), with potential interaction with other project members.

References

Sadaf Abdul Rauf and François Yvon. 2020. Document level contexts for neural machine translation. Research Report 2020-003, LIMSI-CNRS.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the first International Conference on Learning Representations, San Diego, CA.

Chris Callison-Burch, Miles Osborne, and Philipp Koehn. 2006. Re-evaluating the Role of Bleu in Machine Translation Research. In 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy.

Nicolas Dahan, Rachel Bawden, and François Yvon. 2024. Survey of Automatic Metrics for Evaluating Machine Translation at the Document Level. Technical report, Inria Paris; Sorbonne Université.

Markus Freitag, Nitika Mathur, Daniel Deutsch, Chi-Kiu Lo, Eleftherios Avramidis, Ricardo Rei, Brian Thompson, Frederic Blain, Tom Kocmi, Jiayi Wang, David Ifeoluwa Adelani, Marianna Buchicchio, Chrysoula Zerva, and Alon Lavie. 2024. Are LLMs breaking MT metrics? Results of the WMT24 metrics shared task. In Proceedings of the Ninth Conference on Machine Translation, pages 47–81, Stroudsburg, PA, USA. Association for Computational Linguistics.

Amr Hendy, Mohamed Abdelrehim, Amr Sharaf, Vikas Raunak, Mohamed Gabr, Hitokazu Matsushita, Young Jin Kim, Mohamed Afify, and Hany Hassan Awadalla. 2023. How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation.

Mariana Neves, Cristian Grozea, Philippe Thomas, Roland Roller, Rachel Bawden, Aurélie Névéol, Steffen Castle, Vanessa Bonato, Giorgio Maria Di Nunzio, Federica Vezzani, Maika Vicente Navarro, Lana Yeganova, and Antonio Jimeno Yepes. 2024. Findings of the WMT 2024 biomedical translation shared task: Test sets on abstract level. In Proceedings of the Ninth Conference on Machine Translation, pages 124–138, Miami, Florida, USA. Association for Computational Linguistics.

Arturo Oncevay, Charese Smiley, and Xiaomo Liu. 2025. The impact of domain-specific terminology on machine translation for finance in European languages. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 2758–2775, Stroudsburg, PA, USA. Association for Computational Linguistics.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.

Ziqian Peng, Rachel Bawden, and François Yvon. 2024. Handling Very Long Contexts in Neural Machine Translation: a Survey. Technical Report Livrable D3-2.1, Projet ANR MaTOS.

Maja Popović. 2015. chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 392–395, Lisbon, Portugal. Association for Computational Linguistics.

Ricardo Rei, Craig Stewart, Ana C Farinha, and Alon Lavie. 2020. COMET: A Neural Framework for MT Evaluation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2685–2702, Online.

Miguel Rios. 2025. Instruction-tuned Large Language Models for Machine Translation in the Medical Domain. In Proceedings of Machine Translation Summit XX: Volume 1, pages 162–172.

Kirill Semenov, Xu Huang, Vilém Zouhar, Nathaniel Berger, Dawei Zhu, Arturo Oncevay, and Pinzhen Chen. 2025. Findings of the WMT25 terminology translation task: Terminology is useful especially for good MTs. In Proceedings of the Tenth Conference on Machine Translation, pages 554–576, Stroudsburg, PA, USA. Association for Computational Linguistics.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 6000–6010, Long Beach, CA, USA.

David Vilar, Markus Freitag, Colin Cherry, Jiaming Luo, Viresh Ratnakar, and George Foster. 2023. Prompting PaLM for translation: Assessing strategies and performance. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Stroudsburg, PA, USA. Association for Computational Linguistics.

Vilém Zouhar, Shuoyang Ding, Anna Currey, Tatyana Badeka, Jenyuan Wang, and Brian Thompson. 2024. Fine-tuned machine translation metrics struggle in unseen domains. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 488–500, Stroudsburg, PA, USA. Association for Computational Linguistics.

Main activities

The main activities of the internship will include:

  • keeping up-to-date with related work on the topic
  • attending weekly meetings to discuss reading and progress on the topic
  • carrying out research on the topic outlined above, including the development of new ideas, positioning with respect to related work, and validation of the methodology via experiments and analysis
  • presenting work both internally to colleagues and (depending on the progress made and the results obtained) externally in the form of a conference/journal/workshop paper
  • interacting and exchanging with colleagues

Skills

Candidates should currently be enrolled in a Master 2 or equivalent (e.g. engineering school) in computer science (speciality artificial intelligence, machine learning or natural language processing).

They should have a good level in programming (Python), experience with neural networks and an interest in natural language processing. A good written and spoken level of English is required, and knowledge of French is preferred.

Benefits

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage
