Research interests

  1. My work in the HORAE project mainly concerns segmentation and textual alignment, including plagiarism detection.
    My previous work, which I still pursue, is about textual similarity, ranging from words and phrases to sentences, paragraphs and documents.
    This includes distributional methods as well as word and sentence embeddings.
    I am also interested in machine translation, multilingual terminology extraction and synonym extraction.
    I also work on sentiment analysis in my spare time.

Research projects

HORAE (Hours – Recognition – Analysis – Editions)

  1. The HORAE project aims at studying the religious practices of the Middle Ages through books of hours.
    Period: 10/2017 - 04/2020

  2. Role: I am involved in the HORAE project as a post-doctoral researcher.
    My work consists in studying books of hours in order to:
    (i) develop segmentation approaches that deal with the characteristics and particularities of books of hours;
    (ii) develop alignment and plagiarism detection approaches for extracting similarities between books of hours and the Bible.

PASTEL (Performing – Automated – Speech – Transcription – for – Enhancing – Learning)

  1. PASTEL is a research project that aims to explore the potential of real-time automatic transcription for the instrumentation of mixed educational situations,
    where the modalities of the interactions can be face-to-face or online, synchronous or asynchronous.

    Period: 10/2016 - 04/2020
  2. Role: I was involved in the PASTEL project as a post-doctoral researcher.
    My work consisted in studying e-learning platforms in order to enrich them with external educational resources, using IR and alignment approaches across different modalities:
    (i) Document indexing;
    (ii) Textual alignment;
    (iii) Question answering approaches.

ODISAE (Optimizing – Digital – Interaction – with a – Social – and – Automated – Environment)

  1. The ODISAE project aims at developing a semantic analyser of written online conversations across several modalities
    (i.e. chat, forum, email) in the context of CRM (Customer Relationship Management).
    The targeted capabilities are multi-modal text information retrieval (e.g. finding the solution to a problem in a modality different from the one in which the request was formulated),
    automated FAQ and documentation management (e.g. automatic detection of the absence of a suitable solution to a recurring request),
    automated assistance generation (e.g. helping users to formulate problems, evaluating the exhaustiveness of answers), and conversation supervision (e.g. detecting attrition, irritation).
    Period: 2014 - 2016

  2. Role: I was involved in the ODISAE project as a post-doctoral researcher.
    My work consisted in developing alignment approaches for textual entries.

MateCat (Machine – Translation – Enhanced – Computer – Assisted – Translation)

  1. The MateCat project aims at pushing what can be considered the new frontier of CAT technology:
    how to effectively integrate statistical MT within the translation workflow. Pursuing this objective is definitely relevant to improving the European competitive position
    in the multilingual digital market as well as its scientific and technological leadership in this area. Europe's flagship of statistical MT is currently
    represented by Moses, an open-source toolkit now widely adopted by research labs and SMEs around the world. MateCat will build on this asset by joining the forces of three research labs,
    including the developers of Moses, and a Web-based language service provider, owner of MyMemory, the largest TM in the world.
    MateCat will pursue its ambitious goals by:
    (i) establishing new operating conditions for MT in the CAT scenario;
    (ii) making MT aware of its use, self-tuning to the task, learning from the user feedback, and more informative;
    (iii) developing and field testing a new CAT tool integrating novel MT functionalities.
    To foster rapid exploitation, MateCat will release its main outcomes in open source and set up a User Group, including end-users, service providers, and technology developers.

    Period: 11/2011 - 10/2014
    Report: the final report can be found here
  2. Role: I was involved in the MateCat project as a post-doctoral researcher.
    My work consisted in using domain adaptation techniques to improve language models in machine translation systems.

METRICC (MEmoire de Traduction, Recherche d'Information et Corpus Comparables)

  1. METRICC is a research project that aims to exploit comparable corpora and extract useful information for translation memories, information retrieval and multilingual terminologies.
    Period: 10/2016 - 04/2020

  2. Role: I was involved in the METRICC project as a PhD student.
    My main focus was on using comparable corpora to extract bilingual lexicons.