My work in the HORAE project mainly concerns segmentation and textual alignment, including plagiarism detection.
My previous work, which I still continue, is about textual similarity, ranging from words and phrases to sentences, paragraphs, and whole documents.
This includes distributional methods as well as word and sentence embeddings.
I am also interested in machine translation, multilingual terminology extraction, and synonym extraction.
I also work on sentiment analysis in my spare time.
The HORAE project aims to study the religious practices of the Middle Ages through books of hours.
Date: 10/2017 - 04/2020
PASTEL is a research project that aims to explore the potential of real-time automatic transcription for the instrumentation of mixed educational situations,
where the modalities of the interactions can be face-to-face or online, synchronous or asynchronous.
The ODISAE project aims to develop a semantic analyser of written online conversations across several modalities
(i.e. chat, forum, email) in the context of CRM (Customer Relationship Management).
The targeted capabilities are: multi-modal text information retrieval (e.g. finding the solution to a problem in a modality different from the one in which the request was formulated),
automated FAQ and documentation management (e.g. automatically detecting that no suitable solution exists for a recurring request),
automated assistance generation (e.g. helping users formulate problems, evaluating the exhaustiveness of answers), and conversation supervision (e.g. detecting attrition or irritation).
Period: 2014 - 2016
The MateCat project aims at pushing what can be considered the new frontier of CAT technology:
how to effectively integrate statistical MT within the translation workflow. Pursuing this objective is definitely relevant to improving the European competitive position
in the multilingual digital market as well as its scientific and technological leadership in this area. Europe's flagship of statistical MT is currently
represented by Moses, an open source toolkit now widely adopted by research labs and SMEs around the world. MateCat will build on this asset by joining the forces of three research labs,
including the developers of Moses, and a Web-based language service provider, owner of MyMemory, the largest TM in the world. MateCat will pursue its ambitious goals by: (i) establishing new operating conditions for MT in the CAT scenario;
(ii) making MT aware of its use, self-tuning to the task, learning from the user feedback, and more informative;
(iii) developing and field-testing a new CAT tool integrating novel MT functionalities. To foster rapid exploitation, MateCat will release its main outcomes as open source and set up a User Group, including end-users, service providers, and technology developers.
METRICC is a research project that aims to exploit comparable corpora and extract useful information for translation memories, information retrieval, and multilingual terminologies.
Date: 10/2016 - 04/2020