My research lies at the intersection of Natural Language Processing (NLP), Artificial Intelligence (AI),
and text data engineering. Its core is the full pipeline of corpus construction and exploitation:
data collection, filtering, annotation, gold standard construction, model training, and evaluation.
I build corpora and datasets from scratch — multilingual, thematically specialized, web-harvested,
or domain-specific. A central conviction shapes this work: even when using large language models,
the quality of outputs depends on the data we feed them. Being the author of one's datasets,
not merely a consumer of existing benchmarks, is both a methodological choice and a scientific stance.
I automate each step of the pipeline as much as possible, developing custom scripts to make annotation
robust and reproducible — always using and promoting free and open-source software.
I am a member of APRIL, the French association for the promotion of free software (logiciels libres).
I have applied this approach to a wide range of tasks: opinion mining, sentiment analysis,
explicit and implicit aspect detection, keyword-driven lexicon construction, and press corpus analysis
(e.g., women's professions in French cinema).
Beyond the pipeline itself, I actively question annotation methodologies: What makes a valid gold standard?
When does human expertise become irreplaceable — or not fully replaceable? How do we choose and justify
evaluation metrics? These questions are at the heart of my current CNRS delegation at IRIT,
where colleagues in the ADRIA team help me explore formal frameworks (belief functions,
non-monotonic reasoning) to ground and validate these methodological choices.
I am committed to interdisciplinary research, convinced that NLP can serve as a methodological partner
for the humanities, social sciences, and the arts. Projects such as
Litte_Bot, MoliAIre, and BrAIcht explore the creative potential of
language models in literary and theatrical contexts.
In 2023, I co-founded the interdisciplinary research group GLAÇON with my colleagues
R. Kyriakoglou and A. Millour.
Finally, I care deeply about teaching programming and AI literacy. Since late 2023, I have developed
courses and workshops to introduce students — in computer science and in the humanities and arts —
to generative AI tools: how they work, how to use them critically and ethically,
and why they should be seen as collaborators, not replacements. Teaching CS is not only
about technical skills; it is also about forming responsible, critical thinkers.
I have given public seminars during the Fête des Sciences and participate in a national
working group on the use of LLMs in higher education in France, anticipating the impact on teaching
practices and adapting CS curricula for the generative AI era.