Bayes in Bayonne: Methods Workshop for Linguistics and Cognitive Sciences (17th June, 2025)

Registration (Free!)

Please register by filling out the following registration form.

Aim

Bayes in Bayonne: Methods Workshop for Linguistics and Cognitive Sciences is the first edition of what we hope will become a regular meeting point for researchers interested in the use of Bayesian inference within linguistics and cognitive sciences. This workshop was conceived to address the growing importance of Bayesian methods in these fields, offering researchers and students a space to learn, exchange ideas, and discuss cutting-edge applications.

For this inaugural event, we are thrilled to host a lineup of internationally recognized speakers whose work represents some of the most innovative and rigorous approaches to Bayesian data analysis in the cognitive and language sciences. Through their talks and tutorials, participants will gain valuable insights into both theoretical and practical aspects of Bayesian modeling, from handling complex data structures to exploring model limitations and advances.

Our aim is to foster a collaborative, welcoming environment where methodological discussions are at the center, laying the foundation for a recurring event that strengthens and connects the community working at the intersection of Bayesian statistics, linguistics, and cognitive sciences.

Invited Keynote Speakers

Dr. Bruno Nicenboim (Tilburg University)

Bruno Nicenboim is an assistant professor in the Department of Cognitive Science and AI at Tilburg University and the principal investigator of the Computational Psycholinguistics Lab. He is currently co-authoring the book Introduction to Bayesian Data Analysis for Cognitive Science with Daniel Schad and Shravan Vasishth, to be published by CRC Press. At the workshop, he will present work-in-progress on the limitations of model comparison using Bayes factors and cross-validation for computational models.

All models are wrong, and so is model comparison

It is often assumed that the model closest to the data-generating process will yield the best predictive performance. This assumption is usually tested in simulations where one of the candidate models matches the true generative process. In real applications, however, we are never that close to the truth. All models are wrong or, at best, approximate some important aspects of the true process. In this talk, I examine what happens when we compare models that are all misspecified, as is always the case in practice. Using a case study based on reaction time and choice data, I compare cognitive and theory-agnostic models using Bayes factors and cross-validation. The results show that predictive performance does not necessarily favor the model that most closely resembles the true process. Flexible, theory-agnostic models often perform better, not because they reflect the underlying mechanism more accurately, but because they adapt more easily to the data. Moreover, I show that Bayes factor comparisons are highly sensitive to prior specifications when models make qualitatively different assumptions, while cross-validation tends to remain agnostic unless clear predictive gains are present. These findings call for a more nuanced interpretation of model comparison metrics in cognitive modeling, especially since all candidate models are, inevitably, wrong.
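As a toy illustration of the prior sensitivity the abstract mentions (this is not the talk's case study; all numbers are invented), consider comparing a point-null model M0: mu = 0 against M1: mu ~ N(0, tau²) for a normal sample with known standard deviation. The same data yield Bayes factors pointing in opposite directions as the prior width tau changes:

```python
# Toy example: Bayes factor sensitivity to the prior width (illustrative only).
from math import sqrt, pi, exp, log

# Invented data summary: n observations, known sd, sample mean.
n, sigma, ybar = 50, 1.0, 0.3

def normal_logpdf(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - log(sd * sqrt(2 * pi))

# M0 (mu = 0): marginal density of the sample mean is N(0, sigma^2 / n).
log_m0 = normal_logpdf(ybar, 0.0, sigma / sqrt(n))

# M1 (mu ~ N(0, tau^2)): marginal density of the sample mean is
# N(0, sigma^2 / n + tau^2), so the evidence depends directly on tau.
def log_m1(tau):
    return normal_logpdf(ybar, 0.0, sqrt(sigma ** 2 / n + tau ** 2))

bf_narrow = exp(log_m1(0.5) - log_m0)   # BF for M1 with a narrow prior
bf_wide = exp(log_m1(10.0) - log_m0)    # BF for M1 with a very wide prior
```

With these numbers the narrow prior gives a Bayes factor above 1 (favoring M1), while the very wide prior drives it below 1 (favoring M0), even though the likelihood is unchanged.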

Dr. Elizabeth Pankratz (University of Edinburgh)

Elizabeth Pankratz will join the Department of Psychology at the University of Edinburgh as a lecturer in statistics this fall. At the workshop, she will present her recent work on how language users generalize linguistic rules to novel items. Using Bayesian modeling, she investigates the broader, latent population of words from which each linguistic rule may be drawn.

A Bayesian mechanism for rule generalisation in language

Part of language’s great expressivity comes from its users applying familiar rules to novel items. But rules in language aren’t all created equal. Some rules can be more readily generalised to new items than others. The project I’ll discuss focuses on how rule generalisation is affected by certain properties of frequency distributions. I’ll briefly report an artificial language learning experiment which shows that people prefer to generalise the rule they’ve seen with more low-frequency items. I’ll then discuss how I approached modelling one possible mechanism behind this result. The basic idea is this: if you guess based on your sample that a rule is likely to apply to more items than just the ones you’ve seen, then you might reason that you yourself can also use the rule in a way that goes beyond your previous input—in other words, you can generalise that rule. So the task is to model the latent population of items that a rule applies to, items both seen and unseen. I’ll take you through my methodological journey of discovery: from an adequate but slightly arbitrary model based on maximum likelihood estimation to a more principled model based on Bayesian inference. And overall, based on the results from experiment and model, I’ll suggest that linguistic rule generalisation is a self-sustaining process: by creating novel and therefore low-frequency items, rule generalisation produces the very same distributional structure that feeds it.
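A toy sketch of the MLE-versus-Bayes contrast described above (an illustration only, not the model from the talk; the sample and the uniform-sampling assumption are invented): estimating the size N of a latent population of items from the few items actually observed. The maximum-likelihood estimate can never exceed what has been seen, whereas the Bayesian posterior puts mass on larger populations, which is what licenses generalisation to unseen items:

```python
# Toy example: inferring a latent population size N from a small sample,
# assuming items are drawn uniformly from 1..N (illustrative assumption).
import numpy as np

observed = np.array([3, 7, 12, 18])  # hypothetical indices of items seen so far
m = observed.max()
k = len(observed)

# MLE: N_hat = max(sample). It never goes beyond the observed input.
mle = int(m)

# Bayesian posterior over N with a uniform prior on 1..1000:
# likelihood of the sample given N is N^-k when N >= max(sample), else 0.
N_grid = np.arange(1, 1001)
likelihood = np.where(N_grid >= m, N_grid.astype(float) ** -k, 0.0)
posterior = likelihood / likelihood.sum()

# Posterior probability that the population is larger than anything observed.
p_beyond = posterior[N_grid > m].sum()
```

Here `p_beyond` is well above one half: under this toy model, a learner who reasons Bayesianly concludes the rule probably applies to items they have never encountered.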

Dr. Santiago Barreda (UC Davis)

Santiago Barreda is an associate professor in the Department of Linguistics at UC Davis and co-director of the UC Davis Phonetics Lab. He is the co-author, with Noah Silbert, of the book Bayesian Multilevel Models for Repeated Measures Data: A Conceptual and Practical Introduction in R.

Applications of Multinomial Models in Speech Perception Research

This presentation details the application of multinomial models to address diverse research questions in speech perception. Multinomial models provide a robust and statistically principled framework for analyzing categorical listener judgments involving more than two response categories. An advantage of this approach lies in its minimal reliance on distributional assumptions about the data, in contrast to other approaches (e.g. Linear Discriminant Analysis). Consequently, multinomial models are particularly well-suited for estimating perceptual spaces from responses to stimuli arranged in systematic grid patterns that do not reflect the natural distribution of tokens within speech categories. The presentation will demonstrate how these models can be employed to investigate diverse phenomena, including phonemic classification, perceptual normalization, and the perception of gradient subphonemic variation, where acoustic variation is sometimes reflected in smooth changes in classification probabilities, and sometimes relates to no perceptual difference at all.
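As a minimal sketch of the kind of model described (assumed for illustration; the coefficients, predictors, and category count are invented), a multinomial model assigns softmax probabilities over several response categories to stimuli arranged on a systematic acoustic grid:

```python
# Illustrative multinomial (softmax) model of categorical listener responses
# to a systematic stimulus grid; all parameter values are hypothetical.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical coefficients: intercept plus two acoustic slopes (e.g. F1, F2)
# for each of three response categories.
beta = np.array([[0.0, 0.0, 0.0],    # intercepts
                 [1.5, -0.5, -1.0],  # effect of predictor 1
                 [-1.0, 0.2, 0.8]])  # effect of predictor 2

# A 5x5 grid of stimuli: systematic, not the natural token distribution.
f1, f2 = np.meshgrid(np.linspace(-2, 2, 5), np.linspace(-2, 2, 5))
X = np.column_stack([np.ones(f1.size), f1.ravel(), f2.ravel()])

# Classification probabilities: one row per stimulus, one column per category,
# varying smoothly as the acoustic predictors change across the grid.
probs = softmax(X @ beta)
```

Because the probabilities come from a regression on the grid coordinates rather than from assumed category distributions, the model places no distributional constraints on how tokens are spread within categories.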

Provisional Programme

  • 08:45 – 09:00: Reception and Opening
  • 09:00 – 10:00: All models are wrong, and so is model comparison by Bruno Nicenboim
  • 10:00 – 10:15: Coffee Break
  • 10:15 – 11:15: A Bayesian mechanism for rule generalisation in language by Elizabeth Pankratz
  • 11:15 – 11:30: Coffee Break
  • 11:30 – 12:30: Applications of Multinomial Models in Speech Perception Research by Santiago Barreda
  • 12:30 – 14:30: Lunch Break
  • 14:30 – 16:30: Hands-on Tutorial on Mixture Models by Bruno Nicenboim

Often, the data we collect is generated by multiple underlying processes relevant to our research questions. For example, reaction times in a decision task may result from the decision-making process itself (process 1) and attentional lapses (process 2), with attention loss leading to longer reaction times.

Mixture models allow us to combine these different processes into a single model, helping us understand and infer properties of each latent process. This makes it possible to answer questions about both attention and decision-making mechanisms.

See Chapter 17 of Nicenboim, Bruno; Schad, Daniel J.; & Vasishth, Shravan, Introduction to Bayesian Data Analysis for Cognitive Science, available at https://bruno.nicenboim.me/bayescogsci/.
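A minimal simulation of such a two-process mixture (an illustration with invented parameters, not the tutorial's materials): most reaction times come from the decision process, a small fraction from slower attentional lapses, and the mixture density is the weighted sum of the two component densities.

```python
# Illustrative two-component lognormal mixture for reaction times;
# all parameters below are hypothetical.
import numpy as np

rng = np.random.default_rng(42)

p_lapse = 0.10                          # assumed probability of a lapse trial
mu_task, s_task = np.log(600), 0.3      # decision process: RTs around 600 ms
mu_lapse, s_lapse = np.log(2000), 0.3   # lapses: much slower, around 2000 ms

n = 1000
is_lapse = rng.random(n) < p_lapse
rt = np.where(is_lapse,
              rng.lognormal(mu_lapse, s_lapse, n),
              rng.lognormal(mu_task, s_task, n))

def lognormal_pdf(x, mu, s):
    return np.exp(-(np.log(x) - mu) ** 2 / (2 * s ** 2)) / (x * s * np.sqrt(2 * np.pi))

# Mixture density: each RT could have come from either latent process.
def mixture_pdf(x, p, mu1, s1, mu2, s2):
    return (1 - p) * lognormal_pdf(x, mu1, s1) + p * lognormal_pdf(x, mu2, s2)

loglik = np.sum(np.log(mixture_pdf(rt, p_lapse, mu_task, s_task, mu_lapse, s_lapse)))
```

Fitting such a model Bayesian-style means placing priors on the mixing weight and the component parameters and inferring, per trial, the probability that each response came from a lapse.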

  • 16:30 – 16:45: Coffee Break
  • 16:45 – 18:00: Poster Session / Flash Presentations

Date and Location

  • Date: June 17, 2025
  • Location: The workshop will be held on the Bayonne Campus of the University of Pau and the Adour Region.

Code of Conduct

We follow the R Conference Code of Conduct.

The organizers of the Workshop on Bayesian Statistical Methods for Linguistics and Cognitive Sciences are committed to providing a harassment-free conference experience for everyone, regardless of age, gender, sexual orientation, disability, physical appearance, race, or religion (or lack thereof).

This code of conduct applies to all conference activities, including talks, panels, workshops, and social events. Organizers will actively enforce this code and expect cooperation from all participants to help ensure a safe and respectful environment.

Funding

This workshop is made possible thanks to the support of several organizations: