Spent the afternoon reading “Martingale Posterior Neural Networks for Fast Sequential Decision Making” (Duran-Martin et al., 2025) which focuses on algorithms for online learning of neural net parameters used for Bayesian decision making. My understanding in a few sentences: they use a low-rank Kalman filter to obtain a distribution over neural net parameters and model predictions. Then they use the predictive distribution for sequential decision making by sampling from it (instead of the parameter posterior, as in classical Thompson sampling). They call this “predictive sampling” and they use the Martingale posterior framework as a way of bridging the gap between their frequentist parameter update and the Bayesian-like goal of decision making under uncertainty. Finally, they show their method on a variety of online learning examples such as online MNIST, a recommender algorithm, and Bayesian optimization.
Here’s my summary of some of the most important ideas:
Problem statement
Consider an environment that returns a reward (or observation) $y_t$ based on an input $x_t$. Given a dataset $\mathcal{D}_{1:t} = \{(x_i, y_i)\}_{i=1}^{t}$, where the relationship between inputs and rewards is modelled by a neural network $f_\theta$ with parameters $\theta$, the objective of the paper is to obtain a predictive distribution over the next observation, $p(y_{t+1} \mid x_{t+1}, \mathcal{D}_{1:t})$, in a form that can be updated online and used for decision making.
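To spell that out in symbols (my notation, not necessarily the paper's): a fully Bayesian treatment would obtain this predictive by integrating out the parameters,

$$
p(y_{t+1} \mid x_{t+1}, \mathcal{D}_{1:t}) = \int p(y_{t+1} \mid x_{t+1}, \theta)\, p(\theta \mid \mathcal{D}_{1:t})\, \mathrm{d}\theta .
$$

The paper's angle, as I read it, is to get something like the left-hand side cheaply and online, with a Gaussian filter standing in for the parameter posterior rather than an explicit prior-times-likelihood computation.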
Online learning with a Kalman filter
The authors use an extended Kalman filter (EKF) to recursively compute the statistics of a Gaussian distribution over the parameters, $\mathcal{N}(\theta \mid \mu_t, \Sigma_t)$, updating the mean and covariance after every new observation.
A vanilla EKF does not scale to large networks: the memory cost scales quadratically with the parameter dimension (due to the square covariance matrix) and the compute cost scales cubically (due to the filter’s matrix inversions). Therefore, they propose a variety of low-rank approximations for the covariance matrix, which make their approach feasible for modern neural networks.
For example, their algorithm HiLoFi uses a low-rank approximation for the covariance of the hidden layers and a full-rank covariance for the last layer. Another example is LRKF, which uses a low-rank approximation for the covariance of all the parameters.
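To make the filtering part concrete, here is a minimal sketch of the kind of EKF recursion the paper builds on. This is emphatically not HiLoFi or LRKF: it keeps a full covariance over all parameters (exactly the thing that does not scale), uses a toy one-hidden-layer network with a finite-difference Jacobian, and all names (`mlp`, `ekf_step`, etc.) are mine, chosen for illustration.

```python
# Sketch only: full-covariance EKF on a toy network, not the paper's HiLoFi/LRKF.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8

def mlp(theta, x):
    """Tiny one-hidden-layer network mapping a scalar input to a scalar prediction."""
    w1 = theta[:HIDDEN]
    b1 = theta[HIDDEN:2 * HIDDEN]
    w2 = theta[2 * HIDDEN:3 * HIDDEN]
    b2 = theta[3 * HIDDEN]
    h = np.tanh(w1 * x + b1)
    return float(w2 @ h + b2)

def jacobian(f, theta, eps=1e-6):
    """Finite-difference Jacobian of a scalar-valued f w.r.t. theta, shape (1, d)."""
    base = f(theta)
    J = np.zeros((1, theta.size))
    for i in range(theta.size):
        pert = theta.copy()
        pert[i] += eps
        J[0, i] = (f(pert) - base) / eps
    return J

def ekf_step(mu, Sigma, x, y, obs_var=0.1, q=1e-4):
    """One EKF update of the Gaussian N(mu, Sigma) over the network parameters."""
    # Predict: random-walk dynamics on the parameters (slightly inflate the covariance).
    Sigma_pred = Sigma + q * np.eye(mu.size)
    # Linearize the network around the current mean.
    H = jacobian(lambda th: mlp(th, x), mu)          # (1, d)
    # Innovation (prediction error) and its variance.
    resid = y - mlp(mu, x)
    s = float(H @ Sigma_pred @ H.T) + obs_var
    # Kalman gain, then the mean/covariance update.
    K = Sigma_pred @ H.T / s                         # (d, 1)
    mu_new = mu + K[:, 0] * resid
    Sigma_new = Sigma_pred - K @ H @ Sigma_pred
    return mu_new, Sigma_new

# Online loop over a stream of (x, y) pairs from a toy environment.
d = 3 * HIDDEN + 1
mu, Sigma = 0.1 * rng.standard_normal(d), np.eye(d)
for _ in range(200):
    x = rng.uniform(-2.0, 2.0)
    y = np.sin(x) + 0.1 * rng.standard_normal()
    mu, Sigma = ekf_step(mu, Sigma, x, y)
```

The low-rank variants replace the dense `Sigma` above with a cheaper factorization; the shape of the recursion (linearize, compute the innovation, update mean and covariance) stays the same.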
I won’t go into the gory details of the math, but the important parts are that:
- The Kalman filter approach provides statistics to define a parameter distribution, which is then propagated into a predictive distribution by linearizing the neural network and matching moments (sketched in code after this list).
- Their low-rank approximations allow them to use large networks while maintaining tractability.
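Continuing the snippet above (and again using my notation, reusing `mlp`, `jacobian`, `mu`, `Sigma` from that sketch), the moment-matched predictive at a new input is just a Gaussian whose variance comes from pushing the parameter covariance through the linearization:

```python
def predictive(mu, Sigma, x, obs_var=0.1):
    """Moment-matched Gaussian predictive p(y | x, data) under the linearized network.

    The predictive mean is f(x; mu); the variance pushes the parameter covariance
    through the linearization, H Sigma H^T, plus the observation noise.
    """
    H = jacobian(lambda th: mlp(th, x), mu)   # (1, d) linearization at the mean
    mean = mlp(mu, x)
    var = float(H @ Sigma @ H.T) + obs_var
    return mean, var
```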
Decision making and martingales
The authors then introduce a novel approach for sequential decision making that avoids sampling from the high-dimensional parameter posterior.
This is justified with a “posterior-first” perspective which, as I understand it (and I don’t understand it very well yet), uses martingales to argue that working only with the posterior predictive (which is much lower-dimensional than the parameter distribution) is sufficient for decision making. This is the part of the paper I’m least sure about, so I apologize for being vague here.
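Here is my mental model of “predictive sampling”, continuing the sketch above; to be clear, this is my reading of the idea, not a line-by-line reproduction of their algorithm, and `select_action` is a name I made up. Instead of drawing a full parameter vector $\theta \sim \mathcal{N}(\mu, \Sigma)$ as in Thompson sampling, you draw one scalar sample from each candidate action’s predictive distribution and act greedily on those samples.

```python
import numpy as np

rng = np.random.default_rng(1)

def select_action(mu, Sigma, candidate_actions, obs_var=0.1):
    """Pick an action by sampling from each candidate's moment-matched predictive.

    Classic Thompson sampling would instead draw a full parameter vector
    theta ~ N(mu, Sigma) and maximize the network's output under that theta;
    here we only ever sample scalars from the per-action predictive Gaussians.
    """
    sampled_rewards = []
    for a in candidate_actions:
        mean, var = predictive(mu, Sigma, a, obs_var)  # predictive() from the sketch above
        sampled_rewards.append(rng.normal(mean, np.sqrt(var)))
    return candidate_actions[int(np.argmax(sampled_rewards))]

# e.g. choose among a grid of scalar "actions" fed to the reward model
best = select_action(mu, Sigma, np.linspace(-2.0, 2.0, 21))
```

The appeal, as I understand it, is that the randomness lives in a space whose dimension is the number of candidate actions rather than the number of network parameters.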
Final thoughts
Overall, a very cool paper. I’m very interested in online learning and I’ve implemented online learning with EKFs and neural networks, so this is very relevant to me. I’m still unsure how the martingales justify using the posterior predictive for decision making. The authors also cite work on mis-specified BNN priors and likelihoods, which I want to look into. I also wonder whether this framework could be extended to multi-step predictions for model-based control.
This is blogpost 16/100