Contents lists available at ScienceDirect

Comput. Methods Appl. Mech. Engrg.

journal homepage: www.elsevier.com/locate/cma

A physics-augmented GraphGPS framework for the reconstruction 

of 3D Riemann problems from sparse data 

Rami Cassia ∗, Rich Kerswell
Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Wilberforce Rd, Cambridge, CB3 0WA, UK

a r t i c l e  i n f o

Keywords:
Machine learning
Attention
Graphs
Fluid dynamics
Shocks
Riemann problems

 a b s t r a c t

In compressible fluid flow, reconstructing shocks, discontinuities, rarefactions, and their inter-
actions from sparse measurements is an important inverse problem with practical applications. 
Moreover, physics-informed machine learning has recently become an increasingly popular ap-
proach for performing reconstructions tasks. In this work we explore a machine learning recipe, 
known as GraphGPS, for reconstructing canonical compressible flows known as 3D Riemann prob-
lems from sparse observations, in a physics-informed manner. The GraphGPS framework com-
bines the benefits of positional encodings, local message-passing of graphs, and global contextual 
awareness, and we explore the latter two components through an ablation study. Furthermore, we 
modify the aggregation step of message-passing such that it is aware of shocks and discontinuities, 
resulting in sharper reconstructions of these features. Additionally, we modify message-passing 
such that information flows strictly from known nodes only, which results in computational sav-
ings, better training convergence, and no degradation of reconstruction accuracy. We also show 
that the GraphGPS framework outperforms numerous machine learning and classical benchmarks.

1.  Introduction

A common inverse problem is the reconstruction of a set of variables from sparse data. Applications of this inverse problem span 
a range of fields such as imaging, audio and speech processing, medicine, structural engineering, seismology, finance, astronomy, 
and fluid dynamics. Particular applications in fluid dynamics include climate modeling, oceanography, and aerospace engineering. In 
aerospace engineering, for example, it is common practice to mount pressure sensors on a wing during wind-tunnel testing in order 
to reconstruct the surrounding flow-field.

In this paper we are interested in the sparse reconstruction of the 3D Riemann problems, which are a set of problems that 
investigate the interaction of elementary waves (shocks, rarefactions, contact discontinuities) as governed by the compressible Euler 
equations. We are specifically interested in performing the reconstruction using a physics-informed machine learning (ML) model 
that is a) graph-based and b) contextually-aware. A model that is graph-based can leverage data directionality, which we hypothesize 
is beneficial to reconstruction where information should flow from known to unknown points. A model that is contextually-aware is 
capable of capturing long-range dependencies in the data, which is crucial in flows where there are long-range correlations due to the 
presence of shocks, contact discontinuities and rarefactions. As such we briefly review graph-based and contextually-aware models.

Graph-based models are those that operate on graphs comprising of vertices (or nodes) and edges. Graph models typically 
involve message-passing, which refers to the nodal aggregation of messages or information from neighbours. A key property of

∗ Corresponding author.
 E-mail addresses: rgc38@cam.ac.uk (R. Cassia), rrk26@cam.ac.uk (R. Kerswell).

https://doi.org/10.1016/j.cma.2025.118328
Received 27 May 2025; Received in revised form 11 August 2025; Accepted 15 August 2025

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

Available online 2 September 2025 
0045-7825/© 2025 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license 
( http://creativecommons.org/licenses/by/4.0/ ). 

https://www.elsevier.com/locate/cma
https://www.elsevier.com/locate/cma
https://orcid.org/0009-0002-4643-2717

$\mathcal {F}_{\Theta }: \left ( \mathbf {X} \in \mathbb {R}^{N_{xyz} \times N_{tc}}, \mathbf {m} \in \left \{0,1\right \}^{N_{xyz}} \right ) \longrightarrow \mathbf {\hat {X}} \in \mathbb {R}^{N_{xyz} \times N_{tc}}$


$\epsilon \left (\mathbf {\hat {X}}, \mathbf {\bar {X}}\right )$


$\mathbf {X}$


$\mathbf {\hat {X}}$


$\mathbf {\bar {X}}$


$\mathbf {m}$


$\mathcal {F}_{\Theta }$


$\Theta $


$N_{xyz} = N_{x}N_{y}N_{z}$


$N_{tc} = N_{t}N_{c}$


$N_t$


$N_c$


$\epsilon $


$\mathbf {\hat {X}}$


$\mathbf {\bar {X}}$


$N_c = 5$


\begin {align}&\partial _{t}\mathbf {U} + \partial _{x}\mathbf {F}\left (\mathbf {U}\right ) + \partial _{y}\mathbf {G}\left (\mathbf {U}\right ) + \partial _{z}\mathbf {H}\left (\mathbf {U}\right ) = \mathbf {0},\\[6pt] &{\partial _t} \begin {pmatrix} \rho \\ \rho u \\ \rho v \\ \rho w \\ E \\ \end {pmatrix} + {\partial _x} \begin {pmatrix} \rho u \\ \rho u^{2} + p \\ \rho u v \\ \rho uw \\ u\left (E + p\right ) \\ \end {pmatrix} + {\partial _y} \begin {pmatrix} \rho v \\ \rho u v \\ \rho v^{2} + p \\ \rho vw \\ v\left (E + p\right ) \\ \end {pmatrix} + {\partial _z} \begin {pmatrix} \rho w \\ \rho uw \\ \rho vw \\ \rho w^{2} + p \\ w\left (E + p\right ) \\ \end {pmatrix} = \mathbf {0}, \\[6pt] &E = \frac {1}{2}\rho \left (u^2 + v^2 + w^2\right ) + \frac {p}{\left (\gamma - 1\right )},\end {align}


$\rho $


$u$


$v$


$w$


$p$


$E$


$\gamma $


$x$


$y$


$z$


$\partial _{\mathbf {n}} \mathbf {U} = \mathbf {0}$


$\mathbf {X}$


\begin {equation}\mathbf {X} = \left ( \rho , u, v,w, E - \frac {1}{2}\rho (u^{2} + v^{2} + w^{2}) \right )^T, \label {Xeqn1-4}\end {equation}


$\mathcal {T}$


$\mathcal {F}_{\Theta }$


\begin {equation}\mathcal {T}: \left ( \rho , u, v,w, E - \frac {1}{2}\rho \left (u^{2} + v^{2} + w^{2}\right ) \right )^T \longrightarrow \left ( \left |{\rho }\right |, u, v,w, \left |E - \frac {1}{2}\rho \left (u^{2} + v^{2} + w^{2}\right )\right |\right )^T. \label {Xeqn2-5}\end {equation}


$\mathbf {\bar {X}}$


$\mathbf {m}$


$m_i$


$\mathbf {m}$


\begin {equation}m_{i} = \begin {cases} 0 & \text {if} \,\,\,\,\, a_{i} < 0.9, \\ 1 & \text {if} \,\,\,\,\, a_{i} \geq 0.9. \end {cases}, \,\,\,\,\,\,\,\,\,\, i \in [1, N_{xyz}], \,\,\,\,\,\,\,\,\,\, a_{i} \sim \mathcal {U}\left (0,1\right ). \label {Xeqn3-6}\end {equation}


$\mathbf {X} = \mathbf {m}\odot \mathbf {\bar {X}}$


$\odot $


\begin {align}\epsilon = \frac {\norm { \mathbf {\hat {X}} - \mathbf {\bar {X}} }_{2}}{\norm {\mathbf {\bar {X}}}_{2}}.\end {align}


$\left [0,1\right ]^{3}$


$\left (\rho , u, v,w, p\right )^{T}$


$x>0.5, y>0.5, z>0.5$


$x>0.5, y>0.5, z<0.5$


$R$


$S$


$J$


$lr = \{21, 32, 34, 41, 51, 62, 73, 84, 65, 76, 78,85\}$


$\omega $


$\bm {\omega ^{'}}$


${R}_{lr}^{\pm }$


\begin {align}& \omega _{l} - \omega _{r} = \pm \frac {2\sqrt {\gamma }}{\gamma - 1}\left ( \sqrt {\frac {p_l}{\rho _l}} - \sqrt {\frac {p_r}{\rho _r}} \right ),\,\,\,\,\,\,\,\,\frac {\rho _l}{\rho _r} = \left (\frac {p_l}{p_r}\right )^{\frac {1}{\gamma }},\\[6pt] & \bm {\omega ^{'}}_{l} = \bm {\omega ^{'}}_{r}, \,\,\,\,\,\,\,\, \omega _l < \omega _r, \,\,\,\,\,\,\,\,sgn\left ( p_{l} - p_{r}\right ) = \mp 1.\end {align}


${S}_{lr}^{\pm }$


\begin {align}&\omega _{l} - \omega _{r} = \sqrt {\frac {\left (p_{l} - p_{r}\right )\left (\rho _{l} - \rho _{r}\right )}{\rho _l \rho _r}},\,\,\,\,\,\,\,\, \frac {\rho _{l}}{\rho _{r}} = \frac {\gamma \left (p_{l} + p_{r}\right ) + \left (p_{l} - p_{r}\right )}{\gamma \left (p_{l} + p_{r}\right ) - \left (p_{l} - p_{r}\right )},\\[6pt] & \bm {\omega ^{'}}_{l} = \bm {\omega ^{'}}_{r},\,\,\,\,\,\,\,\, \omega _l > \omega _r,\,\,\,\,\,\,\,\, sgn\left ( p_{l} - p_{r}\right ) = \pm 1.\end {align}


${J}_{lr}$


$\omega _{l} = \omega _{r}$


$p_{l} = p_{r}$


$\rho $


$\bm {\omega ^{'}}$


$\pm \pm $


$\mathcal {F}_{\Theta }$


$\mathcal {F}_{\mathcal {MP}}^{\mathbf {E}^{(n)}}$


$\mathbf {E}^{(n)}$


$n$


$\mathcal {F}_{\mathcal {GC}}$


$pe$


$\mathbf {X}$


$N$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$\mathcal {F}_{\mathcal {MP}}$


$\mathbf {E}^{(n)}$


$n$


$\mathcal {F}_{\mathcal {GC}}$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$N$


$\cup _{n=1}^{N}\mathbf {E}^{(n)}$


$\mathbf {m}$


$\mathbf {m} = \mathbf {m}^{(0)} \in \left \{0,1\right \}^{N_{xyz}}$


$d_{1}\left (i,j\right )$


$l_1$


$i$


$j$


$\cup _{n=1}^{N}\mathbf {E}^{(n)}$


$\mathbf {m}^{(0)}$


\begin {align}&\mathbf {E}^{(n)} = \left \{ \left (i,j \right ) \,\, \mid \,\, d_{1}\left (i,j\right ) \in \mathbb {Z^{+}} \cap \left [1,\lambda \right ], \,\,\, m^{(n-1)}_{i} = 1, \,\,\, \forall i \right \}, \,\,\, \lambda \in \left \{1,2,3\right \}, \\[6pt] &{m}^{(n)}_{j} = {m}^{(n-1)}_{j} \lor \left ( \bigvee _{(i,j) \in \mathbf {E}^{(n)}} {m}_{i}^{(n-1)} \right ),\end {align}


$\lor $


$\bigvee $


$\lambda $


$\lambda =1$


$\lambda =2$


$\lambda =3$


$\lambda =2$


$\mathbf {X}$


$\mathbf {X}$


$\mathbf {X}$


$\mathcal {F}_{\mathcal {MP}}$


$i$


\begin {equation}\mathbf {h}_i^{\text {out}} = \sigma \left ( \sum _{j \in \mathcal {N}(i) \cup \{i\}} \mathbf {W} \mathbf {h}_j \right ), \label {Xeqn4-14}\end {equation}


$\mathbf {h}$


$\mathbf {W}$


$\sigma $


$\mathcal {N}\left (i\right )$


$i$


$\mathbf {E}$


$\alpha _{ij}$


\begin {equation}\mathbf {h}_i^{\text {out}} = \sigma \left ( \sum _{j \in \mathcal {N}(i) \cup \{i\}} \alpha _{ij} \mathbf {W} \mathbf {h}_j \right ), \label {Xeqn5-15}\end {equation}


$\alpha _{ij}$


\begin {equation}\alpha _{ij} = \frac {\exp \left ( \text {LeakyReLU} \left ( \mathbf {a}^T_{\text {src}} \mathbf {W}\mathbf {h}_i + \mathbf {a}^T_{\text {trg}}\mathbf {W}\mathbf {h}_j \right ) \right )} {\sum _{k \in \mathcal {N}(i) \cup \{i\}} \exp \left ( \text {LeakyReLU} \left ( \mathbf {a}^T_{\text {src}} \mathbf {W}\mathbf {h}_i + \mathbf {a}^T_{\text {trg}}\mathbf {W}\mathbf {h}_k \right ) \right )}, \label {Xeqn6-16}\end {equation}


$\mathbf {a}_{\text {src}}^{T}$


$\mathbf {a}_{\text {trg}}^{T}$


$i$


$j$


$\mathbf {\tilde {W}}$


$\mathbf {b}_{\text {src}}^{T}$


$\mathbf {b}_{\text {trg}}^{T}$


$\mathbf {Z}$


\begin {equation}\alpha _{ij} = \frac {\exp \left ( \text {LeakyReLU} \left ( \mathbf {a}^T_{\text {src}} \mathbf {W}\mathbf {h}_i + \mathbf {a}^T_{\text {trg}}\mathbf {W}\mathbf {h}_j + \mathbf {b}^T_{\text {src}} \mathbf {\tilde {W}}\mathbf {Z}_i + \mathbf {b}^T_{\text {trg}}\mathbf {\tilde {W}}\mathbf {Z}_j\right ) \right )} {\sum _{k \in \mathcal {N}(i) \cup \{i\}} \exp \left ( \text {LeakyReLU} \left ( \mathbf {a}^T_{\text {src}} \mathbf {W}\mathbf {h}_i + \mathbf {a}^T_{\text {trg}}\mathbf {W}\mathbf {h}_k + \mathbf {b}^T_{\text {src}} \mathbf {\tilde {W}}\mathbf {Z}_i + \mathbf {b}^T_{\text {trg}}\mathbf {\tilde {W}}\mathbf {Z}_k \right ) \right )}, \label {Xeqn7-17}\end {equation}


$\mathbf {Z}\in \mathbb {R}^{N_{xyz}\times 3N_{t}}$


$\mathbf {\hat {X}}$


\begin {align}\mathbf {Z} = \left \{\norm {\nabla \rho }_2\in \mathbb {R}^{N_{xyz}\times N_{t}}, \,\,\,\,\,\, \norm {\nabla p}_2\in \mathbb {R}^{N_{xyz}\times N_{t}}, \,\,\,\,\,\, \text {max}\left (0,-\text {div}(u,v,w)\right )\in \mathbb {R}^{N_{xyz}\times N_{t}}\right \}.\end {align}


$s = \text {ln}\left (p/\rho ^{\gamma }\right )$


\begin {equation}\mathbf {h}_i^{\text {out}} = \sigma \left ( \mathbf {W}_{1}\mathbf {h}_{i} + \mathbf {W}_{2}\cdot \text {aggregate} \left ( \{ \mathbf {h}_j, \,\,\, \forall j \in \mathcal {N}_{\text {samp}}(i) \subseteq \mathcal {N}(i) \} \right ) \right ), \label {Xeqn8-19}\end {equation}


$\mathcal {F}_{\mathcal {GC}}$


$\mathcal {F}_{\mathcal {GC}}$


\begin {equation}\mathbf {h}^{\text {out}} = \text {softmax}\left (\frac {\mathbf {Q}\mathbf {K}^{T}}{\sqrt {d}}\right )\mathbf {V}, \label {Xeqn9-20}\end {equation}


$\mathbf {Q}$


$\mathbf {K}$


$\mathbf {V}$


$\mathbf {Q} = \mathbf {h}\mathbf {W}_{\mathbf {Q}}$


$\mathbf {K} = \mathbf {h}\mathbf {W}_{\mathbf {K}}$


$\mathbf {V} = \mathbf {h}\mathbf {W}_{\mathbf {V}}$


$d$


$\phi \left (\cdot \right )$


\begin {equation}\mathbf {h}^{\text {out}} = \frac {\left (\phi \left (\mathbf {Q}\right )\left (\phi \left (\mathbf {K}\right )^{T}\mathbf {V}\right )\right )}{\left (\phi \left (\mathbf {Q}\right )\left (\phi \left (\mathbf {K}\right )^{T}\mathbf {1}\right )\right )}, \label {Xeqn10-21}\end {equation}


$\mathbf {h}$


$i$


$\mathbf {h}_i$


\begin {equation}\mathbf {h}_{i}^{\text {out}} = \frac {\sum _{j \in \mathcal {N}_{\text {eg}}(i)} \mathbf {O}_{ij} \mathbf {V}_j} {\sum _{j \in \mathcal {N}_{\text {eg}}(i)} \mathbf {O}_{ij}}, \,\,\,\,\, \mathbf {O}_{ij} = \exp \left ( \frac {\mathbf {Q}_i \mathbf {K}_j}{\sqrt {d}} \right ), \label {Xeqn11-22}\end {equation}


$\mathcal {N}_{\text {eg}}(i)$


$i$


$\mathbf {h}$


$\mathcal {F}_{\mathcal {GC}}$


$\mathbf {h}$


$\mathbf {h}^{\text {out}}$


$\mathbf {h}'$


$\mathbf {W}_1$


$\mathbf {W}_2$


$\mathbf {W}_3$


$\sigma $


$\mathbf {h}'$


$\mathbf {\Psi }$


$\mathbf {W}_1$


$\mathbf {W}_2$


$\mathbf {W}_3$


$n$


\begin {align}&\mathbf {\Psi }^{(n)} = \mathbf {W}_{1}\mathbf {\Psi }^{(n-1)} + \mathbf {W}_{2}\mathbf {h}'^{(n)}, \\ & \mathbf {y}^{(n)} = \mathbf {W}_3 \mathbf {\Psi }^{(n)}.\end {align}


$\mathbf {W}_1$


$\mathbf {W}_1$


$\mathcal {F}_{\Theta }$


$\mathcal {L}_{\text {data}}$


$\mathcal {L}_{\text {phy}}$


$\mathbf {m} = \mathbf {m}^{(0)}$


\begin {align}\mathcal {L}_{\text {data}} = \frac {1}{N_{tc}\sum \mathbf {m}}\norm {\mathbf {m}\odot \left (\mathbf {X} - \hat {\mathbf {X}}\right )}_{2}^{2}.\end {align}


$\mathbf {\hat {X}}$


$\mathbf {\hat {U}}\in \mathbb {R}^{N_{t}\times N_{c} \times N_{x} \times N_{y} \times N_{z}}$


$\left (i,j,k\right )$


$\left (i,j\right )$


\begin {align}\label {eq:godloss} &\mathcal {L}_{\text {phy}} = \frac {1}{N} \sum _{c}\sum _{t}\sum _{i,\,j,\,k} \left ( \mathbf {\hat {U}}_{i,\,j,\,k}^{(t+1,\,c)} - \mathbf {\hat {U}}_{i,\,j,\,k}^{(t,\,c)} + \lambda _{x}\left [\Delta {\mathbf {F}}_{i,\,j,\,k}^{(t,\,c)} \right ] + \lambda _{y}\left [\Delta {\mathbf {G}}_{i,\,j,\,k}^{(t,\,c)}\right ] + \lambda _{z}\left [\Delta {\mathbf {H}}_{i,\,j,\,k}^{(t, \, c)}\right ]\right )^{2},\\ &\Delta {\mathbf {F}}_{i,\,j,\,k}^{(t,\,c)} = {\mathbf {F}}_{i + \frac {1}{2},\,j,\,k}^{(t,\,c)} - {\mathbf {F}}_{i - \frac {1}{2},\,j,\,k}^{(t,\,c)}, \\[8pt] &\Delta {\mathbf {G}}_{i,\,j,\,k}^{(t,\,c)} = {\mathbf {G}}_{i,\,j + \frac {1}{2},\,k}^{(t,\,c)} - {\mathbf {G}}_{i,\,j - \frac {1}{2}\,,k}^{(t,\, c)}, \\[8pt] &\Delta {\mathbf {H}}_{i,\,j,\,k}^{(t,\,c)} = {\mathbf {H}}_{i,\,j,\,k+\frac {1}{2}}^{(t,\,c)} - {\mathbf {H}}_{i,\,j,\,k-\frac {1}{2}}^{(t,\,c)},\end {align}


$\lambda _{x} = \Delta t/\Delta x$


$\lambda _{y} = \Delta t/\Delta y$


$\lambda _{z} = \Delta t/\Delta z$


$N = N_{xyz}N_{tc}$


$\Delta {\mathbf {F}}$


$\Delta {\mathbf {G}}$


$\Delta {\mathbf {H}}$


$x,y,z$


$\mathbf {\hat {U}}$


$\mathcal {L}_{\text {data}} + \mathcal {L}_{\text {phy}}$


$\mathcal {L} = \mathcal {L}_{\text {data}} + \mathcal {L}_{\text {phy}}$


$\partial _{\mathbf {n}} \mathbf {U} = \mathbf {0}$


$10^{-4}$


$64^3$


$5\times 10^{-4}$


$\left (\pm 1,\pm 1,\pm 1\right )^{T}$


$\lambda = 3$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$\mathcal {F}_{\mathcal {MP}}$


$(\lambda =3)$


$\mathbf {X}$


$\mathcal {F}_{\mathcal {MP}}$


$(\lambda =3)$


$\mathcal {F}_{\mathcal {GC}}$


$\mathcal {F}_{\mathcal {MP}}$


$\phi = \text {ELU}(x) + 1$


$\mathcal {F}_{\mathcal {GC}}$


$(\lambda =3)$


$\mathbf {X}$


$\mathcal {F}_{\mathcal {GC}}$


$(\lambda =3)$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$(\lambda = 1)$


$(\lambda = 2)$


$(\lambda = 3)$


$\lambda = \left \{1,2,3\right \}$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$\mathbf {X}$


$\lambda =3$


$\lambda =2$


$\lambda =1$


$\lambda = 1$


$\lambda =2$


$\lambda =3$


$\lambda =1$


$\lambda = \left \{1,2,3\right \}$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$\lambda = 2$


$\lambda = 1$


$\lambda = \left \{1,2,3\right \}$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$\lambda = 2$


$\lambda = 3$


$\lambda =1$


$\lambda =1$


$(\lambda = 3)$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$\mathbf {X}$


$\lambda = 3$


$(\lambda = 3)$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$(\lambda = 3)$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$(\lambda = 3)$


$\mathbf {X}$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$(\lambda = 3)$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$\lambda =3$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$\lambda =3$


$\mathbf {X}$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$\lambda =3$


$\cancel {\mathcal {L}_{\text {phy}}}$


$\lambda $


$s_l$


$s_r$


$x=0$


$s_*$


$x$


\begin {equation}{\partial _t} \begin {pmatrix} \rho \\ \rho u \\ \rho v \\ \rho w \\ E \\ \end {pmatrix} + {\partial _x} \begin {pmatrix} \rho u \\ \rho u^{2} + p \\ \rho u v \\ \rho u w \\ u( E + p) \\ \end {pmatrix} = \mathbf {0} \label {Xeqn12-A.1}\end {equation}


$s_l$


$s_r$


\begin {equation}s_l = \tilde {u} - \tilde {a}, \,\,\,\,\,\,\,\, s_r = \tilde {u} + \tilde {a}, \label {Xeqn13-A.2}\end {equation}


$\tilde {u}$


$\tilde {a}$


\begin {equation}\tilde {u} = \frac {\sqrt {\rho _l}u_l + \sqrt {\rho _r}u_r}{\sqrt {\rho _l} + \sqrt {\rho _r}}, \,\,\,\,\,\,\,\, \tilde {a} = \sqrt {\left (\gamma - 1\right )\left (\tilde {H} - \frac {1}{2}\tilde {u}^2\right )}, \label {Xeqn14-A.3}\end {equation}


$\tilde {H}$


\begin {equation}\tilde {H} =\frac {\sqrt {\rho _l}H_l + \sqrt {\rho _r}H_r}{\sqrt {\rho _l} + \sqrt {\rho _r}}, \,\,\,\,\,\,\,\, H =\frac {E + p}{\rho }. \label {Xeqn15-A.4}\end {equation}


$s_l$


$s_r$


$s_*$


\begin {equation}s_{*} = \frac {p_r - p_l + \rho _{l}u_{l}(s_l - u_l) - \rho _{r}u_{r}(s_r - u_r)}{ \rho _{l}(s_l - u_l) - \rho _{r}(s_r - u_r)}. \label {Xeqn16-A.5}\end {equation}


$\mathbf {U}_{*l}$


$\mathbf {U}_{*r}$


$s_*$


$s_l$


$s_r$


\begin {equation}\mathbf {U}_{*\eta } = \rho _{\eta }\left (\frac {s_\eta - u_\eta }{s_\eta - s_{*}}\right )\begin {pmatrix} 1 \\ s_{*} \\ v_\eta \\ w_\eta \\ \frac {E_\eta }{\rho _\eta } + (s_{*} - u_\eta )\left [s_{*} + \frac {p_\eta }{\rho _\eta (s_\eta - u_\eta )}\right ] \end {pmatrix}, \label {Xeqn17-A.6}\end {equation}


$\eta = \left \{l,r\right \}$


$\mathbf {F}_{*l}$


$\mathbf {F}_{*r}$


$\mathbf {F}_{i + \frac {1}{2}}^n$


\begin {equation}\label {eq:HLLC} \mathbf {F}_{*\eta } = \mathbf {F}_\eta + s_\eta (\mathbf {U}_{*\eta } - \mathbf {U}_\eta ), \;\;\;\;\;\; \mathbf {F}_{i + \frac {1}{2}}^t = \begin {cases} \mathbf {F}_l & 0 \leq s_l,\\ \mathbf {F}_{*l} & s_l \leq 0 \leq s_{*},\\ \mathbf {F}_{*r} & s_{*} \leq 0 \leq s_{r},\\ \mathbf {F}_{r} & 0 \geq s_{r}. \end {cases}\end {equation}


$y$


$\mathbf {G}_{j + \frac {1}{2}}^t$


$z$


$\mathbf {H}_{k + \frac {1}{2}}^t$


$\mathbf {\bar {X}}$


$\mathbf {U}$


$x$


$y$


$z$


\begin {align}&\bm {\alpha }_{0} = \frac {1}{3}\mathbf {U}_{i-2} -\frac {7}{6}\mathbf {U}_{i-1} - \frac {11}{6}\mathbf {U}_{i}\\[6pt] &\bm {\alpha }_{1} =-\frac {1}{6}\mathbf {U}_{i-1} + \frac {5}{6}\mathbf {U}_{i} + \frac {1}{3}\mathbf {U}_{i + 1} \\[6pt] &\bm {\alpha }_{2} = \frac {1}{3}\mathbf {U}_{i} + \frac {5}{6}\mathbf {U}_{i + 1} - \frac {1}{6}\mathbf {U}_{i + 2}\end {align}


$\bm {\beta }$


\begin {align}&\bm {\beta }_{0} = \frac {13}{12}\left (\mathbf {U}_{i-2}-2\mathbf {U}_{i-1}+\mathbf {U}_{i}\right )^{2} + \frac {1}{4}\left (\mathbf {U}_{i-2} - 4\mathbf {U}_{i-1} + 3\mathbf {U}_{i}\right )^{2} \\[6pt] &\bm {\beta }_{1} = \frac {13}{12}\left (\mathbf {U}_{i-1}-2\mathbf {U}_{i}+\mathbf {U}_{i+1}\right )^{2} + \frac {1}{4}\left (\mathbf {U}_{i-1} - \mathbf {U}_{i+1}\right )^{2}\\[6pt] &\bm {\beta }_{2} = \frac {13}{12}\left (\mathbf {U}_{i}-2\mathbf {U}_{i+1}+\mathbf {U}_{i+2}\right )^{2} + \frac {1}{4}\left (3\mathbf {U}_{i} - 4\mathbf {U}_{i+1} + \mathbf {U}_{i+2}\right )^{2}\end {align}


$\bm {\omega }$


\begin {equation}\bm {\omega }_n = \frac {\bm {\zeta }_{n}}{\bm {\zeta }_{0} + \bm {\zeta }_{1} + \bm {\zeta }_{2}}, \,\,\,\,\,\,\,\, \bm {\zeta }_{n} = \frac {d_{n}}{\left (\epsilon + \bm {\beta }_{n}\right )^2}, \,\,\,\,\,\,\,\, \left (d_0, d_1, d_2\right ) = \left (\frac {1}{10}, \frac {3}{5}, \frac {3}{10} \right ), \,\,\,\,\,\,\,\, n \in \left \{0,1,2 \right \}, \label {Xeqn19-B.7}\end {equation}


$\epsilon $


${\mathbf {U}}_{i + \frac {1}{2}}^{\text {WENO}}$


\begin {equation}{\mathbf {U}}_{i + \frac {1}{2}}^{\text {WENO}} = \bm {\omega }_{0} \odot \bm {\alpha }_0 + \bm {\omega }_{1} \odot \bm {\alpha }_1 + \bm {\omega }_{2} \odot \bm {\alpha }_2 . \label {Xeqn20-B.8}\end {equation}


$[0,1]^3$


$\,\,{J}_{21}^{} {J}_{32}^{} {S}_{34}^{-} {S}_{41}^{+} {J}_{51}^{} {S}_{62}^{+} {J}_{73}^{} {S}_{84}^{-} {R}_{65}^{-} {S}_{76}^{-} {J}_{78}^{} {J}_{85}^{}$


${S}_{21}^{+} {S}_{32}^{-} {S}_{34}^{-} {S}_{41}^{+} {S}_{51}^{+} {S}_{62}^{-} {S}_{73}^{+} {S}_{84}^{-} {S}_{65}^{-} {S}_{76}^{+} {S}_{78}^{+} {S}_{85}^{-}$


${R}_{21}^{+} {R}_{32}^{-} {R}_{34}^{-} {R}_{41}^{+} {S}_{51}^{-} {S}_{62}^{+} {S}_{73}^{-} {S}_{84}^{+} {R}_{65}^{-} {R}_{76}^{+} {R}_{78}^{+} {R}_{85}^{-}$


${J}_{21}^{} {J}_{32}^{} {J}_{34}^{} {J}_{41}^{} {S}_{51}^{-} {R}_{62}^{+} {S}_{73}^{-} {R}_{84}^{+} {J}_{65}^{} {J}_{76}^{} {J}_{78}^{} {J}_{85}^{}$


${J}_{21}^{} {J}_{32}^{} {J}_{34}^{} {J}_{41}^{} {J}_{51}^{} {J}_{62}^{} {J}_{73}^{} {J}_{84}^{} {J}_{65}^{} {J}_{76}^{} {J}_{78}^{} {J}_{85}^{}$


${S}_{21}^{+} {J}_{32}^{} {J}_{34}^{} {S}_{41}^{+} {S}_{51}^{+} {J}_{62}^{} {S}_{73}^{+} {J}_{84}^{} {J}_{65}^{} {S}_{76}^{+} {S}_{78}^{+} {J}_{85}^{}$


$\text {SNR} = 25 \text { dB}$


$\mathcal {F}_{\mathcal {MP}}$


$\mathcal {F}_{\mathcal {GC}}$


$(\lambda = 3)$


$\mathbf {X}$


$\text {SNR} = 25 \text { dB}$


$\text {SNR} = 25 \text { dB}$

https://orcid.org/0000-0001-5460-5337
mailto:rgc38@cam.ac.uk
mailto:rrk26@cam.ac.uk
https://doi.org/10.1016/j.cma.2025.118328
https://doi.org/10.1016/j.cma.2025.118328
http://creativecommons.org/licenses/by/4.0/


R. Cassia and R. Kerswell

message-passing is permutation-invariance, meaning features can be updated in any node ordering. Furthermore, message-passing 
inherently captures local dependencies in data. Graph neural networks (GNNs) and message-passing were first introduced by Scarselli 
et al. [1]. Graph convolutional networks were then introduced by Kipf and Welling[2], which perform the message-passing aggre-
gation through convolutional operations, increasing efficiency and interpretability. Veličković et al. [3] improve over GCNs by in-
troducing graph attention networks (GATs) which use attention scores in the aggregation process. The attention mechanism reduces 
oversmoothing typically encountered in GCNs. Hamilton et al. [4] also improve over GCNs by using sampling-based aggregation 
(graphSAGE), enabling scalability to larger graphs. Xu et al. [5] use a MLP-based aggregation of neighbours in their graph isomor-
phism network (GIN). They show that their network is as expressive as the Weisfeiler-Lehman test (a graph isomorphism test), which 
enables their model to better distinguish graph structures.

Contextually-aware models are those that capture long-range dependencies, typically through global attention mechanisms. The 
origins of attention mechanisms can be traced to the work of Bahdanau et al. [6] and Luong et al. [7] who introduce additive and 
dot-product attention in their recurrent-based architectures for neural machine translation. However, the major breakthrough came 
when the transformer architecture, presented by Vaswani et al. [8], completely removed recurrence through usage of sinusoidal po-
sitional encodings and multi-head self attention. This novel architecture is parallelizable over sequences which significantly reduces 
training time. They generalize the attention mechanism in terms of querys, keys and values where the multiple heads allows learning 
of different aspects of the input. Their attention is a scaled dot-product attention where the scaling prevents exploding gradients. 
The transformer breakthrough led to numerous milestones in language modeling and vision [9–11]. There has also been work on 
reducing the computational complexity of transformers from quadratic to linear. Katharopoulos et al. [12] achieve linear complexity 
by expressing self-attention as a linear dot-product of kernel feature maps and by making use of the associativity of matrix multi-
plication. Choromanski et al. [13] also achieve linear complexity by approximating kernelized attention using random feature maps. 
Recently, there has been work on replacing attention with state space models (SSMs) that are contextually-aware to achieve linear 
complexity. Gu and Dao[14] do this by making the SSM parameters input-dependent, allowing their model to selectively propa-
gate or forget information along a sequence depending on the current token. Their model, named Mamba, is implemented using a 
hardware-aware algorithm to address the computational inefficiencies caused by the recurrent-nature of the model. Dao and Gu[15] 
later develop a theoretical framework that connects SSMs with certain variants of attention - enabling the design of a newer model, 
Mamba-2, that also leverages the modern hardware optimizations of attention. There has also been work on extending transformers 
and selective-SSMs to handle graphical data [16–20].

In this paper, we are interested in using the framework proposed by Rampášek et al. [21] in performing our fluid reconstruction 
task. The framework, known as GraphGPS, comprises three main components: a local message-passing mechanism via graphs, a global 
attention mechanism, and positional/structural encodings. The contributions of our paper are as follows:

• To the best of our knowledge, this is the first study on the use of GraphGPS for flow field reconstruction. In particular, we make 
use of the modularity of the framework to explore the effect of different global attention mechanisms and different message-
passing mechanisms. We also show superior performance of GraphGPS against other ML architectures as well as against classical 
approaches.

• For the message-passing module, we also introduce a modified GAT layer, known as SA-GATConv, that leverages information 
on shocks and discontinuities in computing the local attention weights. We show that this produces sharper reconstructions in 
regions of shocks and discontinuities.

• We further leverage the graphical structure during message-passing such that a node that is initially unknown cannot propagate 
information to neighbours unless information has first been propagated to it from a known node. Compared to a naive form of 
message passing where all nodes (known or unknown) propagate information to each other, this form of guided message-passing is 
more interpretable, reduces computational overhead, and improves convergence.

• We ensure that in the projected space of GraphGPS, the representation of unknown nodes is zeroed such that the only information 
available is through their positional encoding. We achieve this through a masked projection of the input, where the masking is 
based on whether a node is known or unknown. This also ensures the reconstruction is agnostic to the initialization values at 
unknown nodes. We show empirically that this setup performs just as well as an unmasked (normal) projection of the input where 
feature propagation [22] has been used to provide an initial guess at unknown nodes.

• To the best of our knowledge, this is the first study on the 3D Riemann problems in the SciML literature.
This paper is organized as follows. In Section 2, we review prior work on flow field reconstruction from sparse measurements. In 

Section 3, we mathematically formulate our reconstruction problem, and outline the GraphGPS architecture and its various aspects. 
In Section 4, we show our experimental findings. Finally, we finish with conclusions in Section 5.

2.  Prior work on flow reconstruction

One popular approach for reconstruction in fluid dynamics is through proper orthogonal decomposition (POD), which involves 
finding orthonormal basis vectors that minimize the difference between the known data points and their projection onto a lower 
dimensional subspace. Gappy POD uses a mask to enforce the POD to be applied on known points only, and the basis vectors can then 
be used to estimate the variables at all grid points. The method was first introduced by Everson and Sirovich[23] for reconstructing 
human face images, and later used by Bui-Thanh et al. [24] in an aerodynamic setting where the pressure field around aerofoils were 
reconstructed from sensors on the aerofoil surface. Vendl and Faßbender[25] also used the method to recover the transonic flow over 
a wing-body configuration. The downside of POD is that it assumes a linear decomposition and may struggle to capture non-linear 

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

2 


R. Cassia and R. Kerswell

flow behaviour. Other classical methods are interpolation-based. One such method constructs a function that is a weighted sum of 
radial basis functions (RBFs). Another such method is Kriging interpolation, which is a probabilistic method that estimates values 
at unmeasured locations based on spatial correlations in the data, whilst also providing uncertainty estimates. Huang et al. [26] 
investigate the use of Kriging and RBFs in the reconstruction of hypersonic flow through a nozzle.

More recently machine learning has become a powerful paradigm for performing reconstruction. Duthé et al. [27] use a graph 
neural network to simultaneously reconstruct both the pressure and velocity fields around aerofoils from the surface pressure distri-
butions. Their model is also capable of predicting global properties of the flow such as inflow velocity and angle of attack. They make 
use of feature propagation to initialize unknown nodes. Xu et al. [28] use a physics-informed neural network to reconstruct the wake 
flow past a cylinder. They place emphasis on the importance of training procedures in drastically reducing the number of training 
epochs required, by using a cosine-annealing learning rate schedule with warm restarts. Kim et al. [29] use a gappy autoencoder to 
perform reconstruction via non-linear manifold representations. The method shows improvement over gappy POD (which utilizes 
linear subspaces), as tested on diffusion, advection and wave problems. They also assess the impact of different sensor placements 
on reconstruction accuracy. Peng et al. [30] use a physics-informed graph neural network to reconstruct high speed transient flows. 
They use a fully-connected neural network to map the sparse data to the entire domain, and then assign the mapped data to graph 
nodes to facilitate information exchange. Finally the output state variables are used to compute a PINN loss summed together with 
initial condition and boundary condition losses. Their model exhibits good spatial and temporal generalization but has low accuracy 
when reconstructing flow near wave-fronts. Hosseini and Shiri [31] use a physics-informed neural network to reconstruct the wake 
flow past a cylinder from sparse sensors placed around the domain boundary and cylinder wall. Danciu et al. [32] use graph atten-
tion convolutional layers to reconstruct the flow in an internal combustion engine where the geometry is time-varying due to piston 
motion. They use a feature propagation step to leverage information from known nodes to initialize missing features, and use a mask 
to distinguish between original and propagated data points which allows more effective learning from the sparse inputs. Quattromini 
et al. [33] make use of the adjoint method to compute Reynolds-averaged Navier Stokes (RANS) gradients which are then used as 
optimization terms during the training of a GNN. They use this hybrid ML-CFD approach to reconstruct the mean flow past cylinders. 
Nguyen et al. [34] introduce FLRNet which uses a variational autoencoder to learn a latent representation of the flow-field past a 
cylinder. The VAE makes use of Fourier feature layers and a perceptual loss term to enhance its ability to learn detailed features of 
the flow-field. An MLP is then used to correlate sparse sensor measurements to the learned representation. Dang et al. [35] use an 
operator learning framework to perform reconstruction. The framework utilizes a branch-trunk architecture. The branch network, 
which uses Fourier neural operators, incorporates sensor measurements at different timestamps. The trunk network uses an MLP to 
encode the entire temporal domain. Combining the outputs of the branch and trunk nets yields the final reconstruction which is 
discretization-independent.

3.  Methodology

3.1.  Problem definition

We are interested in the following inverse problem:
Find Θ ∶

(

𝐗 ∈ ℝ𝑁𝑥𝑦𝑧×𝑁𝑡𝑐 ,𝐦 ∈ {0, 1}𝑁𝑥𝑦𝑧
)

⟶ �̂� ∈ ℝ𝑁𝑥𝑦𝑧×𝑁𝑡𝑐  such that 𝜖(�̂�, �̄�) is minimized, where:
• 𝐗, �̂�, and �̄� are the sparse, reconstructed, and true flow fields, respectively.
• 𝐦 is the binary mask which indicates where values are known or unknown.
• Θ is the machine learning model with learnable weights Θ.
• 𝑁𝑥𝑦𝑧 = 𝑁𝑥𝑁𝑦𝑁𝑧 represents the number of points in 3D space. 𝑁𝑡𝑐 = 𝑁𝑡𝑁𝑐 represents the number of features where 𝑁𝑡 is number of 
timesteps and 𝑁𝑐 is the number of variables in the physical system.

• 𝜖 is the error metric between �̂�, and �̄� for measuring accuracy (not to be confused with the training loss function).

In this paper, we are interested in flow fields pertaining to the 3D Euler equations, so 𝑁𝑐 = 5:

𝜕𝑡𝐔 + 𝜕𝑥𝐅(𝐔) + 𝜕𝑦𝐆(𝐔) + 𝜕𝑧𝐇(𝐔) = 𝟎, (1)

𝜕𝑡

⎛

⎜

⎜

⎜

⎜

⎜

⎝

𝜌
𝜌𝑢
𝜌𝑣
𝜌𝑤
𝐸

⎞

⎟

⎟

⎟

⎟

⎟

⎠

+ 𝜕𝑥

⎛

⎜

⎜

⎜

⎜

⎜

⎝

𝜌𝑢
𝜌𝑢2 + 𝑝
𝜌𝑢𝑣
𝜌𝑢𝑤

𝑢(𝐸 + 𝑝)

⎞

⎟

⎟

⎟

⎟

⎟

⎠

+ 𝜕𝑦

⎛

⎜

⎜

⎜

⎜

⎜

⎝

𝜌𝑣
𝜌𝑢𝑣

𝜌𝑣2 + 𝑝
𝜌𝑣𝑤

𝑣(𝐸 + 𝑝)

⎞

⎟

⎟

⎟

⎟

⎟

⎠

+ 𝜕𝑧

⎛

⎜

⎜

⎜

⎜

⎜

⎝

𝜌𝑤
𝜌𝑢𝑤
𝜌𝑣𝑤

𝜌𝑤2 + 𝑝
𝑤(𝐸 + 𝑝)

⎞

⎟

⎟

⎟

⎟

⎟

⎠

= 𝟎, (2)

𝐸 = 1
2
𝜌
(

𝑢2 + 𝑣2 +𝑤2) +
𝑝

(𝛾 − 1)
, (3)

where 𝜌, 𝑢, 𝑣, 𝑤, 𝑝, 𝐸, 𝛾 are density, 𝑥-velocity, 𝑦-velocity, 𝑧-velocity, pressure, energy and ratio of specific heats, respectively. The 
boundary conditions are assumed to be Neumann, i.e., 𝜕𝐧𝐔 = 𝟎. To enforce positivity preservation (i.e., non-negative density and 
energy, [36]) as a hard constraint, we work with the following representation for 𝐗 at each spatiotemporal point:

𝐗 =
(

𝜌, 𝑢, 𝑣, 𝑤,𝐸 − 1
2
𝜌(𝑢2 + 𝑣2 +𝑤2)

)𝑇
, (4)

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

3 


R. Cassia and R. Kerswell

and apply during training, for each spatiotemporal point, the transformation   after applying Θ:

 ∶
(

𝜌, 𝑢, 𝑣, 𝑤,𝐸 − 1
2
𝜌
(

𝑢2 + 𝑣2 +𝑤2)
)𝑇

⟶

(

|𝜌|, 𝑢, 𝑣, 𝑤,
|

|

|

|

𝐸 − 1
2
𝜌
(

𝑢2 + 𝑣2 +𝑤2)|
|

|

|

)𝑇
. (5)

We obtain �̄�, the true field, by numerically solving the 3D Euler equations using a 5th-order HLLC-WENO scheme (see Appendices A 
and B for more details). The mask 𝐦 is randomly generated from a uniform distribution such that 90% of points are unknown. For 
element 𝑚𝑖 of 𝐦:

𝑚𝑖 =

{

0 if 𝑎𝑖 < 0.9,
1 if 𝑎𝑖 ≥ 0.9.

, 𝑖 ∈ [1, 𝑁𝑥𝑦𝑧], 𝑎𝑖 ∼  (0, 1). (6)

The sparse field is then generated from the mask as 𝐗 = 𝐦⊙ �̄�, where ⊙ is the Hadamard product. For the error metric, we choose: 

𝜖 =
‖

‖

‖

�̂� − �̄�‖‖
‖2

‖

‖

�̄�‖
‖2

. (7)

3.2.  3D Riemann problems and characteristic waves

We are interested in solutions of the 3D Euler equations where the initial conditions are described by Riemann problems. As such 
we briefly discuss this class of problems. For a more in-depth discussion of Riemann problems, refer to Schulz-Rinne[37], Hoppe 
et al. [38].

A unit cube domain, with dimensions [0, 1]3, is divided into 8 octants where each octant is uniformly initialized with its own set of 
(𝜌, 𝑢, 𝑣, 𝑤, 𝑝)𝑇 . The top octants are numbered 1 to 4, anticlockwise, starting from the octant where 𝑥 > 0.5, 𝑦 > 0.5, 𝑧 > 0.5. The bottom 
octants are numbered 5 to 8 anticlockwise, starting from the octant where 𝑥 > 0.5, 𝑦 > 0.5, 𝑧 < 0.5. The initial states of the octants 
are set such that there exists a single characteristic wave at each of the 12 interfaces that separate the octants. The characteristic 
wave can be either a rarefaction 𝑅, shock 𝑆, or a contact discontinuity 𝐽 . We identify these waves by their positions based on octant 
numbering, i.e., 𝑙𝑟 = {21, 32, 34, 41, 51, 62, 73, 84, 65, 76, 78, 85}. In the following let 𝜔 denote the velocity normal to an interface, and 
let ω′  denote the tangential velocities.

The relations for rarefaction waves 𝑅±
𝑙𝑟 are as follows (assuming polytropic flow):

𝜔𝑙 − 𝜔𝑟 = ±
2
√

𝛾
𝛾 − 1

(√

𝑝𝑙
𝜌𝑙

−
√

𝑝𝑟
𝜌𝑟

)

,
𝜌𝑙
𝜌𝑟

=
(

𝑝𝑙
𝑝𝑟

)
1
𝛾
, (8)

ω
′
𝑙 = ω

′
𝑟 , 𝜔𝑙 < 𝜔𝑟, 𝑠𝑔𝑛

(

𝑝𝑙 − 𝑝𝑟
)

= ∓1. (9)

The relations for shock waves 𝑆±
𝑙𝑟 are as follows:

𝜔𝑙 − 𝜔𝑟 =

√

(

𝑝𝑙 − 𝑝𝑟
)(

𝜌𝑙 − 𝜌𝑟
)

𝜌𝑙𝜌𝑟
,

𝜌𝑙
𝜌𝑟

=
𝛾
(

𝑝𝑙 + 𝑝𝑟
)

+
(

𝑝𝑙 − 𝑝𝑟
)

𝛾
(

𝑝𝑙 + 𝑝𝑟
)

−
(

𝑝𝑙 − 𝑝𝑟
) , (10)

ω
′
𝑙 = ω

′
𝑟 , 𝜔𝑙 > 𝜔𝑟, 𝑠𝑔𝑛

(

𝑝𝑙 − 𝑝𝑟
)

= ±1. (11)

For contact discontinuities 𝐽𝑙𝑟, the pressure and normal velocities are constant, i.e., 𝜔𝑙 = 𝜔𝑟, 𝑝𝑙 = 𝑝𝑟. Additionally, in polytropic flows, 
the density 𝜌 varies arbitrarily. The tangential velocities ω′  can also vary arbitrarily, in which case a rotation is induced as a slip 
line.1

Riemann problems (2D or 3D) yield complex flow patterns which arise from the interaction of the initialized elementary waves. 
They are therefore used as canonical problems for testing compressible flow solvers. We argue that they are also a plausible choice 
for exploring inverse problems of the Euler equations, such as our reconstruction task. The configurations or cases we explore are 
shown in Fig. 1, and their initialization values are summarised in Appendix C.

3.3.  GraphGPS framework

In this subsection we outline our design of model Θ, which follows the GraphGPS framework [21]. Fig. 2 provides a high-level 
overview of this framework. The framework takes the sparse field 𝐗, projects it to higher dimensional space along with positional 
encodings, then applies 𝑁 GraphGPS layers before projecting back to the original space. Each GraphGPS layer involves contributions 
from two modules:   and  .   is the message-passing module which captures information at a local level, in the neighbour-
hood of each node, where the connections between the nodes are defined by edge indices 𝐄(𝑛) for layer 𝑛.  is the module used 
for computing global context between distant nodes. There are also MLP layers, each involving a linear-ReLU-linear sequence, for 
processing the combined contributions of   and  , as well as for mixing node features with node positional encodings. The 
number of layers 𝑁 is determined by the size of the set of edge indices, i.e., ∪𝑁

𝑛=1𝐄
(𝑛). The set of edge indices is determined from the 

mask 𝐦 via guided message-passing, which we explain next.

1 It is possible to determine directions ±± for contact discontinuities if the tangential velocities differ. However in 3D this will make the notation 
cumbersome and so we decide to omit directions for contact discontinuities.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

4 


R. Cassia and R. Kerswell

Fig. 1. Density fields of the 3D Riemann configurations that we explore in this study.

Fig. 2. Overview of GraphGPS framework. 𝐄(𝑛)

 represents a message-passing layer acting on edge indices 𝐄(𝑛) for layer 𝑛.  denotes a layer that 
captures global dependencies of its input features. 𝑝𝑒 denotes positional encodings. Residual connections and normalization layers are omitted for 
clarity.

3.3.1.  Guided message-passing
Guided message-passing is underpinned by two rules. The first rule is that during message-passing, information flows strictly from 

known nodes only (to immediate neighbours). The second rule is that an unknown node becomes ’known’ if information has been 
propagated to it at least once. Formally:

Definition 1  (Guided Message-Passing). Let 𝐦 = 𝐦(0) ∈ {0, 1}𝑁𝑥𝑦𝑧  denote the initial mask. Let 𝑑1(𝑖, 𝑗) denote the 𝑙1 distance between 
node 𝑖 and node 𝑗 in their Euclidean space. The set of edge indices ∪𝑁

𝑛=1𝐄
(𝑛) can be iteratively generated starting from 𝐦(0) based on 

the following two rules:

𝐄(𝑛) =
{

(𝑖, 𝑗) ∣ 𝑑1(𝑖, 𝑗) ∈ ℤ+ ∩ [1, 𝜆], 𝑚(𝑛−1)
𝑖 = 1, ∀𝑖

}

, 𝜆 ∈ {1, 2, 3}, (12)

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

5 


R. Cassia and R. Kerswell

Fig. 3. Illustration of guided message-passing. Red squares indicate known nodes. Blue circles indicate unknown nodes.

𝑚(𝑛)
𝑗 = 𝑚(𝑛−1)

𝑗 ∨
⎛

⎜

⎜

⎝

⋁

(𝑖,𝑗)∈𝐄(𝑛)

𝑚(𝑛−1)
𝑖

⎞

⎟

⎟

⎠

, (13)

where ∨ denotes the logical OR operator, and ⋁ denotes the logical OR over a set. 
Parameter 𝜆 determines whether to consider orthogonal immediate neighbours only (𝜆 = 1), orthogonal+diagonal immediate 

neighbours (𝜆 = 2), or orthogonal+diagonal+corner immediate neighbours (𝜆 = 3), in determining the next set of edge connections. 
Fig. 3 illustrates guided message-passing in 2D with 𝜆 = 2.

3.3.2.  Masked projection and positional encoding
We use masked linear transformations to project input 𝐗 to a higher dimensional space, such that in the projected space, the repre-

sentation of unknown nodes is zeroed. We concatenate these projected features with the linear transformation of Laplacian positional 
encodings of 𝐗 [16,39]. Through this masked projection, we ensure that the initially unknown nodes contribute to subsequent layers 
only through their positional information. This masked projection also ensures the model is agnostic to the padding value used for 
unknown nodes in 𝐗 (should they be non-zero).

3.3.3. Candidates for 
This section outlines the message-passing modules we will explore. One popular and foundational message-passing layer is the 

graph convolutional layer (GCNConv). For node 𝑖, GCNConv is expressed as [2]:

𝐡out𝑖 = 𝜎
⎛

⎜

⎜

⎝

∑

𝑗∈ (𝑖)∪{𝑖}
𝐖𝐡𝑗

⎞

⎟

⎟

⎠

, (14)

where 𝐡 represents an intermediate feature, 𝐖 is the layer weight, 𝜎 is the ReLU activation function, and  (𝑖) denotes the neighbour-
hood of node 𝑖 (according to edge indices 𝐄). A graph attention convolutional layer (GATConv) is an extension on GCNConv through 
the introduction of attention weights 𝛼𝑖𝑗 [3]:

𝐡out𝑖 = 𝜎
⎛

⎜

⎜

⎝

∑

𝑗∈ (𝑖)∪{𝑖}
𝛼𝑖𝑗𝐖𝐡𝑗

⎞

⎟

⎟

⎠

, (15)

where the attention weights 𝛼𝑖𝑗 are computed as:

𝛼𝑖𝑗 =
exp

(

LeakyReLU
(

𝐚𝑇src𝐖𝐡𝑖 + 𝐚𝑇trg𝐖𝐡𝑗
))

∑

𝑘∈ (𝑖)∪{𝑖} exp
(

LeakyReLU
(

𝐚𝑇src𝐖𝐡𝑖 + 𝐚𝑇trg𝐖𝐡𝑘
)) , (16)

and 𝐚𝑇src, 𝐚𝑇trg are learnable parameters pertaining to the source (𝑖) and target (𝑗) nodes. In this paper we propose to modify the 
GATConv attention mechanism by introducing additional parameters �̃�, 𝐛𝑇src, 𝐛𝑇trg acting on a representation 𝐙 which encapsulates 
information about flow discontinuities:

𝛼𝑖𝑗 =
exp

(

LeakyReLU
(

𝐚𝑇src𝐖𝐡𝑖 + 𝐚𝑇trg𝐖𝐡𝑗 + 𝐛𝑇src�̃�𝐙𝑖 + 𝐛𝑇trg�̃�𝐙𝑗

))

∑

𝑘∈ (𝑖)∪{𝑖} exp
(

LeakyReLU
(

𝐚𝑇src𝐖𝐡𝑖 + 𝐚𝑇trg𝐖𝐡𝑘 + 𝐛𝑇src�̃�𝐙𝑖 + 𝐛𝑇trg�̃�𝐙𝑘

)) , (17)

where 𝐙 ∈ ℝ𝑁𝑥𝑦𝑧×3𝑁𝑡  is obtained from the reconstruction �̂� at the previous training epoch by stacking the following physical quanti-
ties: 

𝐙 =
{

‖∇𝜌‖2 ∈ ℝ𝑁𝑥𝑦𝑧×𝑁𝑡 , ‖∇𝑝‖2 ∈ ℝ𝑁𝑥𝑦𝑧×𝑁𝑡 , max
(

0,−div(𝑢, 𝑣,𝑤)
)

∈ ℝ𝑁𝑥𝑦𝑧×𝑁𝑡
}

. (18)

The density gradient alone is able to capture information on both shocks and contact discontinuities indiscriminately, since they both 
involve density jumps. The combination of density and pressure gradients however enables the attention mechanism to implicitly 

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

6 


R. Cassia and R. Kerswell

distinguish these two wave types since entropy 𝑠 = ln(𝑝∕𝜌𝛾 ) increases in the case of shocks. Lastly, the divergence of velocity is 
present to enable explicit identification of shocks by finding compressible regions. We refer to our modified, shock-aware GATConv 
as SA-GATConv.

The final message passing layer we explore is SAGEConv which aggregates over a sampled subset of a node’s neighbourhood, 
enabling scalability [4]:

𝐡out𝑖 = 𝜎
(

𝐖1𝐡𝑖 +𝐖2 ⋅ aggregate
(

{𝐡𝑗 , ∀𝑗 ∈ samp(𝑖) ⊆  (𝑖)}
))

, (19)

where the aggregation function is either mean, max-pool, or LSTM-based.

3.3.4. Candidates for 
We now explore candidates for  , the module for capturing global context. A popular choice for capturing global context is the 

scaled dot-product attention proposed by Vaswani et al. [8]:

𝐡out = softmax
(

𝐐𝐊𝑇
√

𝑑

)

𝐕, (20)

where 𝐐, 𝐊, and 𝐕 are the queries, keys and values given by 𝐐 = 𝐡𝐖𝐐, 𝐊 = 𝐡𝐖𝐊, and 𝐕 = 𝐡𝐖𝐕, and 𝑑 is the number of hidden 
channels. This form of attention is however quadratic in complexity and therefore unsuitable for large graphs. A linear-complexity 
alternative proposed by Katharopoulos et al. [12] aims to approximate the softmax function using kernel functions 𝜙(⋅):

𝐡out =
(

𝜙(𝐐)
(

𝜙(𝐊)𝑇𝐕
))

(

𝜙(𝐐)
(

𝜙(𝐊)𝑇 𝟏
))

, (21)

where the kernel function could be exponential, sigmoid, ReLU, ELU + 1, etc. Another way of capturing global context is the 
Exphormer attention, which is also linear in complexity and takes a graph-centric approach. If we view 𝐡 graphically, then for node 
𝑖 (corresponding to 𝐡𝑖) the attention (and its propagation) are computed as [40]:

𝐡out𝑖 =

∑

𝑗∈eg(𝑖) 𝐎𝑖𝑗𝐕𝑗
∑

𝑗∈eg(𝑖) 𝐎𝑖𝑗
, 𝐎𝑖𝑗 = exp

(

𝐐𝑖𝐊𝑗
√

𝑑

)

, (22)

where eg(𝑖) denotes the neighbours of node 𝑖 as defined by an expander graph of 𝐡. Expander graphs are sparse approximations of 
complete graphs which introduce edge connections between nodes that are distant in the original graph. For details on how expander 
graphs are generated, we refer the reader to Shirzad et al. [40]. The expander graph is usually complemented with a small number of 
virtual nodes which form edge connections with all the nodes of the expander graph. These global nodes allow for communication 
between nodes that are not directly connected in the expander graph, introducing another flavour of global attention.

The final form of  we explore is the Mamba-2 module, which is best explained schematically as in Fig. 4. Mamba-2 does 
not explicitly compute attention scores to capture global context. Rather, it relies on a gating mechanism and a selective state space 

Fig. 4. Mamba-2 module, which acts on 𝐡, and outputs 𝐡out. The module takes on the structure of a gating mechanism which, along with the SSM 
layer, helps determine which information to discard and retain. The SSM updates the intermediate input 𝐡′ through a fixed size latent variable, and 
its selectivity arises from making the weights 𝐖1, 𝐖2, and 𝐖3 themselves input-dependent. The module also comprises of projection layers as well 
as a normalization layer that improves stability. 𝜎 denotes the SiLU activation function.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

7 


R. Cassia and R. Kerswell

model to determine which information to discard and retain. Examining Fig. 4, the state space model updates the intermediate input 𝐡′
through a fixed size latent variable 𝚿, and its selectivity arises from making the weights 𝐖1, 𝐖2, and 𝐖3 themselves input-dependent 
[14,15]. If we view the intermediate input as a sequence (rather than as a graph) indexed by 𝑛, then the recurrent state-space model 
is defined as:

𝚿(𝑛) = 𝐖1𝚿(𝑛−1) +𝐖2𝐡′(𝑛), (23)

𝐲(𝑛) = 𝐖3𝚿(𝑛). (24)

Mamba-2 is linear in complexity and has been proposed as an alternative to the transformer model. We remark that a crucial feature 
of Mamba-2 is that weight 𝐖1 has a scalar times identity matrix structure. By imposing this structure on 𝐖1, a duality can be derived 
between state space models and attention, enabling the design of an efficient algorithm for computing the state space model which 
has linear complexity whilst making use of the GPU-friendly matrix multiplications of attention [15].

3.4.  Multi-loss function

To learn the weights of function Θ, we optimize a data loss data and a physical loss phy with respect to the weights. The data 
loss is computed only for the initially known nodes as defined by 𝐦 = 𝐦(0): 

data =
1

𝑁𝑡𝑐
∑

𝐦
‖

‖

‖

‖

𝐦⊙
(

𝐗 − �̂�
)

‖

‖

‖

‖

2

2
. (25)

For the physical loss, we first map reconstruction �̂� back to conserved variables �̂� ∈ ℝ𝑁𝑡×𝑁𝑐×𝑁𝑥×𝑁𝑦×𝑁𝑧 , which we then use to compute 
a Godunov loss function2:

phy =
1
𝑁

∑

𝑐

∑

𝑡

∑

𝑖, 𝑗, 𝑘

(

�̂�(𝑡+1, 𝑐)
𝑖, 𝑗, 𝑘 − �̂�(𝑡, 𝑐)

𝑖, 𝑗, 𝑘 + 𝜆𝑥
[

Δ𝐅(𝑡, 𝑐)
𝑖, 𝑗, 𝑘

]

+ 𝜆𝑦
[

Δ𝐆(𝑡, 𝑐)
𝑖, 𝑗, 𝑘

]

+ 𝜆𝑧
[

Δ𝐇(𝑡, 𝑐)
𝑖, 𝑗, 𝑘

])2
, (26)

Δ𝐅(𝑡, 𝑐)
𝑖, 𝑗, 𝑘 = 𝐅(𝑡, 𝑐)

𝑖+ 1
2 , 𝑗, 𝑘

− 𝐅(𝑡, 𝑐)
𝑖− 1

2 , 𝑗, 𝑘
, (27)

Δ𝐆(𝑡, 𝑐)
𝑖, 𝑗, 𝑘 = 𝐆(𝑡, 𝑐)

𝑖, 𝑗+ 1
2 , 𝑘

−𝐆(𝑡, 𝑐)
𝑖, 𝑗− 1

2 ,𝑘
, (28)

Δ𝐇(𝑡, 𝑐)
𝑖, 𝑗, 𝑘 = 𝐇(𝑡, 𝑐)

𝑖, 𝑗, 𝑘+ 1
2

−𝐇(𝑡, 𝑐)
𝑖, 𝑗, 𝑘− 1

2

, (29)

where 𝜆𝑥 = Δ𝑡∕Δ𝑥, 𝜆𝑦 = Δ𝑡∕Δ𝑦, 𝜆𝑧 = Δ𝑡∕Δ𝑧, 𝑁 = 𝑁𝑥𝑦𝑧𝑁𝑡𝑐 , and Δ𝐅, Δ𝐆, Δ𝐇 are the net 𝑥, 𝑦, 𝑧 fluxes computed from �̂� using a Riemann 
solver. For the purposes of this paper we use the HLLC Riemann solver to estimate the fluxes, which assumes a three-wave structure 
at each cell interface, as outlined in Appendix A. We refer the reader to Cassia and Kerswell [41] for more information on Godunov 
losses and the justification of their use for modeling hyperbolic conservation laws.

Rather than optimizing the summation data + phy with one Adam optimizer, we choose to optimize the two losses separately 
using two separate Adam optimizers. Therefore, within each epoch, two back-propagations are applied, and two updates are made 
to the model weights. Despite the increased computational overhead, we find that this setup significantly stabilizes training.3

3.5.  Handling of boundary conditions

Boundary conditions are incorporated into the ML architecture using padding operations. After each GraphGPS layer, we 1) 
reshape the representation to a 3D grid, 2) remove the first and last values of each spatial dimension, 3) replace the removed values 
with padded values, and 4) flatten the final representation for subsequent processing. In our case of homogeneous Neumann boundary 
conditions (i.e. 𝜕𝐧𝐔 = 𝟎), we employ replicative padding.

For non-zero Neumann boundary conditions, the padded values must be interpolated from the interior domain. If the boundary 
conditions were Dirichlet or periodic, constant or circular padding should be employed, respectively. We also note that boundary 
conditions could be more mathematically complex (coupled, non-linear, time-dependent), and act on irregular, curved, or moving 
boundaries. Exploring the implementation of such boundary conditions into ML architectures is an interesting topic but is beyond 
the scope of this paper.

4.  Experiments

This section is divided into four parts. Firstly, we perform an ablation study on the message-passing and global attention compo-
nents of the GraphGPS framework. Secondly, we examine the performance of guided-message passing compared to standard, dense 
message-passing under different node connectivities and initializations. Thirdly, we examine performance of our GraphGPS model 

2 We use here notation (𝑖, 𝑗, 𝑘) to denote spatial indices in 3D. This notation is not to be confused with earlier sections where notation (𝑖, 𝑗) is used 
to denote graph nodes.
3 Specifically, we did not encounter any NaN loss values with this setup, which we did encounter when using a single additive loss  = data + phy.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

8 


R. Cassia and R. Kerswell

Fig. 5. Random and biased distribution of known points (sensor locations) on the faces of the Riemann cube for Case 6. The density field is shown.

against numerous benchmarks. Lastly, we examine the performance gains resulting from the physics-based loss and optimal sensor 
placement.

All models examined are trained to 5000 epochs using an Adam optimizer, with a learning rate of 10−4, on a NVIDIA A100 GPU. 
A model is trained for each flow case outlined in Section 3.2. The number of hidden channels (or projected dimension) is set to 64 for 
all models. We explore a grid of spatial resolution 643 and time resolution 5 × 10−4. For visualization, we find that plotting diagonal 
slices through the Riemann cube, defined by normal vectors (±1,±1,±1)𝑇 , is most effective for revealing the 3D nature of the flow.

Fig. 5 shows the distribution of observed points (sensor locations) on the faces of the Riemann cube for Case 6, to give a sense 
of the level of sparsity we explore. In tangible terms, 90% of points are unknown. We explore two types of sensor distributions: a 
purely random sensor distribution, and a more optimal distribution that has bias towards sharp flow gradients. The latter assumes 
prior knowledge of the flow and results in a non-uniform (or non-homogeneous) sensor distribution. Unless stated otherwise, all 
experiments assume the random sensor distribution, where all cases explored have the same random distribution of known points. 
Sensor locations for all cases can be found in Appendix D.

4.1.  Ablation study

For all experiments explored in this subsection, we use guided message-passing with 𝜆 = 3 and masked projection, which are 
outlined in Sections 3.3.1 and 3.3.2, respectively.

We first examine the effect of using different message-passing layers,  , on the performance of the framework in Fig. 2, whist 
keeping  the same (Mamba-2). We examine the performance when using GATConv, our SA-GATConv, GCNConv, and SAGEConv. 
See Section 3.3.3 for an overview of these layers. For GATConv and SA-GATConv, we use one attention head. For SAGEConv we use 
max pooling as the aggregation method. Fig. 6 shows the density plots and error for each message-passing layer and for each case. 
We find that SA-GATConv and GATConv give the most faithful reconstructions both visually and numerically. On the other hand, the 
susceptibility of GCNConv to oversmoothing can be observed where there are shocks and slip lines. SAGEConv performs the worst 
and gives noisy reconstructions. Table 1 compares the computational expense of GraphGPS when using the different message-passing 
layers. Due to their attention mechanisms, GATConv and SA-GATConv require more memory and time. Furthermore the modified 
attention mechanism of SA-GATConv slightly increases the overhead compared to GATConv.

Examining Fig. 6 for the top performers in more detail, we find that SA-GATConv yields slightly lower errors than GATConv. It 
is also interesting to examine the physical features of the cases in more detail for these two message-passing layers, which appear 
sharper for SA-GATConv. For Case 1, we observe this in the shock and slip line towards the bottom right. For Case 2, we observe the 
same at the tips of the arrow-like subsonic region produced from the interaction of the initial shock waves. The same can be observed 
in Case 3 for the shock that bends towards the center and dissolves into rarefaction fans. For Case 4, this observation can be made for 
the slip line that bends the rarefaction, as well as for the bow-shock on the bottom right. In Case 5, the bent slip lines have a more 
pronounced shape for SA-GATConv that better match the truth compared to GATConv. Finally, Case 6 shows the most significant 
difference in shock definition between the two message-passing layers when observing the two shock fronts towards the left of the 
density plots.

Next, we examine the effect of different methods for capturing global context,  , on the performance of the framework in Fig. 2, 
whist keeping   the same (SA-GATConv). We examine the performance when using Mamba-2, linear transformer, and Exphormer. 

Table 1 
Compute information of different  candidates when combined with Mamba-2, 
along with the use of masked projection and guided message-passing (𝜆 = 3).

 SA-GATConv  GATConv  GCNConv  SAGEConv
 Max memory allocated (GB)  37.70  36.90  15.40  15.10
 Avg time per epoch (s)  1.64  1.60  1.43  1.40
 Total training time (h)  2.28  2.22  1.99  1.95

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

9 


R. Cassia and R. Kerswell

Fig. 6. Performance of different  candidates when combined with Mamba-2, along with the use of masked projection and guided message-
passing (𝜆 = 3). The relative L2 error across all physical channels in 𝐗 is indicated on the top left of each subplot, and the density field is plotted 
with values normalized between [0,1]. The normal vectors that define the planes of view are indicated at the bottom for each case. The full 3D 
density fields for the cases are shown in Fig. 1. The corresponding error fields for density are shown in Appendix F, Fig. F.1.

Table 2 
Compute information of different  candidates when combined with 
SA-GATConv, along with the use of masked projection and guided 
message-passing (𝜆 = 3).

 Mamba-2  Exphormer  Transformer
 Max memory allocated (GB)  37.70  77.00  37.80
 Avg time per epoch (s)  1.63  2.59  2.50
 Total training time (h)  2.26  3.60  3.48

See Section 3.3.4 for an overview of these modules. For all three modules we use 16 heads. For the linear transformer we use kernel 
function 𝜙 = ELU(𝑥) + 1. For the Exphormer, the degree of the expander graph is 10, and the number of virtual nodes is 5. Examining 
Fig. 7 we find that the linear transformer marginally outperforms the other two candidates, with the exception of Case 2 where the 
reconstruction appears least faithful. Of more significance is the computational expense of the three candidates outlined in Table 2, 
where Mamba-2 has the least overhead in terms of time and memory. Despite the sparsification of the graph used in Exphormer, 
the memory consumption is almost double that of the other candidates, due to the propagation step and due to the introduction of 
additional edge connections through virtual nodes.

It is evident from this ablation study that   is more influential to the performance of GraphGPS compared to  , in the context 
of our reconstruction task.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

10 


R. Cassia and R. Kerswell

Fig. 7. Performance of different  candidates when combined with SA-GATConv, along with the use of masked projection and guided message-
passing (𝜆 = 3). The relative L2 error across all physical channels in 𝐗 is indicated on the top left of each subplot, and the density field is plotted 
with values normalized between [0,1]. The normal vectors that define the planes of view are indicated at the bottom for each case. The full 3D 
density fields for the cases are shown in Fig. 1. The corresponding error fields for density are shown in Appendix F, Fig. F.2.

4.2.  Guided message-passing vs. dense message-passing

For all experiments explored in this subsection, we use SA-GATConv for   and Mamba-2 for  , which are explained in detail 
in Sections 3.3.3 and 3.3.4, respectively.

We are interested to see how guided message-passing performs in comparison to dense-message passing. An overview of guided 
message-passing is outlined in Section 3.3.1. By dense-message passing, we mean that for all layers of GraphGPS, each node is 
connected to all its immediate neighbours. Like guided message-passing, the immediate neighbours could be orthogonal neighbours 
(𝜆 = 1), orthogonal+diagonal neighbours (𝜆 = 2), or orthogonal+diagonal+corner neighbours (𝜆 = 3). It is interesting to compare 
the performance of guided and dense message-passing under these three different types of edge connectivities, assuming masked 
projection for all runs. This is done in Fig. 8. In terms of error we observe only a marginal difference in performance between the 
message-passing types at 𝜆 = 3 and 𝜆 = 2, with no significant difference from a visual point of view. We see however that performance 
significantly degrades for both message-passing types when 𝜆 = 1, indicating that the additional diagonal neighbours are critical to 
the reconstruction process. Furthermore, for 𝜆 = 1, guided message-passing performs significantly worse than dense message-passing 
for Cases 2, 3, 5, and 6. We thus conclude that in the case of 𝜆 = 2 and 𝜆 = 3, the use of guided message-passing, which is sparser in 
terms of number of edge connections, does not come at the cost of performance degradation, as in the case of 𝜆 = 1. This sparseness of 
guided message-passing also results in reduced computational overhead as indicated in Table 3, especially in terms of memory. Also 
of note is that the savings also appear less significant going from 𝜆 = 2 to 𝜆 = 1. Another performance gain of guided message-passing 
appears to be in terms of initial convergence behaviour. Fig. 9 plots the evolution of error in the first 500 epochs, where it is evident 
that guided message-passing at 𝜆 = 2 and 𝜆 = 3 tend towards a stable solution quicker than the other settings, because they do not 
encounter plateaus as the other settings do. These plateaus are longer and more frequent at 𝜆 = 1 for both message-passing types, 
and are accompanied by sudden jumps in error, which indicate unstable training. This may explain the degraded reconstructions at
𝜆 = 1.

It is also of interest to compare the two message-passing types when the input projection is masked, or normal (unmasked). 
Section 3.3.2 outlines masked projection. Where unmasked projection is used, we initialize unknown nodes using feature propagation 
as is usually done in the literature for graph-based reconstructions [27,32].4 For all runs in Fig. 10 we use 𝜆 = 3. Figs. 10 and 11 
suggest there is no notable improvement in accuracy or stability when introducing unmasked projection (with feature propagation) 

4 Feature propagation is a pre-processing step where missing features are interpolated from known features using propagation.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

11 


R. Cassia and R. Kerswell

Fig. 8. Performance of guided/dense message-passing with 𝜆 = {1, 2, 3}. We use SA-GATConv and Mamba-2 for  and  , respectively, and use 
masked projection. The relative L2 error across all physical channels in 𝐗 is indicated on the top left of each subplot, and the density field is plotted 
with values normalized between [0,1]. The normal vectors that define the planes of view are indicated at the bottom for each case. The full 3D 
density fields for the cases are shown in Fig. 1. The corresponding error fields for density are shown in Appendix F, Fig. F.3.

Table 3 
Compute information of guided/dense message-passing with 𝜆 = {1, 2, 3}. We use SA-GATConv and Mamba-2 for 
 and  , respectively, and use masked projection.

 GMP, 𝜆 = 3  DMP, 𝜆 = 3  GMP, 𝜆 = 2  DMP, 𝜆 = 2  GMP, 𝜆 = 1  DMP, 𝜆 = 1

 Max memory allocated (GB)  37.70  41.80  29.80  32.90  28.30  29.70
 Avg time per epoch (s)  1.63  1.75  1.46  1.54  1.63  1.67
 Total training time (h)  2.26  2.43  2.03  2.14  2.26  2.32

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

12 


R. Cassia and R. Kerswell

Fig. 9. Convergence behaviour of guided/dense message-passing with 𝜆 = {1, 2, 3} over the first 500 epochs. We use SA-GATConv and Mamba-2 
for  and  , respectively, and use masked projection.

Table 4 
Recommended setup of GraphGPS based on parameters 
explored in Sections 4.1 and 4.2.
   MP Type 𝝀  Projection
 SA-GATConv  Mamba-2  Guided  2  Masked

and dense message-passing. This suggests that the zeroed representation of unknown nodes in the projected space when using masked 
projection is sufficient, and no pre-processing such as feature propagation is required. This also justifies the use of masked projection, 
which unlike normal projection, is agnostic to the initialization of unknown nodes. For Case 2, one can even observe a degraded 
reconstruction when using unmasked projection, for both guided and dense message-passing. This appears to coincide with the 
unstable jumps in error observed in Fig. 11 for Case 2, which is not encountered for masked, guided message-passing.

Table 4 gives the final recommended setup of GraphGPS based on parameters explored in Sections 4.1 and 4.2, taking into account 
both accuracy and computational expense.

4.3.  Comparison with benchmarks

We compare our GraphGPS architecture with a number of benchmarks, such as graph attention network (GAT), linear transformer, 
autoencoder, and 3D convolutional neural network (3D CNN), as well as classical benchmarks such as Kriging interpolation and gappy 
proper orthogonal decomposition (GPOD). The architectural details of the ML models are outlined in Appendix G. For our GraphGPS 
architecture, we use SA-GATConv and Mamba-2 for   and  , respectively, along with guided message-passing (𝜆 = 3), and 
masked projection. From Fig. 12, we find that our model has superior accuracy compared to the benchmarks for all cases, albeit at an 
increased computational cost as shown in Table 5. All the benchmarks result in over-smoothing of shocks and sliplines. Moreover, the 
transformer and autoencoder models produce jagged contours, as does the 3D CNN but to a lesser extent. The 3D CNN also produces 
noisy artifacts. Kriging interpolation and GPOD also produce jagged contours.

We also compare our architecture with the benchmarks when the sensor/known values are corrupted with noise, at a 25dB 
signal-to-noise ratio (SNR). The results are shown in Appendix E where we observe similar trends, with superior performance of our 
architecture across all cases. All the benchmarks with the exception of GAT also result in noisier reconstructions.

4.4.  Performance gains of physical loss and biased sensor placement

For all experiments explored in this subsection, we use SA-GATConv for   and Mamba-2 for  . We also use guided message 
passing with 𝜆 = 3 and masked projection.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

13 


R. Cassia and R. Kerswell

Fig. 10. Performance of masked/unmasked projection and guided/dense message-passing (𝜆 = 3). We use SA-GATConv and Mamba-2 for  and 
 , respectively. The relative L2 error across all physical channels in 𝐗 is indicated on the top left of each subplot, and the density field is plotted 
with values normalized between [0,1]. The normal vectors that define the planes of view are indicated at the bottom for each case. The full 3D 
density fields for the cases are shown in Fig. 1. The corresponding error fields for density are shown in Appendix F, Fig. F.4.

Fig. 11. Convergence behaviour of masked/unmasked projection and guided/dense message-passing (𝜆 = 3) over the first 500 epochs. We use 
SA-GATConv and Mamba-2 for  and  , respectively.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

14 


R. Cassia and R. Kerswell

Fig. 12. Performance of our architecture against different benchmarks. For our architecture, we use SA-GATConv and Mamba-2 for  and  , 
respectively, along with guided message-passing (𝜆 = 3) and masked projection. The relative L2 error across all physical channels in 𝐗 is indicated 
on the top left of each subplot, and the density field is plotted with values normalized between [0,1]. The normal vectors that define the planes of 
view are indicated at the bottom for each case. The full 3D density fields for the cases are shown in Fig. 1. The corresponding error fields for density 
are shown in Appendix F, Fig. F.5.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

15 


R. Cassia and R. Kerswell

Table 5 
Compute information of our architecture against different ML benchmarks. For our ar-
chitecture, we use SA-GATConv and Mamba-2 for  and  , respectively, along with 
guided message-passing (𝜆 = 3) and masked projection.

 Ours  GAT  Transformer  Autoencoder  3D CNN
 Max memory allocated (GB)  37.70  29.00  9.30  6.20  4.30
 Avg time per epoch (s)  1.63  0.84  1.69  0.58  0.56
 Total training time (h)  2.26  1.16  2.34  0.81  0.77

Fig. 13. Performance when using random/biased sensor locations with/without the use of the physics-based loss. We use SA-GATConv and Mamba-
2 for  and  , respectively, and use masked projection and guided message-passing (𝜆 = 3). The relative L2 error across all physical channels 
in 𝐗 is indicated on the top left of each subplot, and the density field is plotted with values normalized between [0,1]. The normal vectors that 
define the planes of view are indicated at the bottom for each case. The full 3D density fields for the cases are shown in Fig. 1. The corresponding 
error fields for density are shown in Appendix F, Fig. F.6.

Fig. 13 examines the performance of GraphGPS when biasing the sparsely known points towards sharp gradients5, see Appendix D 
for the sensor placements for all cases. As expected, performance gains are noticable for all cases in terms of reconstruction error. 
Through this we also demonstrate how our framework can handle non-uniform/non-homogeneous sensor placements. The biasing 
of sensors means slightly more GraphGPS layers are required for guided-message passing to connect all the nodes with each other, 
resulting in slightly increased computational overhead as per Table 6.

Also from Fig. 13 we examine how GraphGPS performs when trained without the physical loss and using the data loss only (��phy) 
where we see a slight performance dip compared to utilizing the physical loss, as well as reduced sharpness at discontinuities as in 
Cases 2, 4, and 6. The slight performance gain of the physical loss is because it encourages adherence to physical laws and because 
it is evaluated using all points (both known and unknown). By comparison, the data loss is only evaluated using known points only. 
We do however see significantly reduced overhead according to Table 6 due to the elimination of an additional loss evaluation and 
backpropagation, and so the use of just the data loss offers a cheaper alternative where compute is scarce.

5 This of course requires some prior knowledge about the flow.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

16 


R. Cassia and R. Kerswell

Table 6 
Compute information when using random/biased sensor locations 
with/without the use of the physics-based loss. We use SA-GATConv 
and Mamba-2 for  and  , respectively, and use masked projection 
and guided message-passing (𝜆 = 3).

 Random, phy  Biased, phy  Random, ��phy
 Max memory allocated (GB)  37.70  45.08  36.50
 Avg time per epoch (s)  1.63  1.92  0.81
 Total training time (h)  2.26  2.65  1.12

5.  Conclusion

This study explores the use of a GraphGPS framework for the reconstruction of the 3D Riemann problems from sparse data, where 
90% of points are missing. The key feature of GraphGPS is that it combines the benefits of 1) message-passing modules, 2) modules 
that capture global context, and 3) positional encodings. For message-passing, we introduce a modified GAT layer, SA-GATConv, 
that leverages information on shocks and discontinuities in its attention mechanism. We also introduce guided message-passing, 
which sparsifies node aggregation by enforcing information to flow strictly from known nodes only. To make the model agnostic to 
the initialization of unknown nodes we make use of masked projection, which zeroes the representation of unknown nodes in the 
projected space of GraphGPS.

An ablation study on the modules of GraphGPS indicates SA-GATConv to be the superior message-passing module in terms of 
accuracy, enabling sharper reconstructions of shocks and discontinuities. From the ablation study we also find that the Mamba-2 
module is the most efficient way for capturing global context whilst yielding acceptable accuracies. Experiments also show that guided-
message passing reduces the time and memory required for reconstruction compared to (standard) dense message passing, whilst being 
on-par in terms of accuracy. Guided message-passing also exhibits more stable training than dense message-passing based on initial 
convergence behaviour. Furthermore we find that masked projection of the input yields similar accuracies to normal projection, even 
when the value of unknown nodes is pre-processed via feature propagation in the case of normal projection. Additionally, we find 
that our GraphGPS framework outperforms a number of different benchmarks, both ML-based and classical, even when sensor/known 
values are noisy. Finally, we show that the framework can benefit from a more optimal and non-homogeneous sensor placement and 
verify the performance gain of using the physical loss (in addition to the data loss).

The most notable limitation encountered in this study is the relatively large memory footprint of SA-GATConv for message-passing, 
despite its superior accuracy. This is due to the computation of the attention scores when aggregating information from neighbouring 
nodes. Future work should aim to alleviate this memory requirement such that finer resolutions could be explored. One way to 
mitigate this would be to reduce parameter 𝜆 of guided message-passing from 3 to 2. Another possible way would be to make guided 
message-passing act on known-unknown node connections only, which would be an interesting extension.6 Another limitation is 
that we also train a model for each 3D Riemann configuration separately, and it would be interesting to explore whether a model 
can generalize across multiple configurations, given a fixed distribution of known points. This study also assumes that known and 
unknown points/nodes are arranged as a uniform grid, and in a cubic domain. In practice, grids tend to be unstructured/non-uniform 
and domain geometries tend to be complex and so this is therefore another direction to explore. One possible way of adapting the 
GraphGPS framework to unstructured/non-uniform grids and complex geometries would be to implement a pre-processing step that 
projects the unstructured/non-uniform grid (and its values) onto a uniform cubic grid, so that the reconstruction occurs in this simpler 
grid space. One can then implement a post-processing step that projects the output of the framework back to the original grid space.

CRediT authorship contribution statement

Rami Cassia: Writing – original draft, Visualization, Validation, Software, Project administration, Methodology, Investigation, 
Formal analysis, Data curation, Conceptualization; Rich Kerswell: Writing – review & editing, Supervision.

Code and data

Code and data can be accessed via the following link.

Data availability

The following link gives access to the GitHub repository of the paper, which includes code for data generation.

6 This is as opposed to applying guided message-passing on known-unknown connections and known-known connections, which is what we do.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

17 

https://github.com/RamiCassia/GraphGPS-for-Sparse-Reconstruction
https://github.com/RamiCassia/GraphGPS-for-Sparse-Reconstruction


R. Cassia and R. Kerswell

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing 
interests:

Rami Cassia reports financial support was provided by Cambridge Commonwealth European and International Trust. If there are 
other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared 
to influence the work reported in this paper. 

Acknowledgments

Rami Cassia acknowledges the financial support provided by the Cambridge Commonwealth, European and International Trust. 

Appendix A.  HLLC flux estimation

An outline of the HLLC algorithm is presented in this section. Fig. A.1 illustrates the three-wave model assumed by the HLLC 
solver. In order to determine the HLLC flux normal to a cell interface for the 3D Euler equations, it is sufficient to consider a 𝑥-split 
version of the 3D Euler equations because of rotational invariance [42]:

𝜕𝑡

⎛

⎜

⎜

⎜

⎜

⎜

⎝

𝜌
𝜌𝑢
𝜌𝑣
𝜌𝑤
𝐸

⎞

⎟

⎟

⎟

⎟

⎟

⎠

+ 𝜕𝑥

⎛

⎜

⎜

⎜

⎜

⎜

⎝

𝜌𝑢
𝜌𝑢2 + 𝑝
𝜌𝑢𝑣
𝜌𝑢𝑤

𝑢(𝐸 + 𝑝)

⎞

⎟

⎟

⎟

⎟

⎟

⎠

= 𝟎 (A.1)

We then estimate wave speeds 𝑠𝑙 and 𝑠𝑟 as Toro[43]:
𝑠𝑙 = �̃� − �̃�, 𝑠𝑟 = �̃� + �̃�, (A.2)

where the Roe-average velocity and sound speed, �̃� and �̃�, are defined as:

�̃� =

√

𝜌𝑙𝑢𝑙 +
√

𝜌𝑟𝑢𝑟
√

𝜌𝑙 +
√

𝜌𝑟
, �̃� =

√

(𝛾 − 1)
(

�̃� − 1
2
�̃�2
)

, (A.3)

and the Roe-average enthalpy �̃� is defined as:

�̃� =

√

𝜌𝑙𝐻𝑙 +
√

𝜌𝑟𝐻𝑟
√

𝜌𝑙 +
√

𝜌𝑟
, 𝐻 =

𝐸 + 𝑝
𝜌

. (A.4)

We then use 𝑠𝑙 and 𝑠𝑟 to compute intermediate speed 𝑠∗ [42]:

𝑠∗ =
𝑝𝑟 − 𝑝𝑙 + 𝜌𝑙𝑢𝑙(𝑠𝑙 − 𝑢𝑙) − 𝜌𝑟𝑢𝑟(𝑠𝑟 − 𝑢𝑟)

𝜌𝑙(𝑠𝑙 − 𝑢𝑙) − 𝜌𝑟(𝑠𝑟 − 𝑢𝑟)
. (A.5)

The intermediate conservative vectors 𝐔∗𝑙 and 𝐔∗𝑟 are then computed using 𝑠∗, 𝑠𝑙 and 𝑠𝑟 as [42]:

𝐔∗𝜂 = 𝜌𝜂

( 𝑠𝜂 − 𝑢𝜂
𝑠𝜂 − 𝑠∗

)

⎛

⎜

⎜

⎜

⎜

⎜

⎜

⎝

1
𝑠∗
𝑣𝜂
𝑤𝜂

𝐸𝜂
𝜌𝜂

+ (𝑠∗ − 𝑢𝜂)
[

𝑠∗ +
𝑝𝜂

𝜌𝜂 (𝑠𝜂−𝑢𝜂 )

]

⎞

⎟

⎟

⎟

⎟

⎟

⎟

⎠

, (A.6)

where 𝜂 = {𝑙, 𝑟}. Finally, the intermediate flux vectors 𝐅∗𝑙 and 𝐅∗𝑟 are computed from the state vectors so that the flux 𝐅𝑛
𝑖+ 1

2

 is computed 
using Eq. (A.7).

𝐅∗𝜂 = 𝐅𝜂 + 𝑠𝜂(𝐔∗𝜂 − 𝐔𝜂), 𝐅𝑡
𝑖+ 1

2
=

⎧

⎪

⎪

⎨

⎪

⎪

⎩

𝐅𝑙 0 ≤ 𝑠𝑙 ,
𝐅∗𝑙 𝑠𝑙 ≤ 0 ≤ 𝑠∗,
𝐅∗𝑟 𝑠∗ ≤ 0 ≤ 𝑠𝑟,
𝐅𝑟 0 ≥ 𝑠𝑟.

(A.7)

Due to rotational invariance, an analogous procedure allows us to determine the 𝑦-flux 𝐆𝑡
𝑗+ 1

2

 and the 𝑧-flux 𝐇𝑡
𝑘+ 1

2

.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

18 


R. Cassia and R. Kerswell

Fig. A.1. The three-wave model assumed by the HLLC solver. The left and right characteristic lines correspond to the fastest and slowest signals, 
𝑠𝑙 and 𝑠𝑟, emerging from the cell interface at 𝑥 = 0. The middle characteristic corresponds to the wave of speed 𝑠∗ which accounts for contact and 
shear waves.

Appendix B.  WENO reconstruction

The weighted essentially non-oscillatory (WENO) scheme is a method of reconstructing cell averaged values to cell interfaces at 
a higher order of accuracy. In this section we briefly outline the 5th-order WENO scheme used in generating the ground truth �̄�. 
We apply the WENO scheme on the conservative variables 𝐔 for each spatial dimension separately. As such we demonstrate the 
reconstruction step for the 𝑥-dimension only. The procedure is analogous for the 𝑦- and 𝑧-dimensions.

The first step is to find candidate stencil approximations. For the 5th-order scheme these are:

α0 =
1
3
𝐔𝑖−2 −

7
6
𝐔𝑖−1 −

11
6
𝐔𝑖 (B.1)

α1 = −1
6
𝐔𝑖−1 +

5
6
𝐔𝑖 +

1
3
𝐔𝑖+1 (B.2)

α2 =
1
3
𝐔𝑖 +

5
6
𝐔𝑖+1 −

1
6
𝐔𝑖+2 (B.3)

We then obtain the smoothness indicators β as:

β0 =
13
12

(

𝐔𝑖−2 − 2𝐔𝑖−1 + 𝐔𝑖
)2 + 1

4
(

𝐔𝑖−2 − 4𝐔𝑖−1 + 3𝐔𝑖
)2 (B.4)

β1 =
13
12

(

𝐔𝑖−1 − 2𝐔𝑖 + 𝐔𝑖+1
)2 + 1

4
(

𝐔𝑖−1 − 𝐔𝑖+1
)2 (B.5)

β2 =
13
12

(

𝐔𝑖 − 2𝐔𝑖+1 + 𝐔𝑖+2
)2 + 1

4
(

3𝐔𝑖 − 4𝐔𝑖+1 + 𝐔𝑖+2
)2 (B.6)

We then find the non-linear weights ω:

ω𝑛 =
ζ𝑛

ζ0 + ζ1 + ζ2
, ζ𝑛 =

𝑑𝑛
(

𝜖 + β𝑛
)2

,
(

𝑑0, 𝑑1, 𝑑2
)

=
( 1
10

, 3
5
, 3
10

)

, 𝑛 ∈ {0, 1, 2}, (B.7)

where 𝜖 is a small number to prevent division by 0. The 5th-order approximation at the cell interface 𝐔WENO
𝑖+ 1

2

 can then be obtained:

𝐔WENO
𝑖+ 1

2

= ω0 ⊙α0 +ω1 ⊙α1 +ω2 ⊙α2. (B.8)

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

19 


R. Cassia and R. Kerswell

Appendix C.  Initialization values for 3D Riemann problems

Initialization values are taken from Balsara[44] and Hoppe et al. [38] (Tables C.1, C.2, C.3, C.4, C.5, C.6, C.7).

Table C.1 
Octant definitions for domain [0, 1]3.

 Octant  x  y  z
 1 > 0.5 > 0.5 > 0.5
 2 ≤ 0.5 > 0.5 > 0.5
 3 ≤ 0.5 ≤ 0.5 > 0.5
 4 > 0.5 ≤ 0.5 > 0.5
 5 > 0.5 > 0.5 ≤ 0.5
 6 ≤ 0.5 > 0.5 ≤ 0.5
 7 ≤ 0.5 ≤ 0.5 ≤ 0.5
 8 > 0.5 ≤ 0.5 ≤ 0.5

Table C.2 
Case 1, 𝐽21𝐽32𝑆−

34𝑆
+
41𝐽51𝑆

+
62𝐽73𝑆

−
84𝑅

−
65𝑆

−
76𝐽78𝐽85.

 Octant 𝜌 𝑢 𝑣 𝑤 𝑝

 1  0.6223  0.1000 −0.6259 −0.1000  0.4000
 2  0.5313  0.1000  0.1000 −0.8276  0.4000
 3  0.6223  0.8259  0.1000 −0.1000  0.4000
 4  1.2219  0.1000  0.1000 −0.1000  1.0683
 5  0.5197  0.8259  0.1000 −0.1000  0.4000
 6  1.0000  0.1000  0.1000 −0.1000  1.0000
 7  0.5313  0.1000  0.8276 −0.1000  0.4000
 8  0.6223  0.1000  0.1000 −0.6259  0.4000

Table C.3 
Case 2, 𝑆+

21𝑆
−
32𝑆

−
34𝑆

+
41𝑆

+
51𝑆

−
62𝑆

+
73𝑆

−
84𝑆

−
65𝑆

+
76𝑆

+
78𝑆

−
85.

 Octant 𝜌 𝑢 𝑣 𝑤 𝑝

 1  0.5065  0.0000  0.0000  0.0000  0.3500
 2  1.1000  0.8939  0.0000  0.0000  1.1000
 3  0.5065  0.8939  0.8939  0.0000  0.3500
 4  1.1000  0.0000  0.8939  0.0000  1.1000
 5  1.1000  0.0000  0.0000  0.8939  1.1000
 6  0.5065  0.8939  0.0000  0.8939  0.3500
 7  1.1000  0.8939  0.8939  0.8939  1.1000
 8  0.5065  0.0000  0.8939  0.8939  0.3500

Table C.4 
Case 3, 𝑅+

21𝑅
−
32𝑅

−
34𝑅

+
41𝑆

−
51𝑆

+
62𝑆

−
73𝑆

+
84𝑅

−
65𝑅

+
76𝑅

+
78𝑅

−
85.

 Octant 𝜌 𝑢 𝑣 𝑤 𝑝

 1  1.0000  0.0000  0.0000 −0.7881  1.0000
 2  0.5000 −0.7658  0.0000 −0.7881  0.3789
 3  1.0000 −0.7658 −0.7658 −0.7881  1.0000
 4  0.5000  0.0000 −0.7658 −0.7881  0.3789
 5  0.5000  0.0000  0.0000  0.0000  0.3789
 6  1.0000 −0.7658  0.0000  0.0000  1.0000
 7  0.5000 −0.7658 −0.7658  0.0000  0.3789
 8  1.0000  0.0000 −0.7658  0.0000  1.0000

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

20 


R. Cassia and R. Kerswell

Table C.5 
Case 4, 𝐽21𝐽32𝐽34𝐽41𝑆−

51𝑅
+
62𝑆

−
73𝑅

+
84𝐽65𝐽76𝐽78𝐽85.

 Octant 𝜌 𝑢 𝑣 𝑤 𝑝

 1  1.5000  0.1000 −0.1500 −0.2000  1.0000
 2  1.0000  0.1000  0.1500 −0.3000  1.0000
 3  1.5000 −0.1000  0.1500 −0.5000  1.0000
 4  1.0000 −0.1000 −0.1500  1.2000  1.0000
 5  0.7969  0.1000 −0.1500  0.3941  0.4000
 6  0.5197  0.1000  0.1500 −1.0259  0.4000
 7  0.7969 −0.1000  0.1500  0.0941  0.4000
 8  0.5197 −0.1000 −0.1500  0.4741  0.4000

Table C.6 
Case 5, 𝐽21𝐽32𝐽34𝐽41𝐽51𝐽62𝐽73𝐽84𝐽65𝐽76𝐽78𝐽85.
 Octant 𝜌 𝑢 𝑣 𝑤 𝑝

 1  0.5000 −0.2500 −0.5000 −0.5000  1.0000
 2  2.0000 −0.2500  0.5000 −0.2500  1.0000
 3  0.5000  0.2500  0.5000  0.2500  1.0000
 4  1.0000  0.2500 −0.5000 −0.2500  1.0000
 5  1.0000  0.2500 −0.2500 −0.5000  1.0000
 6  0.5000  0.2500  0.2500 −0.2500  1.0000
 7  2.0000 −0.2500  0.2500  0.2500  1.0000
 8  0.5000 −0.2500 −0.2500 −0.2500  1.0000

Table C.7 
Case 6, 𝑆+

21𝐽32𝐽34𝑆
+
41𝑆

+
51𝐽62𝑆

+
73𝐽84𝐽65𝑆

+
76𝑆

+
78𝐽85.

 Octant 𝜌 𝑢 𝑣 𝑤 𝑝

 1  0.5313  0.0000  0.0000  0.0000  0.4000
 2  1.0000  0.7276  0.0000  0.0000  1.0000
 3  0.8000  0.0000  0.0000 −0.7276  1.0000
 4  1.0000  0.0000  0.7276  0.0000  1.0000
 5  1.0000  0.0000  0.0000  0.7276  1.0000
 6  0.8000  0.0000 −0.7276  0.0000  1.0000
 7  1.0162 −0.4014 −0.4014 −0.4014  1.4000
 8  0.8000 −0.7276  0.0000  0.0000  1.0000

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

21 


R. Cassia and R. Kerswell

Appendix D.  Sensor placement

Figs. D.1 and D.2 show the distribution of known points (sensor locations) on the faces of the Riemann cube for Cases 1–6.

Fig. D.1. Random and biased distribution of known points (sensor locations) on the faces of the Riemann cube for Cases 1–3. The density fields are 
shown.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

22 


R. Cassia and R. Kerswell

Fig. D.2. Random and biased distribution of known points (sensor locations) on the faces of the Riemann cube for Cases 4–6. The density fields are 
shown.

Appendix E.  Comparison with benchmarks for noisy sensors

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

23 


R. Cassia and R. Kerswell

Fig. E.1. Performance of our architecture against different benchmarks with noise added to the values at known points (SNR = 25 dB). For our 
architecture, we use SA-GATConv and Mamba-2 for  and  , respectively, along with guided message-passing (𝜆 = 3) and masked projection. 
The relative L2 error across all physical channels in 𝐗 is indicated on the top left of each subplot, and the density field is plotted with values 
normalized between [0,1]. The normal vectors that define the planes of view are indicated at the bottom for each case. The full 3D density fields 
for the cases are shown in Fig. 1. The corresponding error fields for density are shown in Appendix F, Fig. F.7. To get a sense of the level of noise 
added, see Figs. E.2 and E.3.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

24 


R. Cassia and R. Kerswell

Fig. E.2. Random distribution of known points (sensor locations) on the faces of the Riemann cube, corrupted with noise (SNR = 25 dB), for Cases 
1–3. The density fields are shown.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

25 


R. Cassia and R. Kerswell

Fig. E.3. Random distribution of known points (sensor locations) on the faces of the Riemann cube, corrupted with noise (SNR = 25 dB), for Cases 
4–6. The density fields are shown.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

26 


R. Cassia and R. Kerswell

Appendix F.  Error fields

Fig. F.1. Density error fields corresponding to Fig. 6.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

27 


R. Cassia and R. Kerswell

Fig. F.2. Density error fields corresponding to Fig. 7.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

28 


R. Cassia and R. Kerswell

Fig. F.3. Density error fields corresponding to Fig. 8.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

29 


R. Cassia and R. Kerswell

Fig. F.4. Density error fields corresponding to Fig. 10.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

30 


R. Cassia and R. Kerswell

Fig. F.5. Density error fields corresponding to Fig. 12.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

31 


R. Cassia and R. Kerswell

Fig. F.6. Density error fields corresponding to Fig. 13.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

32 


R. Cassia and R. Kerswell

Fig. F.7. Density error fields corresponding to Fig. E.1.

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

33 


R. Cassia and R. Kerswell

Appendix G.  Benchmark ML models

Tables G.1, G.2, G.3 and G.4 summarise the architectural details of the ML benchmarks.

Table G.1 
Architectural details of 3D CNN. For all Conv3D layers the kernel size 
is 3, the stride and padding are 1, and replicative padding is used.

 Layer no.  Layer type  Input channels  Output channels
 1  Conv3D+ReLU  5  16
 2  Conv3D+ReLU  16  32
 3  Conv3D+ReLU  32  64
 4–9  Conv3D+ReLU  64  64
 10  Conv3D+ReLU  64  32
 11  Conv3D+ReLU  32  16
 12  Conv3D  16  5

Table G.2 
Architectural details of GAT. For all GATConv layers the number of attention 
heads is 1.

 Layer no.  Layer type  Input channels  Output channels
 1  GATConv+ReLU  10  64
 2–9  GATConv+ReLU  64  64
 10  GATConv  64  10

Table G.3 
Architectural details of autoencoder.
 Layer no.  Layer type  Input channels  Output channels
 1  Linear+ReLU  10  512
 2  Linear+ReLU  512  256
 3  Linear+ReLU  256  128
 4  Linear+ReLU  128  64
 5  Linear+ReLU  64  32
 6  Linear+ReLU  32  64
 7  Linear+ReLU  64  128
 8  Linear+ReLU  128  256
 9  Linear+ReLU  256  512
 10  Linear  512  10

Table G.4 
Architectural details of linear transformer. The number of heads for transformer 
layers is 16.

 Layer no.  Layer type  Input channels  Output channels
 1  Linear  10  64
 2–6  MLP+Transformer+MLP  64  64
 7  Linear+ReLU  64  32
 8  Linear+ReLU  32  16
 9  Linear  16  10

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

34 


R. Cassia and R. Kerswell

References

[1] F. Scarselli, M. Gori, A.C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Trans. Neural Netw. 20 (1) (2008) 61–80. 
[2] T.N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, (2016). arXiv preprint arXiv:1609.02907
[3] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio, Graph attention networks, (2017). arXiv preprint arXiv:1710.10903
[4] W. Hamilton, Z. Ying, J. Leskovec, Inductive representation learning on large graphs, Adv. Neural Inf. Process. Syst. 30 (2017) 1–19. 
[5] K. Xu, W. Hu, J. Leskovec, S. Jegelka, How powerful are graph neural networks? (2018). arXiv preprint arXiv:1810.00826
[6] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, (2014). arXiv preprint arXiv:1409.0473
[7] M.-T. Luong, H. Pham, C. Manning, Effective approaches to attention-based neural machine translation, (2015). arXiv preprint arXiv:1508.04025
[8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, Adv. Neural Inf. Process. Syst. 30 (2017) 

1–11. 
[9] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, Improving language understanding by generative pre-training (2018). 
[10] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: pre-training of deep bidirectional transformers for language understanding, in: Proceedings of naacL-HLT, 

Minneapolis, Minnesota, 1, 2019. 
[11] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An 

image is worth 16x16 words: Transformers for image recognition at scale, (2020). arXiv preprint arXiv:2010.11929
[12] A. Katharopoulos, A. Vyas, N. Pappas, F. Fleuret, Transformers are RNNs: fast autoregressive transformers with linear attention, in: International Conference on 

Machine Learning, PMLR, 2020, pp. 5156–5165. 
[13] K. Choromanski, V. Likhosherstov, D. Dohan, X. Song, A. Gane, T. Sarlos, P. Hawkins, J. Davis, A. Mohiuddin, L. Kaiser, et al., Rethinking attention with 

performers, (2020). arXiv preprint arXiv:2009.14794
[14] A. Gu, T. Dao, Mamba: Linear-time sequence modeling with selective state spaces, (2023). arXiv preprint arXiv:2312.00752
[15] T. Dao, A. Gu, Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality, (2024). arXiv preprint 

arXiv:2405.21060
[16] V.P. Dwivedi, X. Bresson, A generalization of transformer networks to graphs, (2020). arXiv preprint arXiv:2012.09699
[17] D. Kreuzer, D. Beaini, W. Hamilton, V. Létourneau, P. Tossou, Rethinking graph transformers with spectral attention, Adv. Neural Inf. Process. Syst. 34 (2021) 

21618–21629. 
[18] C. Ying, T. Cai, S. Luo, S. Zheng, G. Ke, D. He, Y. Shen, T.-Y. Liu, Do transformers really perform badly for graph representation? Adv. Neural Inf. Process. Syst. 

34 (2021) 28877–28888. 
[19] A. Behrouz, F. Hashemi, Graph mamba: towards learning on graphs with state space models, in: Proceedings of the 30th ACM SIGKDD Conference on Knowledge 

Discovery and Data Mining, 2024, pp. 119–130. 
[20] C. Wang, O. Tsepa, J. Ma, B. Wang, Graph-mamba: Towards long-range graph sequence modeling with selective state spaces, (2024). arXiv preprint 

arXiv:2402.00789
[21] L. Rampášek, M. Galkin, V.P. Dwivedi, A.T. Luu, G. Wolf, D. Beaini, Recipe for a general, powerful, scalable graph transformer, Adv. Neural Inf. Process. Syst. 

35 (2022) 14501–14515. 
[22] E. Rossi, H. Kenlay, M.I. Gorinova, B.P. Chamberlain, X. Dong, M.M. Bronstein, On the unreasonable effectiveness of feature propagation in learning on graphs 

with missing node features, in: Proceedings of the First Learning on Graphs Conference, PMLR, 2022. 
[23] R. Everson, L. Sirovich, Karhunen–Loeve procedure for gappy data, JOSA A 12 (8) (1995) 1657–1664. 
[24] T. Bui-Thanh, M. Damodaran, K. Willcox, Aerodynamic data reconstruction and inverse design using proper orthogonal decomposition, AIAA J. 42 (8) (2004) 

1505–1516. 
[25] A. Vendl, H. Faßbender, Proper orthogonal decomposition for steady aerodynamic applications, PAMM 10 (1) (2010) 635–636. 
[26] H. Huang, Q. Wang, Z. Wu, A surrogate-based flow-field prediction and optimization strategy for hypersonic thrust nozzle, AIP Adv. 14 (12) (2024) 1–21. 
[27] G. Duthé, I. Abdallah, S. Barber, E. Chatzi, Graph neural networks for aerodynamic flow reconstruction from sparse sensing, (2023). arXiv preprint 

arXiv:2301.03228
[28] S. Xu, Z. Sun, R. Huang, D. Guo, G. Yang, S. Ju, A practical approach to flow field reconstruction with sparse or incomplete data through physics informed neural 

network, Acta Mech. Sin. 39 (3) (2023) 322302. 
[29] Y. Kim, Y. Choi, B. Yoo, Gappy AE: a nonlinear approach for gappy data reconstruction using auto-encoder, Comput. Methods Appl. Mech. Eng. 426 (2024) 

116978. 
[30] J.-Z. Peng, Z.-Q. Wang, X. Rong, M. Mei, M. Wang, Y. He, W.-T. Wu, Rapid and sparse reconstruction of high-speed steady-state and transient compressible flow 

fields using physics-informed graph neural networks, Phys. Fluids 36 (4) (2024) 1–19. 
[31] M.Y. Hosseini, Y. Shiri, Flow field reconstruction from sparse sensor measurements with physics-informed neural networks, Phys. Fluids 36 (7) (2024) 1–15. 
[32] B.A. Danciu, V.A. Pagone, B. Böhm, M. Schmidt, C.E. Frouzakis, Flow reconstruction in time-varying geometries using graph neural networks, (2024). arXiv 

preprint arXiv:2411.08764
[33] M. Quattromini, M.A. Bucci, S. Cherubini, O. Semeraro, Graph Neural Networks and Differential Equations: A hybrid approach for data assimilation of fluid 

flows, (2024). arXiv preprint arXiv:2411.09476
[34] P.C.H. Nguyen, J.B. Choi, Q.-T. Luu, FLRNet: A Deep Learning Method for Regressive Reconstruction of Flow Field From Limited Sensor Measurements, (2024). 

arXiv preprint arXiv:2411.13815
[35] H.V. Dang, J.B. Choi, P.C.H. Nguyen, FLRONet: Deep Operator Learning for High-Fidelity Fluid Flow Field Reconstruction from Sparse Sensor Measurements, 

(2024). arXiv preprint arXiv:2412.08009
[36] P. Batten, N. Clarke, C. Lambert, D.M. Causon, On the choice of wavespeeds for the HLLC Riemann solver, SIAM J. Sci. Comput. 18 (6) (1997) 1553–1570. 
[37] C.W. Schulz-Rinne, Classification of the Riemann problem for two-dimensional gas dynamics, SIAM J. Math. Anal. 24 (1) (1993) 76–88. 
[38] N. Hoppe, N. Fleischmann, B. Biller, S. Adami, N.A. Adams, A systematic analysis of three-dimensional Riemann problems for verification of compressible-flow 

solvers, Comput. Fluids 278 (2024) 106298. 
[39] V.P. Dwivedi, C.K. Joshi, A.T. Luu, T. Laurent, Y. Bengio, X. Bresson, Benchmarking graph neural networks, J. Mach. Learn. Res. 24 (43) (2023) 1–48. 
[40] H. Shirzad, A. Velingker, B. Venkatachalam, D.J. Sutherland, A.K. Sinop, Exphormer: sparse transformers for graphs, in: International Conference on Machine 

Learning, PMLR, 2023, pp. 31613–31632. 
[41] R. Cassia, R. Kerswell, Godunov loss functions for modelling of hyperbolic conservation laws, Comput. Methods Appl. Mech. Eng. 437 (2025) 117782. 
[42] E.F. Toro, The HLLC Riemann solver, Shock Waves 29 (8) (2019) 1065–1082. 
[43] E.F. Toro, Riemann Solvers and Numerical Methods for Fluid Dynamics: A Practical Introduction, Springer Science & Business Media, 2013. 
[44] D.S. Balsara, Three dimensional HLL Riemann solver for conservation laws on structured meshes; application to Euler and magnetohydrodynamic flows, J. 

Comput. Phys. 295 (2015) 1–23. 

Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 

35 

http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0001
http://arxiv.org/abs/1609.02907
http://arxiv.org/abs/1710.10903
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0002
http://arxiv.org/abs/1810.00826
http://arxiv.org/abs/1409.0473
http://arxiv.org/abs/1508.04025
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0003
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0003
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0004
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0005
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0005
http://arxiv.org/abs/2010.11929
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0006
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0006
http://arxiv.org/abs/2009.14794
http://arxiv.org/abs/2312.00752
http://arxiv.org/abs/2405.21060
http://arxiv.org/abs/2405.21060
http://arxiv.org/abs/2012.09699
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0007
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0007
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0008
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0008
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0009
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0009
http://arxiv.org/abs/2402.00789
http://arxiv.org/abs/2402.00789
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0010
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0010
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0011
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0011
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0012
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0013
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0013
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0014
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0015
http://arxiv.org/abs/2301.03228
http://arxiv.org/abs/2301.03228
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0016
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0016
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0017
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0017
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0018
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0018
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0019
http://arxiv.org/abs/2411.08764
http://arxiv.org/abs/2411.08764
http://arxiv.org/abs/2411.09476
http://arxiv.org/abs/2411.13815
http://arxiv.org/abs/2412.08009
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0020
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0021
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0022
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0022
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0023
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0024
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0024
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0025
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0026
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0027
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0028
http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0028

	A physics-augmented GraphGPS framework for the reconstruction of 3D Riemann problems from sparse data 
	1 Introduction
	2 Prior work on flow reconstruction
	3 Methodology
	3.1 Problem definition
	3.2 3D Riemann problems and characteristic waves
	3.3 GraphGPS framework
	3.3.1 Guided message-passing
	3.3.2 Masked projection and positional encoding
	3.3.3  Candidates for FMP
	3.3.4  Candidates for FGC

	3.4 Multi-loss function
	3.5 Handling of boundary conditions

	4 Experiments
	4.1 Ablation study
	4.2 Guided message-passing vs. dense message-passing
	4.3 Comparison with benchmarks
	4.4 Performance gains of physical loss and biased sensor placement

	5 Conclusion
	A HLLC flux estimation
	B WENO reconstruction
	C Initialization values for 3D Riemann problems
	D Sensor placement
	E Comparison with benchmarks for noisy sensors
	F Error fields
	G Benchmark ML models