Contents lists available at ScienceDirect Comput. Methods Appl. Mech. Engrg. journal homepage: www.elsevier.com/locate/cma A physics-augmented GraphGPS framework for the reconstruction of 3D Riemann problems from sparse data Rami Cassia ∗, Rich Kerswell Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Wilberforce Rd, Cambridge, CB3 0WA, UK a r t i c l e i n f o Keywords: Machine learning Attention Graphs Fluid dynamics Shocks Riemann problems a b s t r a c t In compressible fluid flow, reconstructing shocks, discontinuities, rarefactions, and their inter- actions from sparse measurements is an important inverse problem with practical applications. Moreover, physics-informed machine learning has recently become an increasingly popular ap- proach for performing reconstructions tasks. In this work we explore a machine learning recipe, known as GraphGPS, for reconstructing canonical compressible flows known as 3D Riemann prob- lems from sparse observations, in a physics-informed manner. The GraphGPS framework com- bines the benefits of positional encodings, local message-passing of graphs, and global contextual awareness, and we explore the latter two components through an ablation study. Furthermore, we modify the aggregation step of message-passing such that it is aware of shocks and discontinuities, resulting in sharper reconstructions of these features. Additionally, we modify message-passing such that information flows strictly from known nodes only, which results in computational sav- ings, better training convergence, and no degradation of reconstruction accuracy. We also show that the GraphGPS framework outperforms numerous machine learning and classical benchmarks. 1. Introduction A common inverse problem is the reconstruction of a set of variables from sparse data. Applications of this inverse problem span a range of fields such as imaging, audio and speech processing, medicine, structural engineering, seismology, finance, astronomy, and fluid dynamics. Particular applications in fluid dynamics include climate modeling, oceanography, and aerospace engineering. In aerospace engineering, for example, it is common practice to mount pressure sensors on a wing during wind-tunnel testing in order to reconstruct the surrounding flow-field. In this paper we are interested in the sparse reconstruction of the 3D Riemann problems, which are a set of problems that investigate the interaction of elementary waves (shocks, rarefactions, contact discontinuities) as governed by the compressible Euler equations. We are specifically interested in performing the reconstruction using a physics-informed machine learning (ML) model that is a) graph-based and b) contextually-aware. A model that is graph-based can leverage data directionality, which we hypothesize is beneficial to reconstruction where information should flow from known to unknown points. A model that is contextually-aware is capable of capturing long-range dependencies in the data, which is crucial in flows where there are long-range correlations due to the presence of shocks, contact discontinuities and rarefactions. As such we briefly review graph-based and contextually-aware models. Graph-based models are those that operate on graphs comprising of vertices (or nodes) and edges. Graph models typically involve message-passing, which refers to the nodal aggregation of messages or information from neighbours. A key property of ∗ Corresponding author. E-mail addresses: rgc38@cam.ac.uk (R. Cassia), rrk26@cam.ac.uk (R. Kerswell). https://doi.org/10.1016/j.cma.2025.118328 Received 27 May 2025; Received in revised form 11 August 2025; Accepted 15 August 2025 Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 Available online 2 September 2025 0045-7825/© 2025 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ ). https://www.elsevier.com/locate/cma https://www.elsevier.com/locate/cma https://orcid.org/0009-0002-4643-2717 $\mathcal {F}_{\Theta }: \left ( \mathbf {X} \in \mathbb {R}^{N_{xyz} \times N_{tc}}, \mathbf {m} \in \left \{0,1\right \}^{N_{xyz}} \right ) \longrightarrow \mathbf {\hat {X}} \in \mathbb {R}^{N_{xyz} \times N_{tc}}$ $\epsilon \left (\mathbf {\hat {X}}, \mathbf {\bar {X}}\right )$ $\mathbf {X}$ $\mathbf {\hat {X}}$ $\mathbf {\bar {X}}$ $\mathbf {m}$ $\mathcal {F}_{\Theta }$ $\Theta $ $N_{xyz} = N_{x}N_{y}N_{z}$ $N_{tc} = N_{t}N_{c}$ $N_t$ $N_c$ $\epsilon $ $\mathbf {\hat {X}}$ $\mathbf {\bar {X}}$ $N_c = 5$ \begin {align}&\partial _{t}\mathbf {U} + \partial _{x}\mathbf {F}\left (\mathbf {U}\right ) + \partial _{y}\mathbf {G}\left (\mathbf {U}\right ) + \partial _{z}\mathbf {H}\left (\mathbf {U}\right ) = \mathbf {0},\\[6pt] &{\partial _t} \begin {pmatrix} \rho \\ \rho u \\ \rho v \\ \rho w \\ E \\ \end {pmatrix} + {\partial _x} \begin {pmatrix} \rho u \\ \rho u^{2} + p \\ \rho u v \\ \rho uw \\ u\left (E + p\right ) \\ \end {pmatrix} + {\partial _y} \begin {pmatrix} \rho v \\ \rho u v \\ \rho v^{2} + p \\ \rho vw \\ v\left (E + p\right ) \\ \end {pmatrix} + {\partial _z} \begin {pmatrix} \rho w \\ \rho uw \\ \rho vw \\ \rho w^{2} + p \\ w\left (E + p\right ) \\ \end {pmatrix} = \mathbf {0}, \\[6pt] &E = \frac {1}{2}\rho \left (u^2 + v^2 + w^2\right ) + \frac {p}{\left (\gamma - 1\right )},\end {align} $\rho $ $u$ $v$ $w$ $p$ $E$ $\gamma $ $x$ $y$ $z$ $\partial _{\mathbf {n}} \mathbf {U} = \mathbf {0}$ $\mathbf {X}$ \begin {equation}\mathbf {X} = \left ( \rho , u, v,w, E - \frac {1}{2}\rho (u^{2} + v^{2} + w^{2}) \right )^T, \label {Xeqn1-4}\end {equation} $\mathcal {T}$ $\mathcal {F}_{\Theta }$ \begin {equation}\mathcal {T}: \left ( \rho , u, v,w, E - \frac {1}{2}\rho \left (u^{2} + v^{2} + w^{2}\right ) \right )^T \longrightarrow \left ( \left |{\rho }\right |, u, v,w, \left |E - \frac {1}{2}\rho \left (u^{2} + v^{2} + w^{2}\right )\right |\right )^T. \label {Xeqn2-5}\end {equation} $\mathbf {\bar {X}}$ $\mathbf {m}$ $m_i$ $\mathbf {m}$ \begin {equation}m_{i} = \begin {cases} 0 & \text {if} \,\,\,\,\, a_{i} < 0.9, \\ 1 & \text {if} \,\,\,\,\, a_{i} \geq 0.9. \end {cases}, \,\,\,\,\,\,\,\,\,\, i \in [1, N_{xyz}], \,\,\,\,\,\,\,\,\,\, a_{i} \sim \mathcal {U}\left (0,1\right ). \label {Xeqn3-6}\end {equation} $\mathbf {X} = \mathbf {m}\odot \mathbf {\bar {X}}$ $\odot $ \begin {align}\epsilon = \frac {\norm { \mathbf {\hat {X}} - \mathbf {\bar {X}} }_{2}}{\norm {\mathbf {\bar {X}}}_{2}}.\end {align} $\left [0,1\right ]^{3}$ $\left (\rho , u, v,w, p\right )^{T}$ $x>0.5, y>0.5, z>0.5$ $x>0.5, y>0.5, z<0.5$ $R$ $S$ $J$ $lr = \{21, 32, 34, 41, 51, 62, 73, 84, 65, 76, 78,85\}$ $\omega $ $\bm {\omega ^{'}}$ ${R}_{lr}^{\pm }$ \begin {align}& \omega _{l} - \omega _{r} = \pm \frac {2\sqrt {\gamma }}{\gamma - 1}\left ( \sqrt {\frac {p_l}{\rho _l}} - \sqrt {\frac {p_r}{\rho _r}} \right ),\,\,\,\,\,\,\,\,\frac {\rho _l}{\rho _r} = \left (\frac {p_l}{p_r}\right )^{\frac {1}{\gamma }},\\[6pt] & \bm {\omega ^{'}}_{l} = \bm {\omega ^{'}}_{r}, \,\,\,\,\,\,\,\, \omega _l < \omega _r, \,\,\,\,\,\,\,\,sgn\left ( p_{l} - p_{r}\right ) = \mp 1.\end {align} ${S}_{lr}^{\pm }$ \begin {align}&\omega _{l} - \omega _{r} = \sqrt {\frac {\left (p_{l} - p_{r}\right )\left (\rho _{l} - \rho _{r}\right )}{\rho _l \rho _r}},\,\,\,\,\,\,\,\, \frac {\rho _{l}}{\rho _{r}} = \frac {\gamma \left (p_{l} + p_{r}\right ) + \left (p_{l} - p_{r}\right )}{\gamma \left (p_{l} + p_{r}\right ) - \left (p_{l} - p_{r}\right )},\\[6pt] & \bm {\omega ^{'}}_{l} = \bm {\omega ^{'}}_{r},\,\,\,\,\,\,\,\, \omega _l > \omega _r,\,\,\,\,\,\,\,\, sgn\left ( p_{l} - p_{r}\right ) = \pm 1.\end {align} ${J}_{lr}$ $\omega _{l} = \omega _{r}$ $p_{l} = p_{r}$ $\rho $ $\bm {\omega ^{'}}$ $\pm \pm $ $\mathcal {F}_{\Theta }$ $\mathcal {F}_{\mathcal {MP}}^{\mathbf {E}^{(n)}}$ $\mathbf {E}^{(n)}$ $n$ $\mathcal {F}_{\mathcal {GC}}$ $pe$ $\mathbf {X}$ $N$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $\mathcal {F}_{\mathcal {MP}}$ $\mathbf {E}^{(n)}$ $n$ $\mathcal {F}_{\mathcal {GC}}$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $N$ $\cup _{n=1}^{N}\mathbf {E}^{(n)}$ $\mathbf {m}$ $\mathbf {m} = \mathbf {m}^{(0)} \in \left \{0,1\right \}^{N_{xyz}}$ $d_{1}\left (i,j\right )$ $l_1$ $i$ $j$ $\cup _{n=1}^{N}\mathbf {E}^{(n)}$ $\mathbf {m}^{(0)}$ \begin {align}&\mathbf {E}^{(n)} = \left \{ \left (i,j \right ) \,\, \mid \,\, d_{1}\left (i,j\right ) \in \mathbb {Z^{+}} \cap \left [1,\lambda \right ], \,\,\, m^{(n-1)}_{i} = 1, \,\,\, \forall i \right \}, \,\,\, \lambda \in \left \{1,2,3\right \}, \\[6pt] &{m}^{(n)}_{j} = {m}^{(n-1)}_{j} \lor \left ( \bigvee _{(i,j) \in \mathbf {E}^{(n)}} {m}_{i}^{(n-1)} \right ),\end {align} $\lor $ $\bigvee $ $\lambda $ $\lambda =1$ $\lambda =2$ $\lambda =3$ $\lambda =2$ $\mathbf {X}$ $\mathbf {X}$ $\mathbf {X}$ $\mathcal {F}_{\mathcal {MP}}$ $i$ \begin {equation}\mathbf {h}_i^{\text {out}} = \sigma \left ( \sum _{j \in \mathcal {N}(i) \cup \{i\}} \mathbf {W} \mathbf {h}_j \right ), \label {Xeqn4-14}\end {equation} $\mathbf {h}$ $\mathbf {W}$ $\sigma $ $\mathcal {N}\left (i\right )$ $i$ $\mathbf {E}$ $\alpha _{ij}$ \begin {equation}\mathbf {h}_i^{\text {out}} = \sigma \left ( \sum _{j \in \mathcal {N}(i) \cup \{i\}} \alpha _{ij} \mathbf {W} \mathbf {h}_j \right ), \label {Xeqn5-15}\end {equation} $\alpha _{ij}$ \begin {equation}\alpha _{ij} = \frac {\exp \left ( \text {LeakyReLU} \left ( \mathbf {a}^T_{\text {src}} \mathbf {W}\mathbf {h}_i + \mathbf {a}^T_{\text {trg}}\mathbf {W}\mathbf {h}_j \right ) \right )} {\sum _{k \in \mathcal {N}(i) \cup \{i\}} \exp \left ( \text {LeakyReLU} \left ( \mathbf {a}^T_{\text {src}} \mathbf {W}\mathbf {h}_i + \mathbf {a}^T_{\text {trg}}\mathbf {W}\mathbf {h}_k \right ) \right )}, \label {Xeqn6-16}\end {equation} $\mathbf {a}_{\text {src}}^{T}$ $\mathbf {a}_{\text {trg}}^{T}$ $i$ $j$ $\mathbf {\tilde {W}}$ $\mathbf {b}_{\text {src}}^{T}$ $\mathbf {b}_{\text {trg}}^{T}$ $\mathbf {Z}$ \begin {equation}\alpha _{ij} = \frac {\exp \left ( \text {LeakyReLU} \left ( \mathbf {a}^T_{\text {src}} \mathbf {W}\mathbf {h}_i + \mathbf {a}^T_{\text {trg}}\mathbf {W}\mathbf {h}_j + \mathbf {b}^T_{\text {src}} \mathbf {\tilde {W}}\mathbf {Z}_i + \mathbf {b}^T_{\text {trg}}\mathbf {\tilde {W}}\mathbf {Z}_j\right ) \right )} {\sum _{k \in \mathcal {N}(i) \cup \{i\}} \exp \left ( \text {LeakyReLU} \left ( \mathbf {a}^T_{\text {src}} \mathbf {W}\mathbf {h}_i + \mathbf {a}^T_{\text {trg}}\mathbf {W}\mathbf {h}_k + \mathbf {b}^T_{\text {src}} \mathbf {\tilde {W}}\mathbf {Z}_i + \mathbf {b}^T_{\text {trg}}\mathbf {\tilde {W}}\mathbf {Z}_k \right ) \right )}, \label {Xeqn7-17}\end {equation} $\mathbf {Z}\in \mathbb {R}^{N_{xyz}\times 3N_{t}}$ $\mathbf {\hat {X}}$ \begin {align}\mathbf {Z} = \left \{\norm {\nabla \rho }_2\in \mathbb {R}^{N_{xyz}\times N_{t}}, \,\,\,\,\,\, \norm {\nabla p}_2\in \mathbb {R}^{N_{xyz}\times N_{t}}, \,\,\,\,\,\, \text {max}\left (0,-\text {div}(u,v,w)\right )\in \mathbb {R}^{N_{xyz}\times N_{t}}\right \}.\end {align} $s = \text {ln}\left (p/\rho ^{\gamma }\right )$ \begin {equation}\mathbf {h}_i^{\text {out}} = \sigma \left ( \mathbf {W}_{1}\mathbf {h}_{i} + \mathbf {W}_{2}\cdot \text {aggregate} \left ( \{ \mathbf {h}_j, \,\,\, \forall j \in \mathcal {N}_{\text {samp}}(i) \subseteq \mathcal {N}(i) \} \right ) \right ), \label {Xeqn8-19}\end {equation} $\mathcal {F}_{\mathcal {GC}}$ $\mathcal {F}_{\mathcal {GC}}$ \begin {equation}\mathbf {h}^{\text {out}} = \text {softmax}\left (\frac {\mathbf {Q}\mathbf {K}^{T}}{\sqrt {d}}\right )\mathbf {V}, \label {Xeqn9-20}\end {equation} $\mathbf {Q}$ $\mathbf {K}$ $\mathbf {V}$ $\mathbf {Q} = \mathbf {h}\mathbf {W}_{\mathbf {Q}}$ $\mathbf {K} = \mathbf {h}\mathbf {W}_{\mathbf {K}}$ $\mathbf {V} = \mathbf {h}\mathbf {W}_{\mathbf {V}}$ $d$ $\phi \left (\cdot \right )$ \begin {equation}\mathbf {h}^{\text {out}} = \frac {\left (\phi \left (\mathbf {Q}\right )\left (\phi \left (\mathbf {K}\right )^{T}\mathbf {V}\right )\right )}{\left (\phi \left (\mathbf {Q}\right )\left (\phi \left (\mathbf {K}\right )^{T}\mathbf {1}\right )\right )}, \label {Xeqn10-21}\end {equation} $\mathbf {h}$ $i$ $\mathbf {h}_i$ \begin {equation}\mathbf {h}_{i}^{\text {out}} = \frac {\sum _{j \in \mathcal {N}_{\text {eg}}(i)} \mathbf {O}_{ij} \mathbf {V}_j} {\sum _{j \in \mathcal {N}_{\text {eg}}(i)} \mathbf {O}_{ij}}, \,\,\,\,\, \mathbf {O}_{ij} = \exp \left ( \frac {\mathbf {Q}_i \mathbf {K}_j}{\sqrt {d}} \right ), \label {Xeqn11-22}\end {equation} $\mathcal {N}_{\text {eg}}(i)$ $i$ $\mathbf {h}$ $\mathcal {F}_{\mathcal {GC}}$ $\mathbf {h}$ $\mathbf {h}^{\text {out}}$ $\mathbf {h}'$ $\mathbf {W}_1$ $\mathbf {W}_2$ $\mathbf {W}_3$ $\sigma $ $\mathbf {h}'$ $\mathbf {\Psi }$ $\mathbf {W}_1$ $\mathbf {W}_2$ $\mathbf {W}_3$ $n$ \begin {align}&\mathbf {\Psi }^{(n)} = \mathbf {W}_{1}\mathbf {\Psi }^{(n-1)} + \mathbf {W}_{2}\mathbf {h}'^{(n)}, \\ & \mathbf {y}^{(n)} = \mathbf {W}_3 \mathbf {\Psi }^{(n)}.\end {align} $\mathbf {W}_1$ $\mathbf {W}_1$ $\mathcal {F}_{\Theta }$ $\mathcal {L}_{\text {data}}$ $\mathcal {L}_{\text {phy}}$ $\mathbf {m} = \mathbf {m}^{(0)}$ \begin {align}\mathcal {L}_{\text {data}} = \frac {1}{N_{tc}\sum \mathbf {m}}\norm {\mathbf {m}\odot \left (\mathbf {X} - \hat {\mathbf {X}}\right )}_{2}^{2}.\end {align} $\mathbf {\hat {X}}$ $\mathbf {\hat {U}}\in \mathbb {R}^{N_{t}\times N_{c} \times N_{x} \times N_{y} \times N_{z}}$ $\left (i,j,k\right )$ $\left (i,j\right )$ \begin {align}\label {eq:godloss} &\mathcal {L}_{\text {phy}} = \frac {1}{N} \sum _{c}\sum _{t}\sum _{i,\,j,\,k} \left ( \mathbf {\hat {U}}_{i,\,j,\,k}^{(t+1,\,c)} - \mathbf {\hat {U}}_{i,\,j,\,k}^{(t,\,c)} + \lambda _{x}\left [\Delta {\mathbf {F}}_{i,\,j,\,k}^{(t,\,c)} \right ] + \lambda _{y}\left [\Delta {\mathbf {G}}_{i,\,j,\,k}^{(t,\,c)}\right ] + \lambda _{z}\left [\Delta {\mathbf {H}}_{i,\,j,\,k}^{(t, \, c)}\right ]\right )^{2},\\ &\Delta {\mathbf {F}}_{i,\,j,\,k}^{(t,\,c)} = {\mathbf {F}}_{i + \frac {1}{2},\,j,\,k}^{(t,\,c)} - {\mathbf {F}}_{i - \frac {1}{2},\,j,\,k}^{(t,\,c)}, \\[8pt] &\Delta {\mathbf {G}}_{i,\,j,\,k}^{(t,\,c)} = {\mathbf {G}}_{i,\,j + \frac {1}{2},\,k}^{(t,\,c)} - {\mathbf {G}}_{i,\,j - \frac {1}{2}\,,k}^{(t,\, c)}, \\[8pt] &\Delta {\mathbf {H}}_{i,\,j,\,k}^{(t,\,c)} = {\mathbf {H}}_{i,\,j,\,k+\frac {1}{2}}^{(t,\,c)} - {\mathbf {H}}_{i,\,j,\,k-\frac {1}{2}}^{(t,\,c)},\end {align} $\lambda _{x} = \Delta t/\Delta x$ $\lambda _{y} = \Delta t/\Delta y$ $\lambda _{z} = \Delta t/\Delta z$ $N = N_{xyz}N_{tc}$ $\Delta {\mathbf {F}}$ $\Delta {\mathbf {G}}$ $\Delta {\mathbf {H}}$ $x,y,z$ $\mathbf {\hat {U}}$ $\mathcal {L}_{\text {data}} + \mathcal {L}_{\text {phy}}$ $\mathcal {L} = \mathcal {L}_{\text {data}} + \mathcal {L}_{\text {phy}}$ $\partial _{\mathbf {n}} \mathbf {U} = \mathbf {0}$ $10^{-4}$ $64^3$ $5\times 10^{-4}$ $\left (\pm 1,\pm 1,\pm 1\right )^{T}$ $\lambda = 3$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $\mathcal {F}_{\mathcal {MP}}$ $(\lambda =3)$ $\mathbf {X}$ $\mathcal {F}_{\mathcal {MP}}$ $(\lambda =3)$ $\mathcal {F}_{\mathcal {GC}}$ $\mathcal {F}_{\mathcal {MP}}$ $\phi = \text {ELU}(x) + 1$ $\mathcal {F}_{\mathcal {GC}}$ $(\lambda =3)$ $\mathbf {X}$ $\mathcal {F}_{\mathcal {GC}}$ $(\lambda =3)$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $(\lambda = 1)$ $(\lambda = 2)$ $(\lambda = 3)$ $\lambda = \left \{1,2,3\right \}$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $\mathbf {X}$ $\lambda =3$ $\lambda =2$ $\lambda =1$ $\lambda = 1$ $\lambda =2$ $\lambda =3$ $\lambda =1$ $\lambda = \left \{1,2,3\right \}$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $\lambda = 2$ $\lambda = 1$ $\lambda = \left \{1,2,3\right \}$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $\lambda = 2$ $\lambda = 3$ $\lambda =1$ $\lambda =1$ $(\lambda = 3)$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $\mathbf {X}$ $\lambda = 3$ $(\lambda = 3)$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $(\lambda = 3)$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $(\lambda = 3)$ $\mathbf {X}$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $(\lambda = 3)$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $\lambda =3$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $\lambda =3$ $\mathbf {X}$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $\lambda =3$ $\cancel {\mathcal {L}_{\text {phy}}}$ $\lambda $ $s_l$ $s_r$ $x=0$ $s_*$ $x$ \begin {equation}{\partial _t} \begin {pmatrix} \rho \\ \rho u \\ \rho v \\ \rho w \\ E \\ \end {pmatrix} + {\partial _x} \begin {pmatrix} \rho u \\ \rho u^{2} + p \\ \rho u v \\ \rho u w \\ u( E + p) \\ \end {pmatrix} = \mathbf {0} \label {Xeqn12-A.1}\end {equation} $s_l$ $s_r$ \begin {equation}s_l = \tilde {u} - \tilde {a}, \,\,\,\,\,\,\,\, s_r = \tilde {u} + \tilde {a}, \label {Xeqn13-A.2}\end {equation} $\tilde {u}$ $\tilde {a}$ \begin {equation}\tilde {u} = \frac {\sqrt {\rho _l}u_l + \sqrt {\rho _r}u_r}{\sqrt {\rho _l} + \sqrt {\rho _r}}, \,\,\,\,\,\,\,\, \tilde {a} = \sqrt {\left (\gamma - 1\right )\left (\tilde {H} - \frac {1}{2}\tilde {u}^2\right )}, \label {Xeqn14-A.3}\end {equation} $\tilde {H}$ \begin {equation}\tilde {H} =\frac {\sqrt {\rho _l}H_l + \sqrt {\rho _r}H_r}{\sqrt {\rho _l} + \sqrt {\rho _r}}, \,\,\,\,\,\,\,\, H =\frac {E + p}{\rho }. \label {Xeqn15-A.4}\end {equation} $s_l$ $s_r$ $s_*$ \begin {equation}s_{*} = \frac {p_r - p_l + \rho _{l}u_{l}(s_l - u_l) - \rho _{r}u_{r}(s_r - u_r)}{ \rho _{l}(s_l - u_l) - \rho _{r}(s_r - u_r)}. \label {Xeqn16-A.5}\end {equation} $\mathbf {U}_{*l}$ $\mathbf {U}_{*r}$ $s_*$ $s_l$ $s_r$ \begin {equation}\mathbf {U}_{*\eta } = \rho _{\eta }\left (\frac {s_\eta - u_\eta }{s_\eta - s_{*}}\right )\begin {pmatrix} 1 \\ s_{*} \\ v_\eta \\ w_\eta \\ \frac {E_\eta }{\rho _\eta } + (s_{*} - u_\eta )\left [s_{*} + \frac {p_\eta }{\rho _\eta (s_\eta - u_\eta )}\right ] \end {pmatrix}, \label {Xeqn17-A.6}\end {equation} $\eta = \left \{l,r\right \}$ $\mathbf {F}_{*l}$ $\mathbf {F}_{*r}$ $\mathbf {F}_{i + \frac {1}{2}}^n$ \begin {equation}\label {eq:HLLC} \mathbf {F}_{*\eta } = \mathbf {F}_\eta + s_\eta (\mathbf {U}_{*\eta } - \mathbf {U}_\eta ), \;\;\;\;\;\; \mathbf {F}_{i + \frac {1}{2}}^t = \begin {cases} \mathbf {F}_l & 0 \leq s_l,\\ \mathbf {F}_{*l} & s_l \leq 0 \leq s_{*},\\ \mathbf {F}_{*r} & s_{*} \leq 0 \leq s_{r},\\ \mathbf {F}_{r} & 0 \geq s_{r}. \end {cases}\end {equation} $y$ $\mathbf {G}_{j + \frac {1}{2}}^t$ $z$ $\mathbf {H}_{k + \frac {1}{2}}^t$ $\mathbf {\bar {X}}$ $\mathbf {U}$ $x$ $y$ $z$ \begin {align}&\bm {\alpha }_{0} = \frac {1}{3}\mathbf {U}_{i-2} -\frac {7}{6}\mathbf {U}_{i-1} - \frac {11}{6}\mathbf {U}_{i}\\[6pt] &\bm {\alpha }_{1} =-\frac {1}{6}\mathbf {U}_{i-1} + \frac {5}{6}\mathbf {U}_{i} + \frac {1}{3}\mathbf {U}_{i + 1} \\[6pt] &\bm {\alpha }_{2} = \frac {1}{3}\mathbf {U}_{i} + \frac {5}{6}\mathbf {U}_{i + 1} - \frac {1}{6}\mathbf {U}_{i + 2}\end {align} $\bm {\beta }$ \begin {align}&\bm {\beta }_{0} = \frac {13}{12}\left (\mathbf {U}_{i-2}-2\mathbf {U}_{i-1}+\mathbf {U}_{i}\right )^{2} + \frac {1}{4}\left (\mathbf {U}_{i-2} - 4\mathbf {U}_{i-1} + 3\mathbf {U}_{i}\right )^{2} \\[6pt] &\bm {\beta }_{1} = \frac {13}{12}\left (\mathbf {U}_{i-1}-2\mathbf {U}_{i}+\mathbf {U}_{i+1}\right )^{2} + \frac {1}{4}\left (\mathbf {U}_{i-1} - \mathbf {U}_{i+1}\right )^{2}\\[6pt] &\bm {\beta }_{2} = \frac {13}{12}\left (\mathbf {U}_{i}-2\mathbf {U}_{i+1}+\mathbf {U}_{i+2}\right )^{2} + \frac {1}{4}\left (3\mathbf {U}_{i} - 4\mathbf {U}_{i+1} + \mathbf {U}_{i+2}\right )^{2}\end {align} $\bm {\omega }$ \begin {equation}\bm {\omega }_n = \frac {\bm {\zeta }_{n}}{\bm {\zeta }_{0} + \bm {\zeta }_{1} + \bm {\zeta }_{2}}, \,\,\,\,\,\,\,\, \bm {\zeta }_{n} = \frac {d_{n}}{\left (\epsilon + \bm {\beta }_{n}\right )^2}, \,\,\,\,\,\,\,\, \left (d_0, d_1, d_2\right ) = \left (\frac {1}{10}, \frac {3}{5}, \frac {3}{10} \right ), \,\,\,\,\,\,\,\, n \in \left \{0,1,2 \right \}, \label {Xeqn19-B.7}\end {equation} $\epsilon $ ${\mathbf {U}}_{i + \frac {1}{2}}^{\text {WENO}}$ \begin {equation}{\mathbf {U}}_{i + \frac {1}{2}}^{\text {WENO}} = \bm {\omega }_{0} \odot \bm {\alpha }_0 + \bm {\omega }_{1} \odot \bm {\alpha }_1 + \bm {\omega }_{2} \odot \bm {\alpha }_2 . \label {Xeqn20-B.8}\end {equation} $[0,1]^3$ $\,\,{J}_{21}^{} {J}_{32}^{} {S}_{34}^{-} {S}_{41}^{+} {J}_{51}^{} {S}_{62}^{+} {J}_{73}^{} {S}_{84}^{-} {R}_{65}^{-} {S}_{76}^{-} {J}_{78}^{} {J}_{85}^{}$ ${S}_{21}^{+} {S}_{32}^{-} {S}_{34}^{-} {S}_{41}^{+} {S}_{51}^{+} {S}_{62}^{-} {S}_{73}^{+} {S}_{84}^{-} {S}_{65}^{-} {S}_{76}^{+} {S}_{78}^{+} {S}_{85}^{-}$ ${R}_{21}^{+} {R}_{32}^{-} {R}_{34}^{-} {R}_{41}^{+} {S}_{51}^{-} {S}_{62}^{+} {S}_{73}^{-} {S}_{84}^{+} {R}_{65}^{-} {R}_{76}^{+} {R}_{78}^{+} {R}_{85}^{-}$ ${J}_{21}^{} {J}_{32}^{} {J}_{34}^{} {J}_{41}^{} {S}_{51}^{-} {R}_{62}^{+} {S}_{73}^{-} {R}_{84}^{+} {J}_{65}^{} {J}_{76}^{} {J}_{78}^{} {J}_{85}^{}$ ${J}_{21}^{} {J}_{32}^{} {J}_{34}^{} {J}_{41}^{} {J}_{51}^{} {J}_{62}^{} {J}_{73}^{} {J}_{84}^{} {J}_{65}^{} {J}_{76}^{} {J}_{78}^{} {J}_{85}^{}$ ${S}_{21}^{+} {J}_{32}^{} {J}_{34}^{} {S}_{41}^{+} {S}_{51}^{+} {J}_{62}^{} {S}_{73}^{+} {J}_{84}^{} {J}_{65}^{} {S}_{76}^{+} {S}_{78}^{+} {J}_{85}^{}$ $\text {SNR} = 25 \text { dB}$ $\mathcal {F}_{\mathcal {MP}}$ $\mathcal {F}_{\mathcal {GC}}$ $(\lambda = 3)$ $\mathbf {X}$ $\text {SNR} = 25 \text { dB}$ $\text {SNR} = 25 \text { dB}$ https://orcid.org/0000-0001-5460-5337 mailto:rgc38@cam.ac.uk mailto:rrk26@cam.ac.uk https://doi.org/10.1016/j.cma.2025.118328 https://doi.org/10.1016/j.cma.2025.118328 http://creativecommons.org/licenses/by/4.0/ R. Cassia and R. Kerswell message-passing is permutation-invariance, meaning features can be updated in any node ordering. Furthermore, message-passing inherently captures local dependencies in data. Graph neural networks (GNNs) and message-passing were first introduced by Scarselli et al. [1]. Graph convolutional networks were then introduced by Kipf and Welling[2], which perform the message-passing aggre- gation through convolutional operations, increasing efficiency and interpretability. Veličković et al. [3] improve over GCNs by in- troducing graph attention networks (GATs) which use attention scores in the aggregation process. The attention mechanism reduces oversmoothing typically encountered in GCNs. Hamilton et al. [4] also improve over GCNs by using sampling-based aggregation (graphSAGE), enabling scalability to larger graphs. Xu et al. [5] use a MLP-based aggregation of neighbours in their graph isomor- phism network (GIN). They show that their network is as expressive as the Weisfeiler-Lehman test (a graph isomorphism test), which enables their model to better distinguish graph structures. Contextually-aware models are those that capture long-range dependencies, typically through global attention mechanisms. The origins of attention mechanisms can be traced to the work of Bahdanau et al. [6] and Luong et al. [7] who introduce additive and dot-product attention in their recurrent-based architectures for neural machine translation. However, the major breakthrough came when the transformer architecture, presented by Vaswani et al. [8], completely removed recurrence through usage of sinusoidal po- sitional encodings and multi-head self attention. This novel architecture is parallelizable over sequences which significantly reduces training time. They generalize the attention mechanism in terms of querys, keys and values where the multiple heads allows learning of different aspects of the input. Their attention is a scaled dot-product attention where the scaling prevents exploding gradients. The transformer breakthrough led to numerous milestones in language modeling and vision [9–11]. There has also been work on reducing the computational complexity of transformers from quadratic to linear. Katharopoulos et al. [12] achieve linear complexity by expressing self-attention as a linear dot-product of kernel feature maps and by making use of the associativity of matrix multi- plication. Choromanski et al. [13] also achieve linear complexity by approximating kernelized attention using random feature maps. Recently, there has been work on replacing attention with state space models (SSMs) that are contextually-aware to achieve linear complexity. Gu and Dao[14] do this by making the SSM parameters input-dependent, allowing their model to selectively propa- gate or forget information along a sequence depending on the current token. Their model, named Mamba, is implemented using a hardware-aware algorithm to address the computational inefficiencies caused by the recurrent-nature of the model. Dao and Gu[15] later develop a theoretical framework that connects SSMs with certain variants of attention - enabling the design of a newer model, Mamba-2, that also leverages the modern hardware optimizations of attention. There has also been work on extending transformers and selective-SSMs to handle graphical data [16–20]. In this paper, we are interested in using the framework proposed by Rampášek et al. [21] in performing our fluid reconstruction task. The framework, known as GraphGPS, comprises three main components: a local message-passing mechanism via graphs, a global attention mechanism, and positional/structural encodings. The contributions of our paper are as follows: • To the best of our knowledge, this is the first study on the use of GraphGPS for flow field reconstruction. In particular, we make use of the modularity of the framework to explore the effect of different global attention mechanisms and different message- passing mechanisms. We also show superior performance of GraphGPS against other ML architectures as well as against classical approaches. • For the message-passing module, we also introduce a modified GAT layer, known as SA-GATConv, that leverages information on shocks and discontinuities in computing the local attention weights. We show that this produces sharper reconstructions in regions of shocks and discontinuities. • We further leverage the graphical structure during message-passing such that a node that is initially unknown cannot propagate information to neighbours unless information has first been propagated to it from a known node. Compared to a naive form of message passing where all nodes (known or unknown) propagate information to each other, this form of guided message-passing is more interpretable, reduces computational overhead, and improves convergence. • We ensure that in the projected space of GraphGPS, the representation of unknown nodes is zeroed such that the only information available is through their positional encoding. We achieve this through a masked projection of the input, where the masking is based on whether a node is known or unknown. This also ensures the reconstruction is agnostic to the initialization values at unknown nodes. We show empirically that this setup performs just as well as an unmasked (normal) projection of the input where feature propagation [22] has been used to provide an initial guess at unknown nodes. • To the best of our knowledge, this is the first study on the 3D Riemann problems in the SciML literature. This paper is organized as follows. In Section 2, we review prior work on flow field reconstruction from sparse measurements. In Section 3, we mathematically formulate our reconstruction problem, and outline the GraphGPS architecture and its various aspects. In Section 4, we show our experimental findings. Finally, we finish with conclusions in Section 5. 2. Prior work on flow reconstruction One popular approach for reconstruction in fluid dynamics is through proper orthogonal decomposition (POD), which involves finding orthonormal basis vectors that minimize the difference between the known data points and their projection onto a lower dimensional subspace. Gappy POD uses a mask to enforce the POD to be applied on known points only, and the basis vectors can then be used to estimate the variables at all grid points. The method was first introduced by Everson and Sirovich[23] for reconstructing human face images, and later used by Bui-Thanh et al. [24] in an aerodynamic setting where the pressure field around aerofoils were reconstructed from sensors on the aerofoil surface. Vendl and Faßbender[25] also used the method to recover the transonic flow over a wing-body configuration. The downside of POD is that it assumes a linear decomposition and may struggle to capture non-linear Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 2 R. Cassia and R. Kerswell flow behaviour. Other classical methods are interpolation-based. One such method constructs a function that is a weighted sum of radial basis functions (RBFs). Another such method is Kriging interpolation, which is a probabilistic method that estimates values at unmeasured locations based on spatial correlations in the data, whilst also providing uncertainty estimates. Huang et al. [26] investigate the use of Kriging and RBFs in the reconstruction of hypersonic flow through a nozzle. More recently machine learning has become a powerful paradigm for performing reconstruction. Duthé et al. [27] use a graph neural network to simultaneously reconstruct both the pressure and velocity fields around aerofoils from the surface pressure distri- butions. Their model is also capable of predicting global properties of the flow such as inflow velocity and angle of attack. They make use of feature propagation to initialize unknown nodes. Xu et al. [28] use a physics-informed neural network to reconstruct the wake flow past a cylinder. They place emphasis on the importance of training procedures in drastically reducing the number of training epochs required, by using a cosine-annealing learning rate schedule with warm restarts. Kim et al. [29] use a gappy autoencoder to perform reconstruction via non-linear manifold representations. The method shows improvement over gappy POD (which utilizes linear subspaces), as tested on diffusion, advection and wave problems. They also assess the impact of different sensor placements on reconstruction accuracy. Peng et al. [30] use a physics-informed graph neural network to reconstruct high speed transient flows. They use a fully-connected neural network to map the sparse data to the entire domain, and then assign the mapped data to graph nodes to facilitate information exchange. Finally the output state variables are used to compute a PINN loss summed together with initial condition and boundary condition losses. Their model exhibits good spatial and temporal generalization but has low accuracy when reconstructing flow near wave-fronts. Hosseini and Shiri [31] use a physics-informed neural network to reconstruct the wake flow past a cylinder from sparse sensors placed around the domain boundary and cylinder wall. Danciu et al. [32] use graph atten- tion convolutional layers to reconstruct the flow in an internal combustion engine where the geometry is time-varying due to piston motion. They use a feature propagation step to leverage information from known nodes to initialize missing features, and use a mask to distinguish between original and propagated data points which allows more effective learning from the sparse inputs. Quattromini et al. [33] make use of the adjoint method to compute Reynolds-averaged Navier Stokes (RANS) gradients which are then used as optimization terms during the training of a GNN. They use this hybrid ML-CFD approach to reconstruct the mean flow past cylinders. Nguyen et al. [34] introduce FLRNet which uses a variational autoencoder to learn a latent representation of the flow-field past a cylinder. The VAE makes use of Fourier feature layers and a perceptual loss term to enhance its ability to learn detailed features of the flow-field. An MLP is then used to correlate sparse sensor measurements to the learned representation. Dang et al. [35] use an operator learning framework to perform reconstruction. The framework utilizes a branch-trunk architecture. The branch network, which uses Fourier neural operators, incorporates sensor measurements at different timestamps. The trunk network uses an MLP to encode the entire temporal domain. Combining the outputs of the branch and trunk nets yields the final reconstruction which is discretization-independent. 3. Methodology 3.1. Problem definition We are interested in the following inverse problem: Find Θ ∶ ( 𝐗 ∈ ℝ𝑁𝑥𝑦𝑧×𝑁𝑡𝑐 ,𝐦 ∈ {0, 1}𝑁𝑥𝑦𝑧 ) ⟶ �̂� ∈ ℝ𝑁𝑥𝑦𝑧×𝑁𝑡𝑐 such that 𝜖(�̂�, �̄�) is minimized, where: • 𝐗, �̂�, and �̄� are the sparse, reconstructed, and true flow fields, respectively. • 𝐦 is the binary mask which indicates where values are known or unknown. • Θ is the machine learning model with learnable weights Θ. • 𝑁𝑥𝑦𝑧 = 𝑁𝑥𝑁𝑦𝑁𝑧 represents the number of points in 3D space. 𝑁𝑡𝑐 = 𝑁𝑡𝑁𝑐 represents the number of features where 𝑁𝑡 is number of timesteps and 𝑁𝑐 is the number of variables in the physical system. • 𝜖 is the error metric between �̂�, and �̄� for measuring accuracy (not to be confused with the training loss function). In this paper, we are interested in flow fields pertaining to the 3D Euler equations, so 𝑁𝑐 = 5: 𝜕𝑡𝐔 + 𝜕𝑥𝐅(𝐔) + 𝜕𝑦𝐆(𝐔) + 𝜕𝑧𝐇(𝐔) = 𝟎, (1) 𝜕𝑡 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 𝜌 𝜌𝑢 𝜌𝑣 𝜌𝑤 𝐸 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ + 𝜕𝑥 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 𝜌𝑢 𝜌𝑢2 + 𝑝 𝜌𝑢𝑣 𝜌𝑢𝑤 𝑢(𝐸 + 𝑝) ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ + 𝜕𝑦 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 𝜌𝑣 𝜌𝑢𝑣 𝜌𝑣2 + 𝑝 𝜌𝑣𝑤 𝑣(𝐸 + 𝑝) ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ + 𝜕𝑧 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 𝜌𝑤 𝜌𝑢𝑤 𝜌𝑣𝑤 𝜌𝑤2 + 𝑝 𝑤(𝐸 + 𝑝) ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ = 𝟎, (2) 𝐸 = 1 2 𝜌 ( 𝑢2 + 𝑣2 +𝑤2) + 𝑝 (𝛾 − 1) , (3) where 𝜌, 𝑢, 𝑣, 𝑤, 𝑝, 𝐸, 𝛾 are density, 𝑥-velocity, 𝑦-velocity, 𝑧-velocity, pressure, energy and ratio of specific heats, respectively. The boundary conditions are assumed to be Neumann, i.e., 𝜕𝐧𝐔 = 𝟎. To enforce positivity preservation (i.e., non-negative density and energy, [36]) as a hard constraint, we work with the following representation for 𝐗 at each spatiotemporal point: 𝐗 = ( 𝜌, 𝑢, 𝑣, 𝑤,𝐸 − 1 2 𝜌(𝑢2 + 𝑣2 +𝑤2) )𝑇 , (4) Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 3 R. Cassia and R. Kerswell and apply during training, for each spatiotemporal point, the transformation  after applying Θ:  ∶ ( 𝜌, 𝑢, 𝑣, 𝑤,𝐸 − 1 2 𝜌 ( 𝑢2 + 𝑣2 +𝑤2) )𝑇 ⟶ ( |𝜌|, 𝑢, 𝑣, 𝑤, | | | | 𝐸 − 1 2 𝜌 ( 𝑢2 + 𝑣2 +𝑤2)| | | | )𝑇 . (5) We obtain �̄�, the true field, by numerically solving the 3D Euler equations using a 5th-order HLLC-WENO scheme (see Appendices A and B for more details). The mask 𝐦 is randomly generated from a uniform distribution such that 90% of points are unknown. For element 𝑚𝑖 of 𝐦: 𝑚𝑖 = { 0 if 𝑎𝑖 < 0.9, 1 if 𝑎𝑖 ≥ 0.9. , 𝑖 ∈ [1, 𝑁𝑥𝑦𝑧], 𝑎𝑖 ∼  (0, 1). (6) The sparse field is then generated from the mask as 𝐗 = 𝐦⊙ �̄�, where ⊙ is the Hadamard product. For the error metric, we choose: 𝜖 = ‖ ‖ ‖ �̂� − �̄�‖‖ ‖2 ‖ ‖ �̄�‖ ‖2 . (7) 3.2. 3D Riemann problems and characteristic waves We are interested in solutions of the 3D Euler equations where the initial conditions are described by Riemann problems. As such we briefly discuss this class of problems. For a more in-depth discussion of Riemann problems, refer to Schulz-Rinne[37], Hoppe et al. [38]. A unit cube domain, with dimensions [0, 1]3, is divided into 8 octants where each octant is uniformly initialized with its own set of (𝜌, 𝑢, 𝑣, 𝑤, 𝑝)𝑇 . The top octants are numbered 1 to 4, anticlockwise, starting from the octant where 𝑥 > 0.5, 𝑦 > 0.5, 𝑧 > 0.5. The bottom octants are numbered 5 to 8 anticlockwise, starting from the octant where 𝑥 > 0.5, 𝑦 > 0.5, 𝑧 < 0.5. The initial states of the octants are set such that there exists a single characteristic wave at each of the 12 interfaces that separate the octants. The characteristic wave can be either a rarefaction 𝑅, shock 𝑆, or a contact discontinuity 𝐽 . We identify these waves by their positions based on octant numbering, i.e., 𝑙𝑟 = {21, 32, 34, 41, 51, 62, 73, 84, 65, 76, 78, 85}. In the following let 𝜔 denote the velocity normal to an interface, and let ω′ denote the tangential velocities. The relations for rarefaction waves 𝑅± 𝑙𝑟 are as follows (assuming polytropic flow): 𝜔𝑙 − 𝜔𝑟 = ± 2 √ 𝛾 𝛾 − 1 (√ 𝑝𝑙 𝜌𝑙 − √ 𝑝𝑟 𝜌𝑟 ) , 𝜌𝑙 𝜌𝑟 = ( 𝑝𝑙 𝑝𝑟 ) 1 𝛾 , (8) ω ′ 𝑙 = ω ′ 𝑟 , 𝜔𝑙 < 𝜔𝑟, 𝑠𝑔𝑛 ( 𝑝𝑙 − 𝑝𝑟 ) = ∓1. (9) The relations for shock waves 𝑆± 𝑙𝑟 are as follows: 𝜔𝑙 − 𝜔𝑟 = √ ( 𝑝𝑙 − 𝑝𝑟 )( 𝜌𝑙 − 𝜌𝑟 ) 𝜌𝑙𝜌𝑟 , 𝜌𝑙 𝜌𝑟 = 𝛾 ( 𝑝𝑙 + 𝑝𝑟 ) + ( 𝑝𝑙 − 𝑝𝑟 ) 𝛾 ( 𝑝𝑙 + 𝑝𝑟 ) − ( 𝑝𝑙 − 𝑝𝑟 ) , (10) ω ′ 𝑙 = ω ′ 𝑟 , 𝜔𝑙 > 𝜔𝑟, 𝑠𝑔𝑛 ( 𝑝𝑙 − 𝑝𝑟 ) = ±1. (11) For contact discontinuities 𝐽𝑙𝑟, the pressure and normal velocities are constant, i.e., 𝜔𝑙 = 𝜔𝑟, 𝑝𝑙 = 𝑝𝑟. Additionally, in polytropic flows, the density 𝜌 varies arbitrarily. The tangential velocities ω′ can also vary arbitrarily, in which case a rotation is induced as a slip line.1 Riemann problems (2D or 3D) yield complex flow patterns which arise from the interaction of the initialized elementary waves. They are therefore used as canonical problems for testing compressible flow solvers. We argue that they are also a plausible choice for exploring inverse problems of the Euler equations, such as our reconstruction task. The configurations or cases we explore are shown in Fig. 1, and their initialization values are summarised in Appendix C. 3.3. GraphGPS framework In this subsection we outline our design of model Θ, which follows the GraphGPS framework [21]. Fig. 2 provides a high-level overview of this framework. The framework takes the sparse field 𝐗, projects it to higher dimensional space along with positional encodings, then applies 𝑁 GraphGPS layers before projecting back to the original space. Each GraphGPS layer involves contributions from two modules:  and  .  is the message-passing module which captures information at a local level, in the neighbour- hood of each node, where the connections between the nodes are defined by edge indices 𝐄(𝑛) for layer 𝑛.  is the module used for computing global context between distant nodes. There are also MLP layers, each involving a linear-ReLU-linear sequence, for processing the combined contributions of  and  , as well as for mixing node features with node positional encodings. The number of layers 𝑁 is determined by the size of the set of edge indices, i.e., ∪𝑁 𝑛=1𝐄 (𝑛). The set of edge indices is determined from the mask 𝐦 via guided message-passing, which we explain next. 1 It is possible to determine directions ±± for contact discontinuities if the tangential velocities differ. However in 3D this will make the notation cumbersome and so we decide to omit directions for contact discontinuities. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 4 R. Cassia and R. Kerswell Fig. 1. Density fields of the 3D Riemann configurations that we explore in this study. Fig. 2. Overview of GraphGPS framework. 𝐄(𝑛)  represents a message-passing layer acting on edge indices 𝐄(𝑛) for layer 𝑛.  denotes a layer that captures global dependencies of its input features. 𝑝𝑒 denotes positional encodings. Residual connections and normalization layers are omitted for clarity. 3.3.1. Guided message-passing Guided message-passing is underpinned by two rules. The first rule is that during message-passing, information flows strictly from known nodes only (to immediate neighbours). The second rule is that an unknown node becomes ’known’ if information has been propagated to it at least once. Formally: Definition 1 (Guided Message-Passing). Let 𝐦 = 𝐦(0) ∈ {0, 1}𝑁𝑥𝑦𝑧 denote the initial mask. Let 𝑑1(𝑖, 𝑗) denote the 𝑙1 distance between node 𝑖 and node 𝑗 in their Euclidean space. The set of edge indices ∪𝑁 𝑛=1𝐄 (𝑛) can be iteratively generated starting from 𝐦(0) based on the following two rules: 𝐄(𝑛) = { (𝑖, 𝑗) ∣ 𝑑1(𝑖, 𝑗) ∈ ℤ+ ∩ [1, 𝜆], 𝑚(𝑛−1) 𝑖 = 1, ∀𝑖 } , 𝜆 ∈ {1, 2, 3}, (12) Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 5 R. Cassia and R. Kerswell Fig. 3. Illustration of guided message-passing. Red squares indicate known nodes. Blue circles indicate unknown nodes. 𝑚(𝑛) 𝑗 = 𝑚(𝑛−1) 𝑗 ∨ ⎛ ⎜ ⎜ ⎝ ⋁ (𝑖,𝑗)∈𝐄(𝑛) 𝑚(𝑛−1) 𝑖 ⎞ ⎟ ⎟ ⎠ , (13) where ∨ denotes the logical OR operator, and ⋁ denotes the logical OR over a set. Parameter 𝜆 determines whether to consider orthogonal immediate neighbours only (𝜆 = 1), orthogonal+diagonal immediate neighbours (𝜆 = 2), or orthogonal+diagonal+corner immediate neighbours (𝜆 = 3), in determining the next set of edge connections. Fig. 3 illustrates guided message-passing in 2D with 𝜆 = 2. 3.3.2. Masked projection and positional encoding We use masked linear transformations to project input 𝐗 to a higher dimensional space, such that in the projected space, the repre- sentation of unknown nodes is zeroed. We concatenate these projected features with the linear transformation of Laplacian positional encodings of 𝐗 [16,39]. Through this masked projection, we ensure that the initially unknown nodes contribute to subsequent layers only through their positional information. This masked projection also ensures the model is agnostic to the padding value used for unknown nodes in 𝐗 (should they be non-zero). 3.3.3. Candidates for  This section outlines the message-passing modules we will explore. One popular and foundational message-passing layer is the graph convolutional layer (GCNConv). For node 𝑖, GCNConv is expressed as [2]: 𝐡out𝑖 = 𝜎 ⎛ ⎜ ⎜ ⎝ ∑ 𝑗∈ (𝑖)∪{𝑖} 𝐖𝐡𝑗 ⎞ ⎟ ⎟ ⎠ , (14) where 𝐡 represents an intermediate feature, 𝐖 is the layer weight, 𝜎 is the ReLU activation function, and  (𝑖) denotes the neighbour- hood of node 𝑖 (according to edge indices 𝐄). A graph attention convolutional layer (GATConv) is an extension on GCNConv through the introduction of attention weights 𝛼𝑖𝑗 [3]: 𝐡out𝑖 = 𝜎 ⎛ ⎜ ⎜ ⎝ ∑ 𝑗∈ (𝑖)∪{𝑖} 𝛼𝑖𝑗𝐖𝐡𝑗 ⎞ ⎟ ⎟ ⎠ , (15) where the attention weights 𝛼𝑖𝑗 are computed as: 𝛼𝑖𝑗 = exp ( LeakyReLU ( 𝐚𝑇src𝐖𝐡𝑖 + 𝐚𝑇trg𝐖𝐡𝑗 )) ∑ 𝑘∈ (𝑖)∪{𝑖} exp ( LeakyReLU ( 𝐚𝑇src𝐖𝐡𝑖 + 𝐚𝑇trg𝐖𝐡𝑘 )) , (16) and 𝐚𝑇src, 𝐚𝑇trg are learnable parameters pertaining to the source (𝑖) and target (𝑗) nodes. In this paper we propose to modify the GATConv attention mechanism by introducing additional parameters �̃�, 𝐛𝑇src, 𝐛𝑇trg acting on a representation 𝐙 which encapsulates information about flow discontinuities: 𝛼𝑖𝑗 = exp ( LeakyReLU ( 𝐚𝑇src𝐖𝐡𝑖 + 𝐚𝑇trg𝐖𝐡𝑗 + 𝐛𝑇src�̃�𝐙𝑖 + 𝐛𝑇trg�̃�𝐙𝑗 )) ∑ 𝑘∈ (𝑖)∪{𝑖} exp ( LeakyReLU ( 𝐚𝑇src𝐖𝐡𝑖 + 𝐚𝑇trg𝐖𝐡𝑘 + 𝐛𝑇src�̃�𝐙𝑖 + 𝐛𝑇trg�̃�𝐙𝑘 )) , (17) where 𝐙 ∈ ℝ𝑁𝑥𝑦𝑧×3𝑁𝑡 is obtained from the reconstruction �̂� at the previous training epoch by stacking the following physical quanti- ties: 𝐙 = { ‖∇𝜌‖2 ∈ ℝ𝑁𝑥𝑦𝑧×𝑁𝑡 , ‖∇𝑝‖2 ∈ ℝ𝑁𝑥𝑦𝑧×𝑁𝑡 , max ( 0,−div(𝑢, 𝑣,𝑤) ) ∈ ℝ𝑁𝑥𝑦𝑧×𝑁𝑡 } . (18) The density gradient alone is able to capture information on both shocks and contact discontinuities indiscriminately, since they both involve density jumps. The combination of density and pressure gradients however enables the attention mechanism to implicitly Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 6 R. Cassia and R. Kerswell distinguish these two wave types since entropy 𝑠 = ln(𝑝∕𝜌𝛾 ) increases in the case of shocks. Lastly, the divergence of velocity is present to enable explicit identification of shocks by finding compressible regions. We refer to our modified, shock-aware GATConv as SA-GATConv. The final message passing layer we explore is SAGEConv which aggregates over a sampled subset of a node’s neighbourhood, enabling scalability [4]: 𝐡out𝑖 = 𝜎 ( 𝐖1𝐡𝑖 +𝐖2 ⋅ aggregate ( {𝐡𝑗 , ∀𝑗 ∈ samp(𝑖) ⊆  (𝑖)} )) , (19) where the aggregation function is either mean, max-pool, or LSTM-based. 3.3.4. Candidates for  We now explore candidates for  , the module for capturing global context. A popular choice for capturing global context is the scaled dot-product attention proposed by Vaswani et al. [8]: 𝐡out = softmax ( 𝐐𝐊𝑇 √ 𝑑 ) 𝐕, (20) where 𝐐, 𝐊, and 𝐕 are the queries, keys and values given by 𝐐 = 𝐡𝐖𝐐, 𝐊 = 𝐡𝐖𝐊, and 𝐕 = 𝐡𝐖𝐕, and 𝑑 is the number of hidden channels. This form of attention is however quadratic in complexity and therefore unsuitable for large graphs. A linear-complexity alternative proposed by Katharopoulos et al. [12] aims to approximate the softmax function using kernel functions 𝜙(⋅): 𝐡out = ( 𝜙(𝐐) ( 𝜙(𝐊)𝑇𝐕 )) ( 𝜙(𝐐) ( 𝜙(𝐊)𝑇 𝟏 )) , (21) where the kernel function could be exponential, sigmoid, ReLU, ELU + 1, etc. Another way of capturing global context is the Exphormer attention, which is also linear in complexity and takes a graph-centric approach. If we view 𝐡 graphically, then for node 𝑖 (corresponding to 𝐡𝑖) the attention (and its propagation) are computed as [40]: 𝐡out𝑖 = ∑ 𝑗∈eg(𝑖) 𝐎𝑖𝑗𝐕𝑗 ∑ 𝑗∈eg(𝑖) 𝐎𝑖𝑗 , 𝐎𝑖𝑗 = exp ( 𝐐𝑖𝐊𝑗 √ 𝑑 ) , (22) where eg(𝑖) denotes the neighbours of node 𝑖 as defined by an expander graph of 𝐡. Expander graphs are sparse approximations of complete graphs which introduce edge connections between nodes that are distant in the original graph. For details on how expander graphs are generated, we refer the reader to Shirzad et al. [40]. The expander graph is usually complemented with a small number of virtual nodes which form edge connections with all the nodes of the expander graph. These global nodes allow for communication between nodes that are not directly connected in the expander graph, introducing another flavour of global attention. The final form of  we explore is the Mamba-2 module, which is best explained schematically as in Fig. 4. Mamba-2 does not explicitly compute attention scores to capture global context. Rather, it relies on a gating mechanism and a selective state space Fig. 4. Mamba-2 module, which acts on 𝐡, and outputs 𝐡out. The module takes on the structure of a gating mechanism which, along with the SSM layer, helps determine which information to discard and retain. The SSM updates the intermediate input 𝐡′ through a fixed size latent variable, and its selectivity arises from making the weights 𝐖1, 𝐖2, and 𝐖3 themselves input-dependent. The module also comprises of projection layers as well as a normalization layer that improves stability. 𝜎 denotes the SiLU activation function. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 7 R. Cassia and R. Kerswell model to determine which information to discard and retain. Examining Fig. 4, the state space model updates the intermediate input 𝐡′ through a fixed size latent variable 𝚿, and its selectivity arises from making the weights 𝐖1, 𝐖2, and 𝐖3 themselves input-dependent [14,15]. If we view the intermediate input as a sequence (rather than as a graph) indexed by 𝑛, then the recurrent state-space model is defined as: 𝚿(𝑛) = 𝐖1𝚿(𝑛−1) +𝐖2𝐡′(𝑛), (23) 𝐲(𝑛) = 𝐖3𝚿(𝑛). (24) Mamba-2 is linear in complexity and has been proposed as an alternative to the transformer model. We remark that a crucial feature of Mamba-2 is that weight 𝐖1 has a scalar times identity matrix structure. By imposing this structure on 𝐖1, a duality can be derived between state space models and attention, enabling the design of an efficient algorithm for computing the state space model which has linear complexity whilst making use of the GPU-friendly matrix multiplications of attention [15]. 3.4. Multi-loss function To learn the weights of function Θ, we optimize a data loss data and a physical loss phy with respect to the weights. The data loss is computed only for the initially known nodes as defined by 𝐦 = 𝐦(0): data = 1 𝑁𝑡𝑐 ∑ 𝐦 ‖ ‖ ‖ ‖ 𝐦⊙ ( 𝐗 − �̂� ) ‖ ‖ ‖ ‖ 2 2 . (25) For the physical loss, we first map reconstruction �̂� back to conserved variables �̂� ∈ ℝ𝑁𝑡×𝑁𝑐×𝑁𝑥×𝑁𝑦×𝑁𝑧 , which we then use to compute a Godunov loss function2: phy = 1 𝑁 ∑ 𝑐 ∑ 𝑡 ∑ 𝑖, 𝑗, 𝑘 ( �̂�(𝑡+1, 𝑐) 𝑖, 𝑗, 𝑘 − �̂�(𝑡, 𝑐) 𝑖, 𝑗, 𝑘 + 𝜆𝑥 [ Δ𝐅(𝑡, 𝑐) 𝑖, 𝑗, 𝑘 ] + 𝜆𝑦 [ Δ𝐆(𝑡, 𝑐) 𝑖, 𝑗, 𝑘 ] + 𝜆𝑧 [ Δ𝐇(𝑡, 𝑐) 𝑖, 𝑗, 𝑘 ])2 , (26) Δ𝐅(𝑡, 𝑐) 𝑖, 𝑗, 𝑘 = 𝐅(𝑡, 𝑐) 𝑖+ 1 2 , 𝑗, 𝑘 − 𝐅(𝑡, 𝑐) 𝑖− 1 2 , 𝑗, 𝑘 , (27) Δ𝐆(𝑡, 𝑐) 𝑖, 𝑗, 𝑘 = 𝐆(𝑡, 𝑐) 𝑖, 𝑗+ 1 2 , 𝑘 −𝐆(𝑡, 𝑐) 𝑖, 𝑗− 1 2 ,𝑘 , (28) Δ𝐇(𝑡, 𝑐) 𝑖, 𝑗, 𝑘 = 𝐇(𝑡, 𝑐) 𝑖, 𝑗, 𝑘+ 1 2 −𝐇(𝑡, 𝑐) 𝑖, 𝑗, 𝑘− 1 2 , (29) where 𝜆𝑥 = Δ𝑡∕Δ𝑥, 𝜆𝑦 = Δ𝑡∕Δ𝑦, 𝜆𝑧 = Δ𝑡∕Δ𝑧, 𝑁 = 𝑁𝑥𝑦𝑧𝑁𝑡𝑐 , and Δ𝐅, Δ𝐆, Δ𝐇 are the net 𝑥, 𝑦, 𝑧 fluxes computed from �̂� using a Riemann solver. For the purposes of this paper we use the HLLC Riemann solver to estimate the fluxes, which assumes a three-wave structure at each cell interface, as outlined in Appendix A. We refer the reader to Cassia and Kerswell [41] for more information on Godunov losses and the justification of their use for modeling hyperbolic conservation laws. Rather than optimizing the summation data + phy with one Adam optimizer, we choose to optimize the two losses separately using two separate Adam optimizers. Therefore, within each epoch, two back-propagations are applied, and two updates are made to the model weights. Despite the increased computational overhead, we find that this setup significantly stabilizes training.3 3.5. Handling of boundary conditions Boundary conditions are incorporated into the ML architecture using padding operations. After each GraphGPS layer, we 1) reshape the representation to a 3D grid, 2) remove the first and last values of each spatial dimension, 3) replace the removed values with padded values, and 4) flatten the final representation for subsequent processing. In our case of homogeneous Neumann boundary conditions (i.e. 𝜕𝐧𝐔 = 𝟎), we employ replicative padding. For non-zero Neumann boundary conditions, the padded values must be interpolated from the interior domain. If the boundary conditions were Dirichlet or periodic, constant or circular padding should be employed, respectively. We also note that boundary conditions could be more mathematically complex (coupled, non-linear, time-dependent), and act on irregular, curved, or moving boundaries. Exploring the implementation of such boundary conditions into ML architectures is an interesting topic but is beyond the scope of this paper. 4. Experiments This section is divided into four parts. Firstly, we perform an ablation study on the message-passing and global attention compo- nents of the GraphGPS framework. Secondly, we examine the performance of guided-message passing compared to standard, dense message-passing under different node connectivities and initializations. Thirdly, we examine performance of our GraphGPS model 2 We use here notation (𝑖, 𝑗, 𝑘) to denote spatial indices in 3D. This notation is not to be confused with earlier sections where notation (𝑖, 𝑗) is used to denote graph nodes. 3 Specifically, we did not encounter any NaN loss values with this setup, which we did encounter when using a single additive loss  = data + phy. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 8 R. Cassia and R. Kerswell Fig. 5. Random and biased distribution of known points (sensor locations) on the faces of the Riemann cube for Case 6. The density field is shown. against numerous benchmarks. Lastly, we examine the performance gains resulting from the physics-based loss and optimal sensor placement. All models examined are trained to 5000 epochs using an Adam optimizer, with a learning rate of 10−4, on a NVIDIA A100 GPU. A model is trained for each flow case outlined in Section 3.2. The number of hidden channels (or projected dimension) is set to 64 for all models. We explore a grid of spatial resolution 643 and time resolution 5 × 10−4. For visualization, we find that plotting diagonal slices through the Riemann cube, defined by normal vectors (±1,±1,±1)𝑇 , is most effective for revealing the 3D nature of the flow. Fig. 5 shows the distribution of observed points (sensor locations) on the faces of the Riemann cube for Case 6, to give a sense of the level of sparsity we explore. In tangible terms, 90% of points are unknown. We explore two types of sensor distributions: a purely random sensor distribution, and a more optimal distribution that has bias towards sharp flow gradients. The latter assumes prior knowledge of the flow and results in a non-uniform (or non-homogeneous) sensor distribution. Unless stated otherwise, all experiments assume the random sensor distribution, where all cases explored have the same random distribution of known points. Sensor locations for all cases can be found in Appendix D. 4.1. Ablation study For all experiments explored in this subsection, we use guided message-passing with 𝜆 = 3 and masked projection, which are outlined in Sections 3.3.1 and 3.3.2, respectively. We first examine the effect of using different message-passing layers,  , on the performance of the framework in Fig. 2, whist keeping  the same (Mamba-2). We examine the performance when using GATConv, our SA-GATConv, GCNConv, and SAGEConv. See Section 3.3.3 for an overview of these layers. For GATConv and SA-GATConv, we use one attention head. For SAGEConv we use max pooling as the aggregation method. Fig. 6 shows the density plots and error for each message-passing layer and for each case. We find that SA-GATConv and GATConv give the most faithful reconstructions both visually and numerically. On the other hand, the susceptibility of GCNConv to oversmoothing can be observed where there are shocks and slip lines. SAGEConv performs the worst and gives noisy reconstructions. Table 1 compares the computational expense of GraphGPS when using the different message-passing layers. Due to their attention mechanisms, GATConv and SA-GATConv require more memory and time. Furthermore the modified attention mechanism of SA-GATConv slightly increases the overhead compared to GATConv. Examining Fig. 6 for the top performers in more detail, we find that SA-GATConv yields slightly lower errors than GATConv. It is also interesting to examine the physical features of the cases in more detail for these two message-passing layers, which appear sharper for SA-GATConv. For Case 1, we observe this in the shock and slip line towards the bottom right. For Case 2, we observe the same at the tips of the arrow-like subsonic region produced from the interaction of the initial shock waves. The same can be observed in Case 3 for the shock that bends towards the center and dissolves into rarefaction fans. For Case 4, this observation can be made for the slip line that bends the rarefaction, as well as for the bow-shock on the bottom right. In Case 5, the bent slip lines have a more pronounced shape for SA-GATConv that better match the truth compared to GATConv. Finally, Case 6 shows the most significant difference in shock definition between the two message-passing layers when observing the two shock fronts towards the left of the density plots. Next, we examine the effect of different methods for capturing global context,  , on the performance of the framework in Fig. 2, whist keeping  the same (SA-GATConv). We examine the performance when using Mamba-2, linear transformer, and Exphormer. Table 1 Compute information of different  candidates when combined with Mamba-2, along with the use of masked projection and guided message-passing (𝜆 = 3). SA-GATConv GATConv GCNConv SAGEConv Max memory allocated (GB) 37.70 36.90 15.40 15.10 Avg time per epoch (s) 1.64 1.60 1.43 1.40 Total training time (h) 2.28 2.22 1.99 1.95 Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 9 R. Cassia and R. Kerswell Fig. 6. Performance of different  candidates when combined with Mamba-2, along with the use of masked projection and guided message- passing (𝜆 = 3). The relative L2 error across all physical channels in 𝐗 is indicated on the top left of each subplot, and the density field is plotted with values normalized between [0,1]. The normal vectors that define the planes of view are indicated at the bottom for each case. The full 3D density fields for the cases are shown in Fig. 1. The corresponding error fields for density are shown in Appendix F, Fig. F.1. Table 2 Compute information of different  candidates when combined with SA-GATConv, along with the use of masked projection and guided message-passing (𝜆 = 3). Mamba-2 Exphormer Transformer Max memory allocated (GB) 37.70 77.00 37.80 Avg time per epoch (s) 1.63 2.59 2.50 Total training time (h) 2.26 3.60 3.48 See Section 3.3.4 for an overview of these modules. For all three modules we use 16 heads. For the linear transformer we use kernel function 𝜙 = ELU(𝑥) + 1. For the Exphormer, the degree of the expander graph is 10, and the number of virtual nodes is 5. Examining Fig. 7 we find that the linear transformer marginally outperforms the other two candidates, with the exception of Case 2 where the reconstruction appears least faithful. Of more significance is the computational expense of the three candidates outlined in Table 2, where Mamba-2 has the least overhead in terms of time and memory. Despite the sparsification of the graph used in Exphormer, the memory consumption is almost double that of the other candidates, due to the propagation step and due to the introduction of additional edge connections through virtual nodes. It is evident from this ablation study that  is more influential to the performance of GraphGPS compared to  , in the context of our reconstruction task. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 10 R. Cassia and R. Kerswell Fig. 7. Performance of different  candidates when combined with SA-GATConv, along with the use of masked projection and guided message- passing (𝜆 = 3). The relative L2 error across all physical channels in 𝐗 is indicated on the top left of each subplot, and the density field is plotted with values normalized between [0,1]. The normal vectors that define the planes of view are indicated at the bottom for each case. The full 3D density fields for the cases are shown in Fig. 1. The corresponding error fields for density are shown in Appendix F, Fig. F.2. 4.2. Guided message-passing vs. dense message-passing For all experiments explored in this subsection, we use SA-GATConv for  and Mamba-2 for  , which are explained in detail in Sections 3.3.3 and 3.3.4, respectively. We are interested to see how guided message-passing performs in comparison to dense-message passing. An overview of guided message-passing is outlined in Section 3.3.1. By dense-message passing, we mean that for all layers of GraphGPS, each node is connected to all its immediate neighbours. Like guided message-passing, the immediate neighbours could be orthogonal neighbours (𝜆 = 1), orthogonal+diagonal neighbours (𝜆 = 2), or orthogonal+diagonal+corner neighbours (𝜆 = 3). It is interesting to compare the performance of guided and dense message-passing under these three different types of edge connectivities, assuming masked projection for all runs. This is done in Fig. 8. In terms of error we observe only a marginal difference in performance between the message-passing types at 𝜆 = 3 and 𝜆 = 2, with no significant difference from a visual point of view. We see however that performance significantly degrades for both message-passing types when 𝜆 = 1, indicating that the additional diagonal neighbours are critical to the reconstruction process. Furthermore, for 𝜆 = 1, guided message-passing performs significantly worse than dense message-passing for Cases 2, 3, 5, and 6. We thus conclude that in the case of 𝜆 = 2 and 𝜆 = 3, the use of guided message-passing, which is sparser in terms of number of edge connections, does not come at the cost of performance degradation, as in the case of 𝜆 = 1. This sparseness of guided message-passing also results in reduced computational overhead as indicated in Table 3, especially in terms of memory. Also of note is that the savings also appear less significant going from 𝜆 = 2 to 𝜆 = 1. Another performance gain of guided message-passing appears to be in terms of initial convergence behaviour. Fig. 9 plots the evolution of error in the first 500 epochs, where it is evident that guided message-passing at 𝜆 = 2 and 𝜆 = 3 tend towards a stable solution quicker than the other settings, because they do not encounter plateaus as the other settings do. These plateaus are longer and more frequent at 𝜆 = 1 for both message-passing types, and are accompanied by sudden jumps in error, which indicate unstable training. This may explain the degraded reconstructions at 𝜆 = 1. It is also of interest to compare the two message-passing types when the input projection is masked, or normal (unmasked). Section 3.3.2 outlines masked projection. Where unmasked projection is used, we initialize unknown nodes using feature propagation as is usually done in the literature for graph-based reconstructions [27,32].4 For all runs in Fig. 10 we use 𝜆 = 3. Figs. 10 and 11 suggest there is no notable improvement in accuracy or stability when introducing unmasked projection (with feature propagation) 4 Feature propagation is a pre-processing step where missing features are interpolated from known features using propagation. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 11 R. Cassia and R. Kerswell Fig. 8. Performance of guided/dense message-passing with 𝜆 = {1, 2, 3}. We use SA-GATConv and Mamba-2 for  and  , respectively, and use masked projection. The relative L2 error across all physical channels in 𝐗 is indicated on the top left of each subplot, and the density field is plotted with values normalized between [0,1]. The normal vectors that define the planes of view are indicated at the bottom for each case. The full 3D density fields for the cases are shown in Fig. 1. The corresponding error fields for density are shown in Appendix F, Fig. F.3. Table 3 Compute information of guided/dense message-passing with 𝜆 = {1, 2, 3}. We use SA-GATConv and Mamba-2 for  and  , respectively, and use masked projection. GMP, 𝜆 = 3 DMP, 𝜆 = 3 GMP, 𝜆 = 2 DMP, 𝜆 = 2 GMP, 𝜆 = 1 DMP, 𝜆 = 1 Max memory allocated (GB) 37.70 41.80 29.80 32.90 28.30 29.70 Avg time per epoch (s) 1.63 1.75 1.46 1.54 1.63 1.67 Total training time (h) 2.26 2.43 2.03 2.14 2.26 2.32 Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 12 R. Cassia and R. Kerswell Fig. 9. Convergence behaviour of guided/dense message-passing with 𝜆 = {1, 2, 3} over the first 500 epochs. We use SA-GATConv and Mamba-2 for  and  , respectively, and use masked projection. Table 4 Recommended setup of GraphGPS based on parameters explored in Sections 4.1 and 4.2.   MP Type 𝝀 Projection SA-GATConv Mamba-2 Guided 2 Masked and dense message-passing. This suggests that the zeroed representation of unknown nodes in the projected space when using masked projection is sufficient, and no pre-processing such as feature propagation is required. This also justifies the use of masked projection, which unlike normal projection, is agnostic to the initialization of unknown nodes. For Case 2, one can even observe a degraded reconstruction when using unmasked projection, for both guided and dense message-passing. This appears to coincide with the unstable jumps in error observed in Fig. 11 for Case 2, which is not encountered for masked, guided message-passing. Table 4 gives the final recommended setup of GraphGPS based on parameters explored in Sections 4.1 and 4.2, taking into account both accuracy and computational expense. 4.3. Comparison with benchmarks We compare our GraphGPS architecture with a number of benchmarks, such as graph attention network (GAT), linear transformer, autoencoder, and 3D convolutional neural network (3D CNN), as well as classical benchmarks such as Kriging interpolation and gappy proper orthogonal decomposition (GPOD). The architectural details of the ML models are outlined in Appendix G. For our GraphGPS architecture, we use SA-GATConv and Mamba-2 for  and  , respectively, along with guided message-passing (𝜆 = 3), and masked projection. From Fig. 12, we find that our model has superior accuracy compared to the benchmarks for all cases, albeit at an increased computational cost as shown in Table 5. All the benchmarks result in over-smoothing of shocks and sliplines. Moreover, the transformer and autoencoder models produce jagged contours, as does the 3D CNN but to a lesser extent. The 3D CNN also produces noisy artifacts. Kriging interpolation and GPOD also produce jagged contours. We also compare our architecture with the benchmarks when the sensor/known values are corrupted with noise, at a 25dB signal-to-noise ratio (SNR). The results are shown in Appendix E where we observe similar trends, with superior performance of our architecture across all cases. All the benchmarks with the exception of GAT also result in noisier reconstructions. 4.4. Performance gains of physical loss and biased sensor placement For all experiments explored in this subsection, we use SA-GATConv for  and Mamba-2 for  . We also use guided message passing with 𝜆 = 3 and masked projection. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 13 R. Cassia and R. Kerswell Fig. 10. Performance of masked/unmasked projection and guided/dense message-passing (𝜆 = 3). We use SA-GATConv and Mamba-2 for  and  , respectively. The relative L2 error across all physical channels in 𝐗 is indicated on the top left of each subplot, and the density field is plotted with values normalized between [0,1]. The normal vectors that define the planes of view are indicated at the bottom for each case. The full 3D density fields for the cases are shown in Fig. 1. The corresponding error fields for density are shown in Appendix F, Fig. F.4. Fig. 11. Convergence behaviour of masked/unmasked projection and guided/dense message-passing (𝜆 = 3) over the first 500 epochs. We use SA-GATConv and Mamba-2 for  and  , respectively. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 14 R. Cassia and R. Kerswell Fig. 12. Performance of our architecture against different benchmarks. For our architecture, we use SA-GATConv and Mamba-2 for  and  , respectively, along with guided message-passing (𝜆 = 3) and masked projection. The relative L2 error across all physical channels in 𝐗 is indicated on the top left of each subplot, and the density field is plotted with values normalized between [0,1]. The normal vectors that define the planes of view are indicated at the bottom for each case. The full 3D density fields for the cases are shown in Fig. 1. The corresponding error fields for density are shown in Appendix F, Fig. F.5. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 15 R. Cassia and R. Kerswell Table 5 Compute information of our architecture against different ML benchmarks. For our ar- chitecture, we use SA-GATConv and Mamba-2 for  and  , respectively, along with guided message-passing (𝜆 = 3) and masked projection. Ours GAT Transformer Autoencoder 3D CNN Max memory allocated (GB) 37.70 29.00 9.30 6.20 4.30 Avg time per epoch (s) 1.63 0.84 1.69 0.58 0.56 Total training time (h) 2.26 1.16 2.34 0.81 0.77 Fig. 13. Performance when using random/biased sensor locations with/without the use of the physics-based loss. We use SA-GATConv and Mamba- 2 for  and  , respectively, and use masked projection and guided message-passing (𝜆 = 3). The relative L2 error across all physical channels in 𝐗 is indicated on the top left of each subplot, and the density field is plotted with values normalized between [0,1]. The normal vectors that define the planes of view are indicated at the bottom for each case. The full 3D density fields for the cases are shown in Fig. 1. The corresponding error fields for density are shown in Appendix F, Fig. F.6. Fig. 13 examines the performance of GraphGPS when biasing the sparsely known points towards sharp gradients5, see Appendix D for the sensor placements for all cases. As expected, performance gains are noticable for all cases in terms of reconstruction error. Through this we also demonstrate how our framework can handle non-uniform/non-homogeneous sensor placements. The biasing of sensors means slightly more GraphGPS layers are required for guided-message passing to connect all the nodes with each other, resulting in slightly increased computational overhead as per Table 6. Also from Fig. 13 we examine how GraphGPS performs when trained without the physical loss and using the data loss only (��phy) where we see a slight performance dip compared to utilizing the physical loss, as well as reduced sharpness at discontinuities as in Cases 2, 4, and 6. The slight performance gain of the physical loss is because it encourages adherence to physical laws and because it is evaluated using all points (both known and unknown). By comparison, the data loss is only evaluated using known points only. We do however see significantly reduced overhead according to Table 6 due to the elimination of an additional loss evaluation and backpropagation, and so the use of just the data loss offers a cheaper alternative where compute is scarce. 5 This of course requires some prior knowledge about the flow. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 16 R. Cassia and R. Kerswell Table 6 Compute information when using random/biased sensor locations with/without the use of the physics-based loss. We use SA-GATConv and Mamba-2 for  and  , respectively, and use masked projection and guided message-passing (𝜆 = 3). Random, phy Biased, phy Random, ��phy Max memory allocated (GB) 37.70 45.08 36.50 Avg time per epoch (s) 1.63 1.92 0.81 Total training time (h) 2.26 2.65 1.12 5. Conclusion This study explores the use of a GraphGPS framework for the reconstruction of the 3D Riemann problems from sparse data, where 90% of points are missing. The key feature of GraphGPS is that it combines the benefits of 1) message-passing modules, 2) modules that capture global context, and 3) positional encodings. For message-passing, we introduce a modified GAT layer, SA-GATConv, that leverages information on shocks and discontinuities in its attention mechanism. We also introduce guided message-passing, which sparsifies node aggregation by enforcing information to flow strictly from known nodes only. To make the model agnostic to the initialization of unknown nodes we make use of masked projection, which zeroes the representation of unknown nodes in the projected space of GraphGPS. An ablation study on the modules of GraphGPS indicates SA-GATConv to be the superior message-passing module in terms of accuracy, enabling sharper reconstructions of shocks and discontinuities. From the ablation study we also find that the Mamba-2 module is the most efficient way for capturing global context whilst yielding acceptable accuracies. Experiments also show that guided- message passing reduces the time and memory required for reconstruction compared to (standard) dense message passing, whilst being on-par in terms of accuracy. Guided message-passing also exhibits more stable training than dense message-passing based on initial convergence behaviour. Furthermore we find that masked projection of the input yields similar accuracies to normal projection, even when the value of unknown nodes is pre-processed via feature propagation in the case of normal projection. Additionally, we find that our GraphGPS framework outperforms a number of different benchmarks, both ML-based and classical, even when sensor/known values are noisy. Finally, we show that the framework can benefit from a more optimal and non-homogeneous sensor placement and verify the performance gain of using the physical loss (in addition to the data loss). The most notable limitation encountered in this study is the relatively large memory footprint of SA-GATConv for message-passing, despite its superior accuracy. This is due to the computation of the attention scores when aggregating information from neighbouring nodes. Future work should aim to alleviate this memory requirement such that finer resolutions could be explored. One way to mitigate this would be to reduce parameter 𝜆 of guided message-passing from 3 to 2. Another possible way would be to make guided message-passing act on known-unknown node connections only, which would be an interesting extension.6 Another limitation is that we also train a model for each 3D Riemann configuration separately, and it would be interesting to explore whether a model can generalize across multiple configurations, given a fixed distribution of known points. This study also assumes that known and unknown points/nodes are arranged as a uniform grid, and in a cubic domain. In practice, grids tend to be unstructured/non-uniform and domain geometries tend to be complex and so this is therefore another direction to explore. One possible way of adapting the GraphGPS framework to unstructured/non-uniform grids and complex geometries would be to implement a pre-processing step that projects the unstructured/non-uniform grid (and its values) onto a uniform cubic grid, so that the reconstruction occurs in this simpler grid space. One can then implement a post-processing step that projects the output of the framework back to the original grid space. CRediT authorship contribution statement Rami Cassia: Writing – original draft, Visualization, Validation, Software, Project administration, Methodology, Investigation, Formal analysis, Data curation, Conceptualization; Rich Kerswell: Writing – review & editing, Supervision. Code and data Code and data can be accessed via the following link. Data availability The following link gives access to the GitHub repository of the paper, which includes code for data generation. 6 This is as opposed to applying guided message-passing on known-unknown connections and known-known connections, which is what we do. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 17 https://github.com/RamiCassia/GraphGPS-for-Sparse-Reconstruction https://github.com/RamiCassia/GraphGPS-for-Sparse-Reconstruction R. Cassia and R. Kerswell Declaration of competing interest The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Rami Cassia reports financial support was provided by Cambridge Commonwealth European and International Trust. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgments Rami Cassia acknowledges the financial support provided by the Cambridge Commonwealth, European and International Trust. Appendix A. HLLC flux estimation An outline of the HLLC algorithm is presented in this section. Fig. A.1 illustrates the three-wave model assumed by the HLLC solver. In order to determine the HLLC flux normal to a cell interface for the 3D Euler equations, it is sufficient to consider a 𝑥-split version of the 3D Euler equations because of rotational invariance [42]: 𝜕𝑡 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 𝜌 𝜌𝑢 𝜌𝑣 𝜌𝑤 𝐸 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ + 𝜕𝑥 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 𝜌𝑢 𝜌𝑢2 + 𝑝 𝜌𝑢𝑣 𝜌𝑢𝑤 𝑢(𝐸 + 𝑝) ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ = 𝟎 (A.1) We then estimate wave speeds 𝑠𝑙 and 𝑠𝑟 as Toro[43]: 𝑠𝑙 = �̃� − �̃�, 𝑠𝑟 = �̃� + �̃�, (A.2) where the Roe-average velocity and sound speed, �̃� and �̃�, are defined as: �̃� = √ 𝜌𝑙𝑢𝑙 + √ 𝜌𝑟𝑢𝑟 √ 𝜌𝑙 + √ 𝜌𝑟 , �̃� = √ (𝛾 − 1) ( �̃� − 1 2 �̃�2 ) , (A.3) and the Roe-average enthalpy �̃� is defined as: �̃� = √ 𝜌𝑙𝐻𝑙 + √ 𝜌𝑟𝐻𝑟 √ 𝜌𝑙 + √ 𝜌𝑟 , 𝐻 = 𝐸 + 𝑝 𝜌 . (A.4) We then use 𝑠𝑙 and 𝑠𝑟 to compute intermediate speed 𝑠∗ [42]: 𝑠∗ = 𝑝𝑟 − 𝑝𝑙 + 𝜌𝑙𝑢𝑙(𝑠𝑙 − 𝑢𝑙) − 𝜌𝑟𝑢𝑟(𝑠𝑟 − 𝑢𝑟) 𝜌𝑙(𝑠𝑙 − 𝑢𝑙) − 𝜌𝑟(𝑠𝑟 − 𝑢𝑟) . (A.5) The intermediate conservative vectors 𝐔∗𝑙 and 𝐔∗𝑟 are then computed using 𝑠∗, 𝑠𝑙 and 𝑠𝑟 as [42]: 𝐔∗𝜂 = 𝜌𝜂 ( 𝑠𝜂 − 𝑢𝜂 𝑠𝜂 − 𝑠∗ ) ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 1 𝑠∗ 𝑣𝜂 𝑤𝜂 𝐸𝜂 𝜌𝜂 + (𝑠∗ − 𝑢𝜂) [ 𝑠∗ + 𝑝𝜂 𝜌𝜂 (𝑠𝜂−𝑢𝜂 ) ] ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ , (A.6) where 𝜂 = {𝑙, 𝑟}. Finally, the intermediate flux vectors 𝐅∗𝑙 and 𝐅∗𝑟 are computed from the state vectors so that the flux 𝐅𝑛 𝑖+ 1 2 is computed using Eq. (A.7). 𝐅∗𝜂 = 𝐅𝜂 + 𝑠𝜂(𝐔∗𝜂 − 𝐔𝜂), 𝐅𝑡 𝑖+ 1 2 = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 𝐅𝑙 0 ≤ 𝑠𝑙 , 𝐅∗𝑙 𝑠𝑙 ≤ 0 ≤ 𝑠∗, 𝐅∗𝑟 𝑠∗ ≤ 0 ≤ 𝑠𝑟, 𝐅𝑟 0 ≥ 𝑠𝑟. (A.7) Due to rotational invariance, an analogous procedure allows us to determine the 𝑦-flux 𝐆𝑡 𝑗+ 1 2 and the 𝑧-flux 𝐇𝑡 𝑘+ 1 2 . Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 18 R. Cassia and R. Kerswell Fig. A.1. The three-wave model assumed by the HLLC solver. The left and right characteristic lines correspond to the fastest and slowest signals, 𝑠𝑙 and 𝑠𝑟, emerging from the cell interface at 𝑥 = 0. The middle characteristic corresponds to the wave of speed 𝑠∗ which accounts for contact and shear waves. Appendix B. WENO reconstruction The weighted essentially non-oscillatory (WENO) scheme is a method of reconstructing cell averaged values to cell interfaces at a higher order of accuracy. In this section we briefly outline the 5th-order WENO scheme used in generating the ground truth �̄�. We apply the WENO scheme on the conservative variables 𝐔 for each spatial dimension separately. As such we demonstrate the reconstruction step for the 𝑥-dimension only. The procedure is analogous for the 𝑦- and 𝑧-dimensions. The first step is to find candidate stencil approximations. For the 5th-order scheme these are: α0 = 1 3 𝐔𝑖−2 − 7 6 𝐔𝑖−1 − 11 6 𝐔𝑖 (B.1) α1 = −1 6 𝐔𝑖−1 + 5 6 𝐔𝑖 + 1 3 𝐔𝑖+1 (B.2) α2 = 1 3 𝐔𝑖 + 5 6 𝐔𝑖+1 − 1 6 𝐔𝑖+2 (B.3) We then obtain the smoothness indicators β as: β0 = 13 12 ( 𝐔𝑖−2 − 2𝐔𝑖−1 + 𝐔𝑖 )2 + 1 4 ( 𝐔𝑖−2 − 4𝐔𝑖−1 + 3𝐔𝑖 )2 (B.4) β1 = 13 12 ( 𝐔𝑖−1 − 2𝐔𝑖 + 𝐔𝑖+1 )2 + 1 4 ( 𝐔𝑖−1 − 𝐔𝑖+1 )2 (B.5) β2 = 13 12 ( 𝐔𝑖 − 2𝐔𝑖+1 + 𝐔𝑖+2 )2 + 1 4 ( 3𝐔𝑖 − 4𝐔𝑖+1 + 𝐔𝑖+2 )2 (B.6) We then find the non-linear weights ω: ω𝑛 = ζ𝑛 ζ0 + ζ1 + ζ2 , ζ𝑛 = 𝑑𝑛 ( 𝜖 + β𝑛 )2 , ( 𝑑0, 𝑑1, 𝑑2 ) = ( 1 10 , 3 5 , 3 10 ) , 𝑛 ∈ {0, 1, 2}, (B.7) where 𝜖 is a small number to prevent division by 0. The 5th-order approximation at the cell interface 𝐔WENO 𝑖+ 1 2 can then be obtained: 𝐔WENO 𝑖+ 1 2 = ω0 ⊙α0 +ω1 ⊙α1 +ω2 ⊙α2. (B.8) Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 19 R. Cassia and R. Kerswell Appendix C. Initialization values for 3D Riemann problems Initialization values are taken from Balsara[44] and Hoppe et al. [38] (Tables C.1, C.2, C.3, C.4, C.5, C.6, C.7). Table C.1 Octant definitions for domain [0, 1]3. Octant x y z 1 > 0.5 > 0.5 > 0.5 2 ≤ 0.5 > 0.5 > 0.5 3 ≤ 0.5 ≤ 0.5 > 0.5 4 > 0.5 ≤ 0.5 > 0.5 5 > 0.5 > 0.5 ≤ 0.5 6 ≤ 0.5 > 0.5 ≤ 0.5 7 ≤ 0.5 ≤ 0.5 ≤ 0.5 8 > 0.5 ≤ 0.5 ≤ 0.5 Table C.2 Case 1, 𝐽21𝐽32𝑆− 34𝑆 + 41𝐽51𝑆 + 62𝐽73𝑆 − 84𝑅 − 65𝑆 − 76𝐽78𝐽85. Octant 𝜌 𝑢 𝑣 𝑤 𝑝 1 0.6223 0.1000 −0.6259 −0.1000 0.4000 2 0.5313 0.1000 0.1000 −0.8276 0.4000 3 0.6223 0.8259 0.1000 −0.1000 0.4000 4 1.2219 0.1000 0.1000 −0.1000 1.0683 5 0.5197 0.8259 0.1000 −0.1000 0.4000 6 1.0000 0.1000 0.1000 −0.1000 1.0000 7 0.5313 0.1000 0.8276 −0.1000 0.4000 8 0.6223 0.1000 0.1000 −0.6259 0.4000 Table C.3 Case 2, 𝑆+ 21𝑆 − 32𝑆 − 34𝑆 + 41𝑆 + 51𝑆 − 62𝑆 + 73𝑆 − 84𝑆 − 65𝑆 + 76𝑆 + 78𝑆 − 85. Octant 𝜌 𝑢 𝑣 𝑤 𝑝 1 0.5065 0.0000 0.0000 0.0000 0.3500 2 1.1000 0.8939 0.0000 0.0000 1.1000 3 0.5065 0.8939 0.8939 0.0000 0.3500 4 1.1000 0.0000 0.8939 0.0000 1.1000 5 1.1000 0.0000 0.0000 0.8939 1.1000 6 0.5065 0.8939 0.0000 0.8939 0.3500 7 1.1000 0.8939 0.8939 0.8939 1.1000 8 0.5065 0.0000 0.8939 0.8939 0.3500 Table C.4 Case 3, 𝑅+ 21𝑅 − 32𝑅 − 34𝑅 + 41𝑆 − 51𝑆 + 62𝑆 − 73𝑆 + 84𝑅 − 65𝑅 + 76𝑅 + 78𝑅 − 85. Octant 𝜌 𝑢 𝑣 𝑤 𝑝 1 1.0000 0.0000 0.0000 −0.7881 1.0000 2 0.5000 −0.7658 0.0000 −0.7881 0.3789 3 1.0000 −0.7658 −0.7658 −0.7881 1.0000 4 0.5000 0.0000 −0.7658 −0.7881 0.3789 5 0.5000 0.0000 0.0000 0.0000 0.3789 6 1.0000 −0.7658 0.0000 0.0000 1.0000 7 0.5000 −0.7658 −0.7658 0.0000 0.3789 8 1.0000 0.0000 −0.7658 0.0000 1.0000 Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 20 R. Cassia and R. Kerswell Table C.5 Case 4, 𝐽21𝐽32𝐽34𝐽41𝑆− 51𝑅 + 62𝑆 − 73𝑅 + 84𝐽65𝐽76𝐽78𝐽85. Octant 𝜌 𝑢 𝑣 𝑤 𝑝 1 1.5000 0.1000 −0.1500 −0.2000 1.0000 2 1.0000 0.1000 0.1500 −0.3000 1.0000 3 1.5000 −0.1000 0.1500 −0.5000 1.0000 4 1.0000 −0.1000 −0.1500 1.2000 1.0000 5 0.7969 0.1000 −0.1500 0.3941 0.4000 6 0.5197 0.1000 0.1500 −1.0259 0.4000 7 0.7969 −0.1000 0.1500 0.0941 0.4000 8 0.5197 −0.1000 −0.1500 0.4741 0.4000 Table C.6 Case 5, 𝐽21𝐽32𝐽34𝐽41𝐽51𝐽62𝐽73𝐽84𝐽65𝐽76𝐽78𝐽85. Octant 𝜌 𝑢 𝑣 𝑤 𝑝 1 0.5000 −0.2500 −0.5000 −0.5000 1.0000 2 2.0000 −0.2500 0.5000 −0.2500 1.0000 3 0.5000 0.2500 0.5000 0.2500 1.0000 4 1.0000 0.2500 −0.5000 −0.2500 1.0000 5 1.0000 0.2500 −0.2500 −0.5000 1.0000 6 0.5000 0.2500 0.2500 −0.2500 1.0000 7 2.0000 −0.2500 0.2500 0.2500 1.0000 8 0.5000 −0.2500 −0.2500 −0.2500 1.0000 Table C.7 Case 6, 𝑆+ 21𝐽32𝐽34𝑆 + 41𝑆 + 51𝐽62𝑆 + 73𝐽84𝐽65𝑆 + 76𝑆 + 78𝐽85. Octant 𝜌 𝑢 𝑣 𝑤 𝑝 1 0.5313 0.0000 0.0000 0.0000 0.4000 2 1.0000 0.7276 0.0000 0.0000 1.0000 3 0.8000 0.0000 0.0000 −0.7276 1.0000 4 1.0000 0.0000 0.7276 0.0000 1.0000 5 1.0000 0.0000 0.0000 0.7276 1.0000 6 0.8000 0.0000 −0.7276 0.0000 1.0000 7 1.0162 −0.4014 −0.4014 −0.4014 1.4000 8 0.8000 −0.7276 0.0000 0.0000 1.0000 Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 21 R. Cassia and R. Kerswell Appendix D. Sensor placement Figs. D.1 and D.2 show the distribution of known points (sensor locations) on the faces of the Riemann cube for Cases 1–6. Fig. D.1. Random and biased distribution of known points (sensor locations) on the faces of the Riemann cube for Cases 1–3. The density fields are shown. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 22 R. Cassia and R. Kerswell Fig. D.2. Random and biased distribution of known points (sensor locations) on the faces of the Riemann cube for Cases 4–6. The density fields are shown. Appendix E. Comparison with benchmarks for noisy sensors Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 23 R. Cassia and R. Kerswell Fig. E.1. Performance of our architecture against different benchmarks with noise added to the values at known points (SNR = 25 dB). For our architecture, we use SA-GATConv and Mamba-2 for  and  , respectively, along with guided message-passing (𝜆 = 3) and masked projection. The relative L2 error across all physical channels in 𝐗 is indicated on the top left of each subplot, and the density field is plotted with values normalized between [0,1]. The normal vectors that define the planes of view are indicated at the bottom for each case. The full 3D density fields for the cases are shown in Fig. 1. The corresponding error fields for density are shown in Appendix F, Fig. F.7. To get a sense of the level of noise added, see Figs. E.2 and E.3. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 24 R. Cassia and R. Kerswell Fig. E.2. Random distribution of known points (sensor locations) on the faces of the Riemann cube, corrupted with noise (SNR = 25 dB), for Cases 1–3. The density fields are shown. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 25 R. Cassia and R. Kerswell Fig. E.3. Random distribution of known points (sensor locations) on the faces of the Riemann cube, corrupted with noise (SNR = 25 dB), for Cases 4–6. The density fields are shown. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 26 R. Cassia and R. Kerswell Appendix F. Error fields Fig. F.1. Density error fields corresponding to Fig. 6. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 27 R. Cassia and R. Kerswell Fig. F.2. Density error fields corresponding to Fig. 7. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 28 R. Cassia and R. Kerswell Fig. F.3. Density error fields corresponding to Fig. 8. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 29 R. Cassia and R. Kerswell Fig. F.4. Density error fields corresponding to Fig. 10. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 30 R. Cassia and R. Kerswell Fig. F.5. Density error fields corresponding to Fig. 12. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 31 R. Cassia and R. Kerswell Fig. F.6. Density error fields corresponding to Fig. 13. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 32 R. Cassia and R. Kerswell Fig. F.7. Density error fields corresponding to Fig. E.1. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 33 R. Cassia and R. Kerswell Appendix G. Benchmark ML models Tables G.1, G.2, G.3 and G.4 summarise the architectural details of the ML benchmarks. Table G.1 Architectural details of 3D CNN. For all Conv3D layers the kernel size is 3, the stride and padding are 1, and replicative padding is used. Layer no. Layer type Input channels Output channels 1 Conv3D+ReLU 5 16 2 Conv3D+ReLU 16 32 3 Conv3D+ReLU 32 64 4–9 Conv3D+ReLU 64 64 10 Conv3D+ReLU 64 32 11 Conv3D+ReLU 32 16 12 Conv3D 16 5 Table G.2 Architectural details of GAT. For all GATConv layers the number of attention heads is 1. Layer no. Layer type Input channels Output channels 1 GATConv+ReLU 10 64 2–9 GATConv+ReLU 64 64 10 GATConv 64 10 Table G.3 Architectural details of autoencoder. Layer no. Layer type Input channels Output channels 1 Linear+ReLU 10 512 2 Linear+ReLU 512 256 3 Linear+ReLU 256 128 4 Linear+ReLU 128 64 5 Linear+ReLU 64 32 6 Linear+ReLU 32 64 7 Linear+ReLU 64 128 8 Linear+ReLU 128 256 9 Linear+ReLU 256 512 10 Linear 512 10 Table G.4 Architectural details of linear transformer. The number of heads for transformer layers is 16. Layer no. Layer type Input channels Output channels 1 Linear 10 64 2–6 MLP+Transformer+MLP 64 64 7 Linear+ReLU 64 32 8 Linear+ReLU 32 16 9 Linear 16 10 Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 34 R. Cassia and R. Kerswell References [1] F. Scarselli, M. Gori, A.C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Trans. Neural Netw. 20 (1) (2008) 61–80. [2] T.N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, (2016). arXiv preprint arXiv:1609.02907 [3] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio, Graph attention networks, (2017). arXiv preprint arXiv:1710.10903 [4] W. Hamilton, Z. Ying, J. Leskovec, Inductive representation learning on large graphs, Adv. Neural Inf. Process. Syst. 30 (2017) 1–19. [5] K. Xu, W. Hu, J. Leskovec, S. Jegelka, How powerful are graph neural networks? (2018). arXiv preprint arXiv:1810.00826 [6] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, (2014). arXiv preprint arXiv:1409.0473 [7] M.-T. Luong, H. Pham, C. Manning, Effective approaches to attention-based neural machine translation, (2015). arXiv preprint arXiv:1508.04025 [8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, Adv. Neural Inf. Process. Syst. 30 (2017) 1–11. [9] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, Improving language understanding by generative pre-training (2018). [10] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: pre-training of deep bidirectional transformers for language understanding, in: Proceedings of naacL-HLT, Minneapolis, Minnesota, 1, 2019. [11] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An image is worth 16x16 words: Transformers for image recognition at scale, (2020). arXiv preprint arXiv:2010.11929 [12] A. Katharopoulos, A. Vyas, N. Pappas, F. Fleuret, Transformers are RNNs: fast autoregressive transformers with linear attention, in: International Conference on Machine Learning, PMLR, 2020, pp. 5156–5165. [13] K. Choromanski, V. Likhosherstov, D. Dohan, X. Song, A. Gane, T. Sarlos, P. Hawkins, J. Davis, A. Mohiuddin, L. Kaiser, et al., Rethinking attention with performers, (2020). arXiv preprint arXiv:2009.14794 [14] A. Gu, T. Dao, Mamba: Linear-time sequence modeling with selective state spaces, (2023). arXiv preprint arXiv:2312.00752 [15] T. Dao, A. Gu, Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality, (2024). arXiv preprint arXiv:2405.21060 [16] V.P. Dwivedi, X. Bresson, A generalization of transformer networks to graphs, (2020). arXiv preprint arXiv:2012.09699 [17] D. Kreuzer, D. Beaini, W. Hamilton, V. Létourneau, P. Tossou, Rethinking graph transformers with spectral attention, Adv. Neural Inf. Process. Syst. 34 (2021) 21618–21629. [18] C. Ying, T. Cai, S. Luo, S. Zheng, G. Ke, D. He, Y. Shen, T.-Y. Liu, Do transformers really perform badly for graph representation? Adv. Neural Inf. Process. Syst. 34 (2021) 28877–28888. [19] A. Behrouz, F. Hashemi, Graph mamba: towards learning on graphs with state space models, in: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 119–130. [20] C. Wang, O. Tsepa, J. Ma, B. Wang, Graph-mamba: Towards long-range graph sequence modeling with selective state spaces, (2024). arXiv preprint arXiv:2402.00789 [21] L. Rampášek, M. Galkin, V.P. Dwivedi, A.T. Luu, G. Wolf, D. Beaini, Recipe for a general, powerful, scalable graph transformer, Adv. Neural Inf. Process. Syst. 35 (2022) 14501–14515. [22] E. Rossi, H. Kenlay, M.I. Gorinova, B.P. Chamberlain, X. Dong, M.M. Bronstein, On the unreasonable effectiveness of feature propagation in learning on graphs with missing node features, in: Proceedings of the First Learning on Graphs Conference, PMLR, 2022. [23] R. Everson, L. Sirovich, Karhunen–Loeve procedure for gappy data, JOSA A 12 (8) (1995) 1657–1664. [24] T. Bui-Thanh, M. Damodaran, K. Willcox, Aerodynamic data reconstruction and inverse design using proper orthogonal decomposition, AIAA J. 42 (8) (2004) 1505–1516. [25] A. Vendl, H. Faßbender, Proper orthogonal decomposition for steady aerodynamic applications, PAMM 10 (1) (2010) 635–636. [26] H. Huang, Q. Wang, Z. Wu, A surrogate-based flow-field prediction and optimization strategy for hypersonic thrust nozzle, AIP Adv. 14 (12) (2024) 1–21. [27] G. Duthé, I. Abdallah, S. Barber, E. Chatzi, Graph neural networks for aerodynamic flow reconstruction from sparse sensing, (2023). arXiv preprint arXiv:2301.03228 [28] S. Xu, Z. Sun, R. Huang, D. Guo, G. Yang, S. Ju, A practical approach to flow field reconstruction with sparse or incomplete data through physics informed neural network, Acta Mech. Sin. 39 (3) (2023) 322302. [29] Y. Kim, Y. Choi, B. Yoo, Gappy AE: a nonlinear approach for gappy data reconstruction using auto-encoder, Comput. Methods Appl. Mech. Eng. 426 (2024) 116978. [30] J.-Z. Peng, Z.-Q. Wang, X. Rong, M. Mei, M. Wang, Y. He, W.-T. Wu, Rapid and sparse reconstruction of high-speed steady-state and transient compressible flow fields using physics-informed graph neural networks, Phys. Fluids 36 (4) (2024) 1–19. [31] M.Y. Hosseini, Y. Shiri, Flow field reconstruction from sparse sensor measurements with physics-informed neural networks, Phys. Fluids 36 (7) (2024) 1–15. [32] B.A. Danciu, V.A. Pagone, B. Böhm, M. Schmidt, C.E. Frouzakis, Flow reconstruction in time-varying geometries using graph neural networks, (2024). arXiv preprint arXiv:2411.08764 [33] M. Quattromini, M.A. Bucci, S. Cherubini, O. Semeraro, Graph Neural Networks and Differential Equations: A hybrid approach for data assimilation of fluid flows, (2024). arXiv preprint arXiv:2411.09476 [34] P.C.H. Nguyen, J.B. Choi, Q.-T. Luu, FLRNet: A Deep Learning Method for Regressive Reconstruction of Flow Field From Limited Sensor Measurements, (2024). arXiv preprint arXiv:2411.13815 [35] H.V. Dang, J.B. Choi, P.C.H. Nguyen, FLRONet: Deep Operator Learning for High-Fidelity Fluid Flow Field Reconstruction from Sparse Sensor Measurements, (2024). arXiv preprint arXiv:2412.08009 [36] P. Batten, N. Clarke, C. Lambert, D.M. Causon, On the choice of wavespeeds for the HLLC Riemann solver, SIAM J. Sci. Comput. 18 (6) (1997) 1553–1570. [37] C.W. Schulz-Rinne, Classification of the Riemann problem for two-dimensional gas dynamics, SIAM J. Math. Anal. 24 (1) (1993) 76–88. [38] N. Hoppe, N. Fleischmann, B. Biller, S. Adami, N.A. Adams, A systematic analysis of three-dimensional Riemann problems for verification of compressible-flow solvers, Comput. Fluids 278 (2024) 106298. [39] V.P. Dwivedi, C.K. Joshi, A.T. Luu, T. Laurent, Y. Bengio, X. Bresson, Benchmarking graph neural networks, J. Mach. Learn. Res. 24 (43) (2023) 1–48. [40] H. Shirzad, A. Velingker, B. Venkatachalam, D.J. Sutherland, A.K. Sinop, Exphormer: sparse transformers for graphs, in: International Conference on Machine Learning, PMLR, 2023, pp. 31613–31632. [41] R. Cassia, R. Kerswell, Godunov loss functions for modelling of hyperbolic conservation laws, Comput. Methods Appl. Mech. Eng. 437 (2025) 117782. [42] E.F. Toro, The HLLC Riemann solver, Shock Waves 29 (8) (2019) 1065–1082. [43] E.F. Toro, Riemann Solvers and Numerical Methods for Fluid Dynamics: A Practical Introduction, Springer Science & Business Media, 2013. [44] D.S. Balsara, Three dimensional HLL Riemann solver for conservation laws on structured meshes; application to Euler and magnetohydrodynamic flows, J. Comput. Phys. 295 (2015) 1–23. Computer Methods in Applied Mechanics and Engineering 447 (2025) 118328 35 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0001 http://arxiv.org/abs/1609.02907 http://arxiv.org/abs/1710.10903 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0002 http://arxiv.org/abs/1810.00826 http://arxiv.org/abs/1409.0473 http://arxiv.org/abs/1508.04025 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0003 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0003 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0004 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0005 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0005 http://arxiv.org/abs/2010.11929 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0006 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0006 http://arxiv.org/abs/2009.14794 http://arxiv.org/abs/2312.00752 http://arxiv.org/abs/2405.21060 http://arxiv.org/abs/2405.21060 http://arxiv.org/abs/2012.09699 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0007 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0007 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0008 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0008 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0009 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0009 http://arxiv.org/abs/2402.00789 http://arxiv.org/abs/2402.00789 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0010 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0010 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0011 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0011 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0012 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0013 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0013 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0014 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0015 http://arxiv.org/abs/2301.03228 http://arxiv.org/abs/2301.03228 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0016 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0016 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0017 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0017 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0018 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0018 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0019 http://arxiv.org/abs/2411.08764 http://arxiv.org/abs/2411.08764 http://arxiv.org/abs/2411.09476 http://arxiv.org/abs/2411.13815 http://arxiv.org/abs/2412.08009 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0020 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0021 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0022 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0022 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0023 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0024 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0024 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0025 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0026 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0027 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0028 http://refhub.elsevier.com/S0045-7825(25)00600-0/sbref0028 A physics-augmented GraphGPS framework for the reconstruction of 3D Riemann problems from sparse data 1 Introduction 2 Prior work on flow reconstruction 3 Methodology 3.1 Problem definition 3.2 3D Riemann problems and characteristic waves 3.3 GraphGPS framework 3.3.1 Guided message-passing 3.3.2 Masked projection and positional encoding 3.3.3 Candidates for FMP 3.3.4 Candidates for FGC 3.4 Multi-loss function 3.5 Handling of boundary conditions 4 Experiments 4.1 Ablation study 4.2 Guided message-passing vs. dense message-passing 4.3 Comparison with benchmarks 4.4 Performance gains of physical loss and biased sensor placement 5 Conclusion A HLLC flux estimation B WENO reconstruction C Initialization values for 3D Riemann problems D Sensor placement E Comparison with benchmarks for noisy sensors F Error fields G Benchmark ML models