Studies in History and Philosophy of Science 96 (2022) 51–67

Pluralizing measurement: Physical geodesy's measurement problem and its resolution

Miguel Ohnesorge
Department of History and Philosophy of Science, University of Cambridge, Free School Lane, Cambridge CB2 3RH, United Kingdom
E-mail address: mo459@cam.ac.uk.

Keywords: Measurement, Coordination, Geodesy, Geophysics, Astronomy, Measurement error

https://doi.org/10.1016/j.shpsa.2022.08.011
Received 7 August 2022; Available online 22 September 2022
0039-3681/© 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

ABSTRACT

Derived measurements involve problems of coordination. Conducting them often requires detailed theoretical assumptions about their target, while such assumptions can lack sources of evidence that are independent from these very measurements. In this paper, I defend two claims about problems of coordination. I motivate both by a novel case study on a central measurement problem in the history of physical geodesy: the determination of the earth's ellipticity. First, I argue that the severity of problems of coordination varies according to scientists' predictive and experimental control over perturbations of the measurement process. Second, I identify a methodology by which scientists can solve hard problems of coordination and gradually increase their predictive control over perturbations. I dub this methodology ‘operational pluralism’ since it is driven by the introduction of alternative measurement operations that involve different physical indicators.

1. Introduction

When conducting derived measurements,1 scientists infer the magnitude of a theoretical parameter from a set of quantitative indicators. It has been noted widely that such inferences can be affected by an epistemic circularity (Chang, 2004; van Fraassen, 2008; Mach, 1986; Tal, 2017a). Establishing measurements often requires detailed theoretical knowledge about their target parameter, while our theoretical models of that parameter can lack sources of evidence that are independent from these very measurements. As a consequence, many philosophers have argued that justification in measurement takes the form of bi-directional problems of coordination.2 Both our measurement procedures and theoretical models are modified iteratively to account for prediction-measurement discrepancies, making them cohere with each other as neatly as possible. If they are successfully coordinated, measurements converge within the space of possible outcomes permitted by our best theoretical model of their target and their former disagreement can be theoretically explained.

1 In what follows, I take for granted that I am talking about derived measurements and drop the “derived” label. There are good reasons to reject the fundamental distinction between direct and derived measures, but this is not the place to argue for them.

2 While this notion was popularised by Hans Reichenbach (1920, 1932), Mach (1986) first used “coordination” to describe the dynamic relationship between quantity concepts and measurement procedures.

Coordination is significantly harder to achieve when measuring the parameters of large and partially inaccessible physical systems – the earth being a prime example. Despite growing philosophical interest in the geosciences (Bokulich, 2018, 2020; Bokulich & Oreskes, 2017; Miyake, 2015, 2017a, 2017b; Parker, 2014; Smith, 2007; Watkins, 2021), it remains insufficiently understood how such complicated epistemic conditions affect the dynamics of measurement coordination. In this paper, I analyse a foundational geoscientific measurement problem – the measurement of the earth's polar flattening – to develop two interrelated philosophical arguments. The first argument is conceptual, sharpening the existing epistemological vocabulary in light of my case study. I introduce the notion of a hard problem of coordination to refer to situations in which scientists can neither predict nor experimentally control the relevant perturbations3 of the measurement process. My second argument is methodological. I argue that hard problems of coordination can be resolved through a diachronic and iterative methodology that I dub operational pluralism. Operational pluralism is distinct from other kinds of pluralism existing in philosophy of science. While existing views focus on the proliferation of theories, models, or taxonomies, operational pluralism denotes a particular methodology in measurement, which aims at isolating and anticipating sources of measurement error. Given that hard problems of coordination are found all across physical science, the strategy with which geodesists effectively solved their problem should offer epistemic lessons of general methodological value.

3 I use “perturbation” in the sense usually ascribed to it in applied mathematics. Roughly, a perturbation is a disturbance of an initial, approximate model of a physical system. Perturbations of the measurement process are physical effects that are not included in the initial model of the measurement process.

Fig. 1. Meridian ellipse of an ellipsoid of revolution, where a and b are parameters in terms of which polar flattening ($f = \frac{a - b}{a}$) is defined. Wikimedia commons.

In the extensive case study underlying my proposal, I present novel historiographical research reconstructing how geodesists, astronomers, and geophysicists4 first came to measure convergent outcomes for earth's polar flattening between 1880 and 1924. This marked an immense achievement, solving a prestigious measurement problem that had persisted since the seventeenth century (Ohnesorge, 2021). In 1924, the International Association of Geodesy accepted a uniquely parametrised ellipsoid model of the earth, motivated by a convergence between all available measurement procedures and significant advances in controlling them for systematic errors (Torge, 2017, p. 50). As I will show, this convergence was not merely a result of accumulating more data or deriving an accurate model from theory. Rather, it required the use of additional measurement indicators that are subject to different perturbational effects, vindicating the value of operational pluralism.

The plan is as follows. Section 2 provides a historical and epistemological introduction to geodesy's long-standing measurement problem, together with a brief sketch of the models and measures of the earth's figure. Section 3 contains the bulk of my case study. The key takeaway is that operational pluralism was instrumental for measuring convergent values of the earth's polar flattening. In Section 4, I systematise operational pluralism and defend it against possible objections.

2. Physical geodesy and its measurement problem

2.1. A brief history

While geodesy did not become a cohesive discipline until the second half of the nineteenth century, “geodetic” problems were known and studied as such since the seventeenth century.
The principal aim of geodesists was to determine the figure, gravity field, and interior constitution of the earth. These tasks all involve deriving a mathematical model of the rotating earth's shape and density distribution and measuring its theoretical parameters. Since problems of coordination involve an epistemic interdependence between theory and measurement, I have to say some words about the mathematical modelling before moving on to the physical measurement practices.

The mathematical “problem of the earth's shape” (Greenberg, 1995) was to derive the earth's general geometric figure and some quantitative limits for its parameters from assumptions about gravitational attraction, the planet's interior density distribution, and its rotational motion. Approaches to the problem assumed that (i) the earth is in a state of hydrostatic equilibrium,5 and that (ii) its formation can be modelled by treating it as a homogenous fluid. Thus, mathematical geodesy aimed to determine under which conditions homogenous, uniformly rotating fluid bodies whose constituent particles attract according to the inverse-square law of gravity can be in a state of hydrostatic equilibrium.

Newton turned this into a feasible mathematical problem by introducing an empirical parameter representing the approximately 1/288 ratio between centrifugal ‘force’ and gravitational acceleration at the equator. He argued that an approximately ellipsoidal spheroid of revolution with an ellipticity of 1/230 is the only possible equilibrium figure, noting that inwardly increasing densities inside of the fluid could result in larger ellipticities (Greenberg, 1996; Todhunter, 1873, chap. 1) (Fig. 1).6 Christian Huygens and various French philosophers proposed alternative but ultimately unsuccessful ellipsoidal equilibrium models, which can be derived by replacing Newton's law with their respective theories of gravity, according to which gravity is not a universal force acting on all particles of matter but is directed at the center of planets. Alexis Clairaut articulated the most sophisticated equilibrium theory in 1743, showing that an ellipticity between 1/230 and 1/597 is a necessary condition for uniformly rotating fluid bodies that are composed of different ellipsoidal shells of homogenous density to be in a state of dynamical equilibrium, given that Newton's law applies. Clairaut notably showed that an increasing density towards the center will result in a smaller ellipticity, contrary to what Newton had assumed (Chapin, 1995; Greenberg, 1988).7

4 In what follows, I will sometimes refer to the group of historical actors in question as “geodesists”, because the measurement of the earth's polar flattening was generally understood to be a “geodetic” problem.

5 Attempts at defining hydrostatic equilibrium started with Newton and culminated in Clairaut's idea that the net force acting on any fluid channel between two surface points must be 0, which he articulated through Fontaine's novel partial differential calculus (Greenberg, 1995).

6 In light of several revisions between the different editions of the Principia, the 1/230 value from the third edition is taken as representative here.

To produce evidence for the universal inverse-square law and the global accuracy of their ellipsoid model, physical geodesists had to measure convergent values for the model's parameters. The central parameter that geodesists tried to determine was the earth's polar flattening, which is equivalent to the model's ellipticity. For this task, two separate measurement indicators had been proposed by Newton, Huygens, and the famous French astronomers Cassini I and II: latitudinal variations in the strength of surface gravity (IP) and latitudinal variations in the lengths of triangulated meridional arcs (IT).8 In both cases, the magnitude of the indicators is assessed at different, astronomically determined coordinates on the earth's surface.
After multiple such local measurements, the polar flattening is inferred from the ellipticity of the model which accounts for the latitudinal variations with as little residual error as possible.9 As Fig. 2 below illustrates, such measurements were “theory-mediated” (Harper, 2012; Smith & Seth, 2020). The very definition of polar flattening relied on the theoretically derived ellipsoid model, and the amalgamation of different local measurement results assumed their model-conform, elliptic variation with latitude.

7 The competition between the alternative predictions and the controversy about whether Newton's law is compatible with 18th century measurement results are discussed in Chapin (1995) and Ohnesorge (2021).

8 Newton also proposed measurements based on the precession of the equinoxes. As I show in detail below, it took until the late nineteenth century for such measurements to be empirically feasible.

9 This is a simplified description of these two measures. For a more detailed discussion see Ohnesorge (2021).

Fig. 2. Measurement inferences from latitudinal variations in pendulum (IP) and arc lengths (IT) to polar flattening.

Edward Sabine's 1825 survey of latitudinal surface gravity variations offered overwhelming evidence of a systematic disagreement between the outcomes inferred from IP and IT (Sabine, 1825, p. 341). At the same time, local conflicts between the measured and predicted parameters of surface gravities and curvatures gave evidence for the local shortcomings of the ellipsoid model underlying both measurement procedures (Airy, 1826; Bessel, 1838; Gauss, 1828; Laplace, 1796, p. 12). Since the theoretical ellipsoid model is presumed by any inference to polar flattening, the causes of the inconsistent outcomes could be uniquely attributed neither to the model nor to any specific measurement procedure.
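The inference from latitudinal variations to polar flattening described above can be sketched for the pendulum indicator IP. The following is a minimal illustration under assumptions that go beyond the text: surface gravity is modelled as g(φ) = g_e(1 + β sin²φ), the coefficients are fitted by ordinary least squares, and flattening is recovered from Clairaut's first-order relation f = (5/2)m − β, with m the roughly 1/288 ratio of centrifugal to gravitational acceleration at the equator. The function names and the station values are hypothetical; the data are synthetic and noise-free, not historical pendulum results.

```python
import math

def fit_gravity_law(latitudes_deg, gravities):
    """Ordinary least squares for g = g_e + k * sin^2(latitude).

    Returns the fitted equatorial gravity g_e and beta = k / g_e.
    """
    x = [math.sin(math.radians(lat)) ** 2 for lat in latitudes_deg]
    n = len(x)
    mean_x = sum(x) / n
    mean_g = sum(gravities) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxg = sum((xi - mean_x) * (gi - mean_g) for xi, gi in zip(x, gravities))
    k = sxg / sxx
    g_e = mean_g - k * mean_x
    return g_e, k / g_e

def flattening_from_gravity(beta, m=1 / 288):
    """Clairaut's first-order relation f = (5/2) m - beta (assumed here)."""
    return 2.5 * m - beta

# Synthetic stations generated from assumed values of g_e and beta.
G_E_ASSUMED, BETA_ASSUMED = 9.780, 0.0053
lats = [0, 15, 30, 45, 60, 75]
obs = [G_E_ASSUMED * (1 + BETA_ASSUMED * math.sin(math.radians(lat)) ** 2)
       for lat in lats]

g_e, beta = fit_gravity_law(lats, obs)
f = flattening_from_gravity(beta)
print(f"beta = {beta:.6f}, 1/f = {1 / f:.1f}")
```

With real station data the fit would leave residuals, and it is precisely the size and regional pattern of those residuals that indicate perturbations not captured by the ellipsoid model.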
Consequently, the discordance between IP and IT confronted geodesists with a problem of coordination.

Geodetic Problem of Coordination: To determine whether the theoretical ellipsoid model could accurately represent the earth's global figure despite its local shortcomings, geodesists needed accurate measures. To improve the consistency between their measures, they needed more accurate empirical laws connecting their indicators to a theoretical definition. The available laws, however, relating ellipticity to variations in surface gravity and curvature, were defined relative to the theoretical ellipsoid model (Fig. 3).

Nineteenth-century geodesists were very aware of this problem. Following the abovementioned results, Carl Friedrich Gauss and George Biddell Airy discussed this measurement problem in two seminal papers during the late 1820s. Both of them, moreover, proposed to solve it through the iterative analysis of prediction-measurement discrepancies, hoping to understand the physical sources of errors in the earth's heterogenous subterranean density and, if necessary, replace the ellipsoid model with a more sophisticated successor in a piecemeal manner.10 Gauss already suggested the outline for a potential successor model, whose operationalisation required geodesists to determine the extent to which the equipotential surface roughly coinciding with mean sea-level deviates from an ellipsoidal reference surface (Ohnesorge, 2021). Notwithstanding arc and pendulum measurements on a monumental scale in Great Britain, Continental Europe, India, Peru, Scandinavia, and Russia, geodesists did not manage to solve their measurement problem throughout the nineteenth century. Nor was the available data about the distribution and causes of curvature and surface gravity discrepancies sufficient to estimate whether flattening measurements based on the ellipsoid model could, in principle, be made consistent.
In the 1880s, IP-values were still concentrated in the range between 1/298 and 1/310 while IT-values were spread between 1/284 and 1/292, showing no significant convergence since the beginning of the century (Strasser, 1957, Appendix, 91–93). As things stood then, (i) the value of geodesists' target parameter was underdetermined (providing inconclusive evidence for the ellipsoid model and Newtonian gravitation) and (ii) the perturbations acting on IP and IT were not sufficiently understood. Roughly 200 years after Newton had first attempted to derive a model of the earth's figure from his theory of gravity and scattered empirical data, convergent measurements of its defining parameter were still lacking.

10 This corresponds to methodologies discussed by Chang (2004, ch. 4), Harper (2012), and Smith (2014), according to which the discrepancies between a theory (or its associated models) and physical measurements are either repeatedly explained by that theory or lead to its iterative revision.

2.2. Epistemological assessment

I take this case to offer epistemological insights because persistent discordances in theory-mediated measurements are not unique to polar flattening. Gregory Good has illustrated the difficulty of determining the earth's interior composition based on magnetic measurements (Good, 2011). As Gordon Belot and Teru Miyake note, an analogous problem was faced by geophysicists trying to determine the earth's interior density distribution based on gravimetric and seismological measurements at its surface (Belot, 2015; Miyake, 2017b).11 All these problems shared two pertinent features, which, loosely building on Miyake's work, can be characterised as follows: (i) scientists do not have empirical access to a parameter without relying on an idealised theoretical model and (ii) their measurements are subject to multiple overlapping perturbations that they can neither predict theoretically nor shield their measurements against (Miyake, 2011).
In what follows, I will refer to such situations as hard problems of coordination. In the case of geodesy, the unaccounted perturbations resulted primarily from ignorance about the earth's irregular topographic and subterranean density distribution. As we will see later, these perturbations can lead to various systematic errors in different measurement procedures. Examples include large, non-elliptic undulations of the terrestrial gravity field that result in mismatches between data from different regions, strong gravity anomalies that severely affect specific measurements, and asymmetries in the earth's general density distribution that affect inferences from astronomical quantities to the earth's figure.

11 This problem was harder because measurement data was scarcer. Surface gravity measurements are in principle insufficient to uniquely determine the earth's interior structure and seismological measurements can only be conducted after (effectively unpredictable) seismic events.

Fig. 3. Epistemic dependence between theoretical ellipsoid model and pendulum and arc measures of the earth's polar flattening.

12 The geoid was the hypothetical successor model that Gauss first imagined in 1828 (see section 2). The geoid represents an equipotential surface coinciding with mean sea level and offers a closer approximation of the earth's figure and gravity field than any ellipsoid.

To be sure, I do not intend to draw any sharp demarcation between standard and hard problems of coordination, nor can I offer necessary or sufficient conditions. My talk of hard problems of coordination aims to pick out cases that require a different methodological treatment, given the respective degree to which the perturbations of the measurement process resist predictive and experimental control. This can be illustrated by a brief comparison with other problems of coordination.
Eran Tal gives a nice account of how metrologists coordinate the measurement of the standard second, which acts as the basic unit of the Coordinated Universal Time (Tal, 2016, p. 302, p. 326). In the previous International System of Units that Tal discusses, the second was defined as “the duration of 9 192 631 770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium 133 atom” (BIPM, 2006, p. 113). Coordinating this theoretical definition with measurements is complicated by the fact that any de facto measured state transitions will be subject to background perturbations introduced by (i) gravitational and (ii) magnetic forces, as well as non-absolute-zero temperatures. The crucial difference from our case, and from hard problems of coordination in general, is that metrologists are able to experimentally shield atomic clocks from several perturbations and predict the specific uncertainties arising from the effects of (i) and (ii) in different physical realisations of the ideal caesium 133 atom. This means that their models of the measurement process can reliably anticipate sources of error uniquely affecting specific measurement indicators. Coordination is restored without major difficulties since the operative perturbations are indicated by the types of clocks that deviate and the extent to which they deviate. Geodesists, in contrast, did not have sufficient theoretical knowledge about the perturbations resulting from the earth's heterogenous interior and topographic density distribution to identify the unique source of conflicting outcomes. Neither could they dispose of these perturbations by placing the earth in a highly controlled experimental set-up corresponding to the sophisticated machinery of modern atomic clocks.
As a consequence, each of their indicators was unavoidably exposed to multiple perturbations arising from the difference between the physical earth and the idealised model used in the measurement inferences.

My distinction between standard and hard problems of coordination is related to but not identical to a distinction that some philosophers draw between two kinds of epistemic “coordination” in scientific practice. Measurement coordination in a narrow sense – and as discussed in this paper – refers to the coordination between physical indicators and models of the target system. This process is also often referred to as “correlation” or “calibration” (Boumans, 2007; Heidelberger, 1994; Tal, 2017b). In a broader sense, models are also coordinated with abstract theories that describe and predict their parameters, sometimes involving stipulative principles (Reichenbach, 1932; Stump, 2015). Put this way, the ellipsoid is a model of the earth that is coordinated (in the broad sense) with Newtonian gravitation based on the principles of planetary equilibrium figures and several idealising assumptions about its rotational motion and internal density. This model, in turn, is coordinated (in the narrow sense) with the measured variations in the length of seconds-pendulums and meridional arcs. As Flavia Padovani and Michele Luchetti rightly point out, the two senses of coordination are often deeply intertwined in scientific practice because we use theoretical assumptions to construct measures, identify errors, and adjust models (Luchetti, 2020; Padovani, 2017). Framed in this vocabulary, hard problems of coordination occur if coordination in the broad sense involves idealised representations of the target system, and the perturbations resulting from these idealisations cannot be experimentally controlled or predicted based on independent theoretical assumptions.
As a consequence, scientists are unable to predict or explain why measurement coordination in the narrow sense fails.

3. Operational pluralism as a guide to coordination

Geodesists gradually resolved their measurement problem between 1880 and 1924. On my reading, their success was predicated on following a particular methodology, which I will refer to as operational pluralism in what follows. In this section, I carve out the structure of this methodology by investigating how geodetic practice changed between 1880 and 1924.

A good point of departure to understand this period of geodetic measurement is the work of German geodesist Robert Friedrich Helmert, who is widely considered the “father” of modern physical geodesy. The period in question overlaps with the climax of Helmert's career, much of which was devoted to overcoming the discordances in ellipticity measurements (Reigber, 2017; Torge, 2005). Helmert published his two-volume classic Mathematical and Physical Theories of Higher Geodesy in the early 1880s. The book was widely translated and used as a teaching resource across the world (Torge, 2009, pp. 237–38). Countering a growing frustration about the remaining discordances between IP- and IT-values, Helmert argued in the book “that the current practice of geodesists, who treat the geoid12 as an ellipsoid of revolution […] appears justified” (Helmert, 1884, p. 91). As we will see in what follows, he linked the ongoing discordance to insufficiently understood perturbations of the measurement process, rather than irresolvable flaws in the ellipsoid model or available measurement procedures.

Fig. 4. Measurement inference from lunar perturbations to polar flattening.

Helmert's programmatic claims alone did of course not yet offer any new empirical evidence.
As things stood at the beginning of the 1880s, there was no single ellipsoid model consistent with arc and pendulum measurements. Consequently, Helmert needed to show that the different measures could be coordinated successfully. In 1886, two years after the publication of his second textbook, he left his former post as Geodesy professor in Aachen to become the new head of the Royal Prussian Geodetic Institute (RPGI). He would soon be one of the most influential figures in the discipline, owing to the extent of his theoretical and empirical contributions and the leading role of the RPGI in international research. Among other things, the institute hosted the headquarters of the International Geodetic Association, with Helmert operating as head of its central bureau (Torge, 2005, pp. 564–65). Now, he had the means at his disposal to provide empirical evidence for his conjectural claims and pursue the long sought-after measurement convergence.

In what follows, I reconstruct how Helmert and fellow geodesists, geophysicists, and astronomers finally achieved convergent measurements of polar flattening. The key to this empirical success was the use of a more diverse range of measurement procedures. As will become clear from my exposition, only the last of the three “new” measures was entirely new. Moreover, the basic assumptions underpinning the first two measurements had all been discussed in Pierre-Simon Laplace's Mécanique Céleste, published between 1799 and 1825 (Laplace, 1832, pp. 853-932, esp. 924-932; 1834, pp. 642-665). In all cases, however, geodesists only managed to conduct empirically informative measurements after further instrumental and perturbation-theoretic advances throughout the nineteenth century. I begin by surveying the different kinds of measurement procedures involved. Most of them were popularised or outlined in Helmert's canonical 1880 and 1884 textbooks, making them a good starting point for us.

3.1. Astronomical measurements of ellipticity

Helmert devoted fifty pages of his 1884 textbook to the relationship between the earth's flattening and astronomical quantities. Principally, there were two such quantities with suitable nomic links to the extent of the earth's flattening: the magnitude of a specific pair of perturbations in the moon's orbit, and the magnitude of the earth's precessional constant.13 Let us denote these two indicators as IM (magnitude of perturbations in the moon's orbit) and IPC (magnitude of the precessional constant). To employ them, geodesists needed mathematical expressions of the nomic links between the earth's ellipticity and IM and IPC that accounted for the gravitational attractions of other celestial bodies. If other phenomena affecting the moon's orbit and earth's precessional constant cannot be sufficiently accounted for by theory, there is no way to isolate the nomic relation of the earth's ellipticity to either the moon's orbit (IM) or the earth's rotation (IPC). Positively put, the use of both measurement indicators requires well-developed perturbation theories. As a consequence, it was only after significant advances in eighteenth- and nineteenth-century mechanics and astronomy that they became attractive for geodesists. Additionally, there was considerable disciplinary inertia among geodesists, so that even seminal nineteenth-century textbooks did not touch upon the relationship between ellipticity and precession, the moon's parallax, or lunar orbital perturbations (Clarke, 1880; Fischer, 1845, 1846a, 1846b).14

13 Johann Albrecht Euler (the oldest child of Leonhard Euler) also considered latitudinal variations in the moon's elevation as a possible astronomical indicator (Euler, 1768). However, geodesists overwhelmingly agreed that it could not be measured with sufficient precision to serve as an indicator of ellipticity (Bruns, 1878, p. 32; Helmert, 1884, p. 460; Todhunter, 1873, p. 447).

3.1.1. Perturbations of the moon's orbit

The first systematic attempt to determine the earth's ellipticity from its perturbational effects on the moon's orbit had been undertaken long before Helmert's time, in the third volume of Laplace's Mécanique Céleste. Between the 1760s and 1780s, the German astronomer Tobias Meyer and the American astronomer Charles Mason observed a perturbation in the moon's longitude that was proportional to the sine of the longitude of the moon's node. The two nodes of the moon's orbit mark the points at which it comes closest to the earth's equator (Fig. 5). Thus, the effect varied with the proximity of the moon's orbit to the flattened earth's equatorial bulge. In 1783, Laplace found a corresponding latitudinal perturbation and argued that both phenomena can be explained by the impact that the earth's polar flattening has on the earth's gravity field (Chapin, 1995, p. 33). If all other parameters characterising the moon's orbit are known, the magnitude of these remaining perturbations could thus be employed as a measurement indicator for the earth's ellipticity. Alexander Bürg subsequently determined the first numerical values for these longitudinal and latitudinal perturbation coefficients based on his moon tables, from which Laplace derived ellipticity values of 1/304.6 and 1/305.5. Given the contemporary scarcity of empirical data, Laplace considered this sufficiently close to his IP-values of 1/321.5 and 1/335.8 from the previous volume of the Mécanique Céleste, as well as the IT-value of 1/334.5 that had just been adopted by the French Commission Générale (Chapin, 1995, p. 34). For reference, the outline of IM-inferences is given in Fig. 4.

In the first three decades of the nineteenth century, some notable astronomers followed Laplace's example (Airy, 1861; Bürg, 1825, p. 15).
Most importantly, the head of Gotha's internationally acclaimed observatory and former member of the European Arc Measurement's standing committee, Andreas Hansen, published a monumental two-volume study of the lunar perturbations in 1862 and 1864. In it, he made the following two significant discoveries: (i) Other planets of the solar system have notable perturbational effects on the moon that need to be subtracted from the ellipticity-related perturbations, thus decreasing the implied value of ellipticity (Hansen, 1862, pp. 481–97). (ii) A development of the known expression of the perturbation to higher powers entails different values for the ellipticity-related perturbations that account better for the moon's observed orbit (Hansen, 1864, pp. 272–322). In light of these two insights, Hansen determined the longitudinal and latitudinal perturbation coefficients caused by the earth's ellipticity with the highest accuracy so far. In his 1884 textbook, Helmert derived two values of (i) 1/295.6 and (ii) 1/300.0 from the numerical outcomes that Hansen gave for the ellipticity-related (i) longitudinal and (ii) latitudinal perturbations. He recommended their mean (1/297.8) as the preferred ellipticity outcome and, in light of the low uncertainties associated with Hansen's results, assigned a mean error of ±2.2. Helmert even remarked that “this mean error estimate, in our view, is rather too large than too small”, given its concordance with the IP measurements discussed in the volume (see 3.3). Hansen himself had only derived an ellipticity value from the longitudinal coefficient, thus reaching an outcome at the lower end of Helmert's error bar (Helmert, 1884, pp. 468–73). Three years later, the French geodesist and astronomer François Félix Tisserand also employed IM and derived a flattening of 1/297.2 from Hansen's lunar observations. He presented these results in a talk at the IAG's conference in Paris in 1889, drawing further international attention to the geodetic utility of the astronomical measure (Tisserand, 1890, pp. 8–9).

14 The famous and influential exception is Louis Puissant's Traité de géodésie (1842a, 1842b), earlier versions of which had been published in 1805 and 1819. Even Puissant, however, only discusses precession and lunar perturbations on a purely theoretical level, drawing mostly on Laplace's earlier work.

Fig. 5. Sketch of the moon's orbit, illustrating the position of its nodes AN and DN relative to the ecliptic, i.e., the hypothetical plane that intersects with the earth's orbit at every point. Wikimedia commons.

3.1.2. The earth's precession

The second astronomical quantity that Helmert reintroduced to the wider geodetic community is the lunisolar precession15 (Helmert, 1884, pp. 426–38). Precession refers to a periodic circular movement in the orientation of the earth's rotational axis relative to the ecliptic (the plane that intersects with the earth's orbit around the sun at every point) (Fig. 6). Astronomers can observe the precession by recording periodic changes in the celestial coordinates of fixed stars.

15 Since only the luni-solar precession matters for our concerns, I will simply refer to it as precession in what follows.

Fig. 6. Sketch illustrating the rotation (R), precession (P), and nutation (N)16 of a solid body. Wikimedia commons.

While the effects of precession had been observed for centuries, book three of Newton's Principia contained its first quantitative explanation (Newton, 1729, Prop. 3). In line with his theory of the earth's figure, Newton explained the magnitude of precession by appealing to the lunar and solar attraction on the equatorial bulge of the ellipsoidal earth. D'Alambert and Euler first gave a precise expression to that explanation, using their advances in rigid-body mechanics.
Since then, the precession is usually denoted by the precessional constant (C − A)/C, where C denotes the moment of inertia around the earth's equatorial axis and A the moment of inertia around its polar axis (Wilson, 1987). The respective magnitudes of A and C can only be derived with the help of additional hypotheses about the planet's interior density distribution. The difference in moments of inertia indicates the different impact that the luni-solar “drag” has on the rotational motion of the earth at different latitudes.

15 Since only the luni-solar precession matters for our concerns, I will simply refer to it as precession in what follows.

16 The nutation is a wave-like perturbation of the precession that was discovered and mechanically characterised in the mid-eighteenth century (Bradley, 1748, p. 1; d'Alambert, 1749, pp. 73–80).

Fig. 7. Measurement inference from precession of fixed stars to polar flattening.

Yet again, the second volume of Laplace's Mécanique Céleste was the pioneering work in exploring the possibilities of using the precession as a measure of ellipticity (Fig. 7). Such efforts are complicated by the fact that the magnitude of precession is not uniquely determined by the earth's ellipticity. Strictly speaking, the movement of the rotational axis is not explained by the earth's flattening, but by the inequality between the moments of inertia around its polar and equatorial axes. For a solid ellipsoid with a homogeneous density, this inequality varies with the difference in the length of its two axes (i.e., its ellipticity). If applied to the real earth, however, the magnitude of the inequality is affected by heterogeneities in its interior density distribution. While this implies that the contemporary ignorance about the earth's interior affected the reliability of IPC-inferences, Laplace still considered the measure informative. More precisely, he used it to determine an upper ellipticity limit of 1/304. For that, he assumed that the earth's interior consists of spheroidal strata whose density increases gradually from its surface to its core and took the earth's central density to be 4.761 times higher than at sea level. His result significantly narrowed down the ellipticity range (1/230 to 1/578) permitted by Clairaut's hydrostatic theorem alone (Laplace, 1829, p. 930).

Similar to the lunar perturbations, the study of precession returned to the canon of physical geodesy through Helmert's textbooks (Helmert, 1884, pp. 426–38). Yet, he did not revive Laplace's attempt to employ it as a measure of ellipticity. The crucial steps to measure ellipticity from precession were only taken by prominent geophysicists in the late 1880s and 1890s. The two decades marked the beginning of modern seismological measurement and saw a hitherto unknown interest in the earth's interior (Miyake, 2017b; Schweitzer, 2008). First, Paris-based Rudolphe Radau showed that the different internal density variations proposed by Laplace, Helmert, and others only result in a negligible change in the corresponding moments of inertia and, ipso facto, the earth's precession (Radau, 1885, 1890). Indeed, Octave Callandreau (Paris), Emil Wiechert (Göttingen), and George Darwin (Cambridge) consequently inferred nearly concordant ellipticities while using different hypotheses about the earth's density distribution and only assuming that its interior is composed of concentric ellipsoidal strata whose density increases towards the center. Their outcomes were: 1/297.4 (Callandreau, 1889, p. 83), 1/297 (Wiechert, 1897, p. 241), and 1/296.4 (Darwin, 1899, p. 119). Darwin, in particular, argued at length that IPC-inferences are of a high value “as an independent means to establish the ellipticity of the earth's surface” (Darwin, 1899, p. 123).

17 This name continues to be used by twenty-first century geodesists.
Individual stations that control for azimuth, latitude, and longitude are often referred to as “Laplace points”: Torge (2001, p. 10).

3.2. Deflections of the vertical as a measure of ellipticity

The fifth and final procedure, which came to prominence in the early twentieth century, measures the earth's polar flattening based on the deflections of the vertical (i.e., the direction of gravity) across a triangulation network. Deflections of the vertical give quantitative estimates of how much the gradient of the terrestrial gravity field in a certain network differs from that of an ellipsoid. Such deflections are stated as angular quantities and can be determined by comparing locally determined astronomical coordinates at a certain point in a triangulation network with the coordinates that the same point is assigned on an ellipsoidal reference surface fitted to the triangulation network as a whole. To measure the earth's ellipticity based on such deflections, geodesists have to adjust the ellipticity of the reference ellipsoid so that the sum of the squares of all deflections in a sufficiently large network is minimised. Areal deflections of the vertical constitute our fifth and last measurement indicator, which we will denote as IDV. Contrary to IM- and IPC-inferences, this new measure was not connected to astronomical theory but evolved organically from geodetic measurement practice.

After finishing the triangulation of Eastern Prussia in the 1830s, the Prussian astronomer Friedrich Wilhelm Bessel was the first to systematically discuss how to quantify the errors introduced into a triangulation network by scattered gravity anomalies. Let us quickly run through the technicalities. When setting up a triangulation network, you need to orient astronomical telescopes and theodolites. Both are supposed to be fixated on the same equipotential ellipsoidal surface, which stands perpendicular to the direction of surface gravity at every point.
If there are any gravity anomalies in the vicinity, the different observation points cannot be projected onto the same ellipsoidal surface. Thus, integrating multiple such points into one network used in the measurement of ellipticity can introduce systematic errors corresponding to irregular deflections of the vertical throughout the triangulation network. Similar errors occur when multiple triangulation networks are used in an ellipticity measurement but cannot be fitted onto the same global ellipsoid (Fig. 8). In his seminal paper, Bessel attempted to effectively quantify and anticipate those errors in a procedure later dubbed astrogeodetic network adjustment.17 While Legendre and Laplace were the first to lay out such a procedure (Legendre, 1805, Appendix; Laplace, 1829, pp. 358–70), Bessel credits his inspiration to Gauss's more recent analysis of a triangulation network in Hanover (see ch. 2). Bessel proposed to multiply the number of astronomical stations across triangulation networks, so that the ellipsoid might be orientated in such a way that the total amount of residual deviations (in latitude and longitude) can be statistically minimised (Bessel, 1837, pp. 295–304). For Bessel, the study of deflection aimed solely at the statistical minimisation of residual error and had no further inferential function. The Ordnance Survey of Great Britain and Ireland employed astronomical measurements for the same purpose (Clarke & James, 1858, p. 606).

The eventual move that turned these network adjustments into a novel measure of ellipticity was taken on the other side of the Atlantic, by the American geodesist William Hayford, chief Computer of the US Coast and Geodetic Survey. The mammoth project involved 509 (!) astronomical control stations spread across the entire North American triangulation network and aimed to test the isostatic compensation of the region.
Remember that theories of isostasy entail that there are significantly fewer surface gravity irregularities than previously assumed since topographic mass surpluses were compensated by subterranean density deficits. The result of Hayford's network adjustment offered the most powerful and detailed evidence for regional isostatic compensation available worldwide. In his calculation, he compared how effective different hypotheses about the earth's subterranean density distribution were in minimising residual deflections. He not only concluded that the US-American continental crust is isostatically compensated but also proposed an exact equilibrium depth (113.8 km). This procedure visibly outperformed corrections for topographical mass surpluses and alternative isostatic compensation depths in minimising residual deflections (Fig. 9).

Fig. 8. Illustration of the deflection of the vertical by a gravity anomaly, where “anomaly” refers to any departure from the best-fitting elliptic meridian. The “geoid” represents an equipotential surface that is perpendicular to the direction of gravity at any point.

Fig. 9. Hayford's correlation of isostatic compensation depth and deflection residuals, showing that least-squares error minimisation is achieved with hypothesis G (113.8 km). Taken from: John F. Hayford, The Figure of the Earth and Isostasy from Measurements in the United States (Washington 1909), 145.

Notably, Hayford's new measure of ellipticity presumed these isostasy results both in content and method. If similar compensations exist across the world, isostatic reductions could guide the astronomical orientation of triangulations and safeguard IT measurements from systematic errors. Taking this thought one step further, an ellipsoid that accounts best for regional areal deflections after applying isostatic reduction can be expected to do so all across the earth's surface (Fig. 10).
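The selection logic described here, comparing candidate hypotheses by how well they minimise residual deflections, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the hypothesis labels and every residual value below are invented for the example and are not Hayford's data, and his actual network adjustment was far more elaborate than a bare sum of squares.

```python
# Residual deflections of the vertical (arc seconds) that remain at the same
# stations under each candidate hypothesis. All labels and numbers are
# hypothetical and serve only to illustrate the comparison.
residuals = {
    "no compensation": [4.1, -3.8, 5.2, -4.6],
    "shallow compensation": [1.2, -0.9, 1.5, -1.1],
    "deep compensation": [2.3, -2.0, 2.6, -2.4],
}


def sum_of_squares(deflections):
    """Least-squares criterion: sum of the squared residual deflections."""
    return sum(d * d for d in deflections)


# Score every hypothesis and keep the one that leaves the smallest residuals.
scores = {name: sum_of_squares(values) for name, values in residuals.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.2f}")
print("preferred hypothesis:", best)
```

The same criterion carries over to fitting the reference ellipsoid itself: instead of a handful of named hypotheses, one would vary the ellipsoid's parameters and keep the values that minimise the summed squares.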
Thus, Hayford argued, a single very large triangulation network, such as the North American one, is sufficient to reliably measure the earth's ellipticity. To do so, geodesists need to adjust the parameters of an ellipsoidal reference model in such a way that the remaining deflections across an isostatically compensated and suitably extensive triangulation network are minimised. Hayford gives an admirably clear illustration of this new method and the epistemic benefits it offers, comparing it to manual model-making:

The area method is illustrated by supposing that the model maker is given a piece of sheet metal cut to the outline of the continuous triangulation, which is supplied with the necessary astronomic observations, and accurately molded to fit the curvatures of the geoid as shown by the astronomic observations, and that he is then requested to construct the ellipsoid of revolution which will conform most accurately to the bent sheet. (Hayford, 1909, pp. 169–70).

While Hayford rightly stressed the benefits of determining ellipticity through his procedure, his promise of superior accuracy has a strong conjectural element. Generalising from only one isostatically adjusted network in North America presumes that similar isostatic compensations exist across other continents.

Fig. 10. Measurement inference from areal network adjustments to polar flattening.

18 For more detailed discussions of the origins of the theory of isostasy see: Oreskes (1999, ch. 1), Howarth (2008), and Ohnesorge (2021).

19 For a detailed discussion see Airy (1845, p. 237).

20 For a detailed discussion see Airy (1845, p. 236) and Darwin (1899, pp. 120–23).

3.3. Reasoning with multiple measures

Having surveyed the new measurement procedures available at the end of the nineteenth and beginning of the twentieth century, we can attend to how they contributed to resolving physical geodesy's hard problem of coordination. Recall that the severity of problems of coordination varies according to scientists' ability to predict or experimentally control the perturbations arising from the shortcomings of their model of the measurement process. In what follows, I show that increasing the number of physically distinct measurement indicators plays a crucial role in resolving hard problems of coordination. In particular, such indicators allow scientists to isolate previously overlapping perturbations by investigating their different impact on physically distinct measurement indicators. I begin by giving a descriptive account of geodesists' methodology before systematising it in the subsequent section.

It is illustrative to start with Helmert's textbooks once again. Beyond the comprehensive overview of the different physical quantities relating to the earth's polar flattening, he had proposed a new IP-value. Recall that throughout the nineteenth century, the pendulum values were always significantly larger (~1/288–1/290) than the ones inferred from meridional arcs (~1/297–1/300). Controlled pendulum measurements had enjoyed a higher epistemic standing than triangulations since they were less likely to be affected by unnoticed deflections of the vertical (Bruns, 1878; Fischer, 1868). As we have seen, Helmert could now compare both procedures to the ellipticity value of 1/297.8 ± 2.2 determined through the lunar perturbations (IM) and the ellipticity limits implied by the relationship between the earth's ellipticity and its precession (IPC). He noted that IT, IM, and IPC converged quite closely while disagreeing with the pendulum values (Helmert, 1884, viii).
This motivated Helmert to suspect that the supposedly superior pendulum measurements had been missing the target, and to articulate hypotheses about which unique sources of error might explain this discordance.

Helmert proposed two crucial sources of error to explain the discordant pendulum measurements: the effects of subterranean compensation and the insufficient distribution of pendulum stations. Building on these hypotheses, he proposed two new systematic error corrections: (i) a different altitude reduction procedure, and (ii) a more evenly and widely distributed selection of pendulum stations, leading him to an alternative IP result of 1/299.6 (Table 1). While (ii) is self-explanatory, (i) deserves some explication. For the purpose of measuring ellipticity, pendulum stations on the earth's irregular surface always needed to be reduced to mean sea level, which was supposed to be approximated by a smooth ellipsoidal surface. Helmert stopped using the standard “Bouguer reduction”, in which the raw pendulum measurement outcomes at higher altitudes were corrected for the supposed surplus attraction of the additional topographical mass between them and mean sea level. Rather, he applied a “condensation reduction”, where higher stations are reduced to mean sea level as if the topography between measurement altitude and target surface were “condensed” into the target surface (Helmert, 1884, 2: 225). This explanation for previous errors and the corresponding new correction are linked to Helmert's belief in global isostatic compensation. The still quite conjectural theory of isostasy implied that “the effects of the continental masses are more or less compensated by a lower density beneath the earth's crust” (Helmert, 1884, p. 364).18 Results supporting the (at least local) existence of isostasy had been recorded in the Himalayas, Caucasus, Harz, Pyrenees, and close to Moscow (Helmert, 1884, pp. 378–79).
If isostatic compensation holds globally, it would explain why previous reduction corrections for topographic surplus attraction had introduced systematic errors.

Notably, Helmert admitted that all three indicators are subject to non-trivial forms of error. Inferences from the moon's orbit are subject to many other theoretical corrections, meaning that their reliability depended on the correctness of a whole class of astronomical background assumptions.19 His pendulum (IP) value was derived from 160 stations overwhelmingly concentrated in the Northern Hemisphere and South Asia, potentially ignoring large regional variations. His new reduction procedure for the different pendulum stations was also still conjectural, presuming suspected isostatic compensation effects. Finally, inferences from the precessional constant (IPC) relied on a conjectural model of the earth's interior as consisting of concentric spheroidal strata whose density increases inwardly.20 Hence Helmert did not argue for the superiority of particular measurement procedures. Rather, he used their convergence to motivate a new hypothesis for explaining the discordance of IP. He subsequently justified this hypothesis by showing how its application to the IP data led to a value that aligned with the other measures much more closely:

My value receives very good confirmation from the lunar perturbations and the precessional constant. One would have to seriously change the latter to make it consistent with the 1/289 flattening values that are accepted by some, while likewise subscribing to the existence of a density law for the earth's interior in form of a simple power series since this law cannot exist with such a flattening and the observed value for the precessional constant (Helmert, 1884, viii).

Fig. 11. Inverse-flattening/time graph illustrating the conflict and post 1880 convergence between measurements of polar flattening, in which ‘[illegible symbol]’ denotes an IP, ‘Δ’ an IT, and ‘x’ an astronomical (IM or IPC) outcome, while ‘o’ groups together any outcomes obtained from IDV and other non-standard triangulation measurements. Reproduced with permission from: Strasser (1957, Appendix, 95).

Table 1. Most important IP-values published before and in Helmert's textbook, showing clearly how he departed from earlier results. Taken from: Georg Strasser, Ellipsoidische Parameter der Erdfigur, Munich 1957, Appendix.

No. | Year | Inverse Ellipticity | Researcher
1 | 1818 | 292.3 | Ilmari Bonsdorff
2 | 1825 | 289.1 | Edward Sabine
3 | 1829 | 288.45 | Eduard Schmitt
4 | 1830 | 288.1 | George Biddell Airy
5 | 1832 | 288 | Nathaniel Bowditch
7 | 1834 | 285.26 | Francis Baily
8 | 1843 | 286.1 | Henrik G. Borenius
9 | 1868 | 294.1 | Philipp Fischer
10 | 1874 | 284.4 | Amandus Fischer
11 | 1880 | 292.2 | Alexander R. Clarke
11 | 1884 | 299.26 | Friedrich R. Helmert

US Coast and Geodetic Survey's astronomer William Harkness approached the problem in a similar manner in 1891. He assembled the most widely discussed IP, IT, IM, and IPC outcomes of the last decades and inferred a best estimate via the method of least squares. His goal was not merely to find the best estimate, however, but to identify how much the different measures contributed to the total probable error associated with the result. Harkness noted that IP values (around 1/289) as well as the only IT outlier above 1/297 were responsible for more than half of the probable error. After excluding these values, he determined 1/300.20 as the best estimate, with a probable error of ±2.96. Like Helmert, he thus used convergence in light of different sources of error to argue against discordant values.
In his case, the target is an older and widely received IT value from British geodesist Alexander Ross Clarke:

In short, the general adjustment, the pendulum experiments, and precession and nutation give a flattening differing little from 1:300, the result from lunar perturbations is uncertain within rather wide limits,21 and Clarke's geodetic arc gives 1:293.5. Thus, it appears that the geodetic value stands quite alone, and as it is almost certainly erroneous, the probable error of the observed value of e could be largely diminished by making it depend solely upon the results from pendulum experiments and precession (Harkness, 1891, p. 143).

21 As mentioned earlier, Harkness did not consider both of Hansen's perturbations, while Helmert and Tisserand did. I suspect that this results from a lack of available data in the US at that point, which I cannot substantiate yet. With both perturbations in mind, Harkness's result would have been even clearer.

Harkness himself did not discuss which possible source of error could explain Clarke's individual IT outlier. Fortunately, Helmert had already offered such an explanation in 1884: the arc data used in Clarke's calculation was unduly centred on Britain, Central Europe, and the Indian subcontinent (Helmert, 1884, 2: vii).

The convergence towards ellipticities around 1/297 (approximating the long-standing IT values discussed in section 2) and Helmert's explanations of previous discordances were further substantiated when Callandreau's, Wiechert's, and Darwin's results from the precession inferences (IPC) were published during the 1890s. As we have seen, these all fell into 1/297 ± 0.4. Darwin, again, explicitly highlights the convergence between IPC and the other indicators, as well as their different sources of possible error:

The precessional constant could be used to determine the ellipticity of the earth with perhaps as little error as any other method. The uncertainty is, indeed, of a different kind, being dependent on our ignorance of the interior of the earth. […] This estimate of the ellipticity agrees well with the results of all other methods, except that of the pendulum, from which it is concluded that the ellipticity is about 1/299 (Darwin, 1899, p. 123).

Note that Darwin already takes for granted that the older pendulum values of about 1/289 are not worthy of consideration, agreeing with Helmert and Harkness. The “disagreement” that he mentions is the slight departure from Helmert's new IP-outcome that we discussed above (1/299.26).

This last residual conflict was alleviated when Helmert published a new analysis of 1600 (!) pendulum stations in 1901, explicitly responding to Darwin and Wiechert and determining an ellipticity of 1/298.3 ± 1.1. In response to Darwin's and Wiechert's work he had proposed three joint hypotheses to explain the remaining deviation, which he tried to correct for in his calculation. Beyond drastically increasing the number and geographic distribution of stations, Helmert had now developed the function describing the latitudinal surface gravity variation on an ellipsoid to a higher order and predicted errors resulting from the unique mass composition around small islands (Helmert, 1901, pp. 331–36).

At the beginning of the twentieth century, geodesists had thus reached a first tentative consensus on the earth's polar flattening, involving not only two but four convergent measurement procedures with different sources of error. The following table tries to sum up these developments more concisely, highlighting which values were successively disqualified for departing from the approximate range in which the different measures converged (Table 2).

Table 2. Overview of convergent outcomes towards the beginning of the twentieth century. Values that were disqualified based on comparisons across measurement procedures and the causes of their discordance are italicised.

Indicator | Potential Sources of Systematic Error | Inverse Ellipticity | Computer | Year
Latitudinal Variations in Pendulum Lengths | Concentration in specific regions, altitude reduction, island and coast anomalies | ~289 | mean value | 1820–80
Latitudinal Variations in Pendulum Lengths | (as above) | 299.6 | Friedrich R. Helmert | 1884
Latitudinal Variations in Pendulum Lengths | (as above) | 298.3 | Friedrich R. Helmert | 1901
Latitudinal Variations in Lengths of Meridional Arcs | Deflections of the vertical during orientation, concentration in specific regions | 299 | Friedrich W. Bessel | 1838
Latitudinal Variations in Lengths of Meridional Arcs | (as above) | 300 | George Everest | 1842
Latitudinal Variations in Lengths of Meridional Arcs | (as above) | 298 | Henry James | 1858
Latitudinal Variations in Lengths of Meridional Arcs | (as above) | 293.5 | Alexander R. Clarke | 1878
Lunar Perturbations | Data collected over large periods of time and at different observatories, unaccounted perturbations of the lunar orbit | 297.8 | Friedrich R. Helmert | 1884
Lunar Perturbations | (as above) | 297.2 | Felix Tisserand | 1887
Precessional Constant | Heterogeneous interior density distribution of the earth | 297.4 | Octave Callandreau | 1889
Precessional Constant | (as above) | 297 | Emil Wiechert | 1897
Precessional Constant | (as above) | 296.4 | George H. Darwin | 1899

In 1906, finally, Hayford applied his new measurement procedure based on IDV and arrived at an ellipticity of 1/297.8 ± 0.9. After the further extension of the North American triangulation, he corrected the value to 1/297.0 ± 0.5. As we have seen, Hayford went well beyond Helmert in accounting for the effects that subterranean density distributions have on geodetic measurements. He not only stopped correcting for topographic surplus masses but established an exact depth at which irregularities of the earth's crust and mantle are fully balanced out. The success of such a determination across North America added further weight to Helmert's account of the errors in earlier pendulum measurements. Hayford, again, did not justify his result by pointing out the superior reliability of his measure.
As we have seen, Hayford's procedure relied on the assumption that the North American isostatic compensation could be generalised across the entire earth. Indeed, Hayford points out that “it is important to note the close agreement [of Helmert's value] with the C. & G. Survey 1906 value for the reciprocal of the flattening, namely, 297.8 ± 0.9 […], though the two values depend on different kinds of observations made in different parts of the earth” (Hayford, 1909, p. 173).

Hayford's results meant that all major ellipticity measurements since 1884 – involving four different indicators – had converged in the range of 1/296.6 to 1/298.3. Thereby, geodesists had successfully reduced the maximum divergence between measured inverse ellipticities from about 15 across two measurement procedures to about 1.7 across four measurement procedures. Notably, the currently accepted value for the earth's polar flattening is 1/298.257223563 (WGS 84) and falls within that convergence interval. While geodesists had not conducted new arc measurements of ellipticity between 1884 and 1909,22 several of the most important arc measurements from before 1884 also align closely with the new consensus (Table 2). After facing an underdetermined choice about the appropriate flattening magnitude and uncertainty about the adequacy of their model before the 1880s, geodesists had now identified a unique convergence interval and established well-supported hypotheses for explaining earlier discrepancies (Fig. 11).

22 The new ellipsoid parameters that some geodesists derived had only aimed at regional best-fit surfaces for the Russian and Australian survey, not at a globally adjusted ellipsoid. See: Strasser (1957, pp. 55–57).

After the end of the First World War, the IAG's 1924 general assembly in Madrid formally declared Hayford's outcome as the polar flattening of the first global standard ellipsoid.
During the assembly, George Tyrrell McCaw, secretary of the British Colonial Survey and Geophysical Committee, presented a least-squares analysis of the major ellipticities measured through different meridional arcs (IT) and informally compared the outcome (1/296.2 ± 1.3) with the other three indicators. Unfortunately, McCaw did not actually include the other indicators in his least-squares analysis and solely weighed the different arcs according to their length, not considering their individual error sensitivity (McCaw, 1924). After a controversial discussion on the scientific and practical purposes of the new standard model of the earth's figure, the committee instead accepted Hayford's 1909 results – completely corrected for isostasy and in closer agreement with the other indicators – as the model's defining parameters (Dundas, 1924; Lambert, 1926). This marked the first international consensus on the accuracy and appropriate quantitative dimensions of the ellipsoid model of the earth. In 1930, another IAG general assembly in Stockholm settled on a corresponding global standard equation for the variation of surface gravity with the latitudes of the Hayford ellipsoid (Torge, 2017, p. 50).

To see how significant this achievement was, keep in mind that the IAG meetings had settled a scientific problem that had been discussed since the Principia and provided important evidence for Newton's universal inverse square law. As Samuel Oppenheim, professor of Astronomy at the University of Vienna, notes in his contribution to the seminal Encyklopädie der Mathematischen Wissenschaften:

To determine the exact validity of the Newtonian law based on the distance between attracting bodies, the communicated [geodetic] results matter in multiple ways.
The calculation of the earth's flattening from pendulum measurements on its surface and its comparison with geodetic [triangulation] measurements do not show a full agreement but fall sufficiently into their respective error limits. […] The perturbation of the earth's flattening on the movement of the moon is also nearly fully anticipated by the inverse-square law of attraction (Sommerfeld & Oppenheim 1926, pp. 122–23).

We should add to Oppenheim's analysis that geodesists did not only manage to make the different measurements converge within a narrow interval that is compatible with the inverse square law of gravitational attraction. Rather, they also identified plausible physical sources behind earlier discrepancies – most crucially the isostatic compensation of the topographic attraction on pendulum stations. These achievements offered substantial evidence for Newton's law used in derivations of the ellipsoid model and settled the most prominent measurement problem in contemporary geoscience, vindicating the pluralistic methodology followed by geodesists.

23 If scientists do not have any new measurement procedure at their disposal (step 1), they might also repeat steps 2 and 3 to produce evidence for their explanations of previously discovered outliers.

4. Articulating and defending operational pluralism

4.1. Methodological outlines

Above, I have sketched how geodesists solved their hard problem of coordination, reaching a convergence between measurements of the earth's polar flattening and explaining previous errors in pendulum and arc measurements.
As I put the matter above, problems of coordination are hard if (i) scientists do not have empirical access to a parameter without relying on an idealised theoretical model, and (ii) their measurement indicators are subject to multiple overlapping perturbations they can neither predict theoretically nor shield their measurements against. After struggling with two discordant measurement procedures for two centuries, geodesists finally overcame their measurement problem by introducing additional measures based on different physical indicators. In what follows, I distil a distinctive methodology from geodesists' successful approach, which I refer to as operational pluralism. After systematically articulating operational pluralism, I show how it applies to my description of geodetic practice and discuss its epistemological significance in light of canonical objections.

The methodology followed by geodesists resembles a three-step heuristic that is repeated throughout multiple iterations:

(1) Introduce measurement procedures with a physically distinct indicator that you believe to be subject to different sources of systematic error.
(2) Identify which measures cause outliers from a shared convergence interval.
(3) Analyse the perturbations uniquely affecting the discordant measure to explain and anticipate the lack of coordination.

Introducing different kinds of measures enlarges the range of measurement results which scientists can intercompare. If you compare a greater number of different measures, you increase your chance of having some of them converge in a sufficiently narrow interval. As a consequence, you can identify outliers from that interval. These become the targets of further inquiry that aims to identify their unique sources of error. If two existing measures M1 and M2 conflict, and one or more additional procedures M3 … Mn agree with M2, the above methodology urges you to further inspect the sources of error associated with M1.
By introducing M3, we have moved beyond an underdetermination between conflicting parameter values and can articulate and investigate a hypothesis about the operative source of error. This hypothesis is then used to predict errors in further iterations of the methodology and assessed based on how well it increases future measurement coherence. In the best possible case, the adjusted convergence interval leads to the identification of new outliers, which again motivate new hypotheses about the corresponding sources of error. Coordination is achieved iteratively, as hypotheses about sources of error and initial numerical convergence intervals are justified based on their contributions to this process of mutual adjustment. Epistemologically, operational pluralism thus remains firmly rooted in the iterative model of justification originally developed by Chang (2004, ch. 5) and subsequently adopted by most epistemologists of measurement.

The application and utility of operational pluralism depend on several indicators being sufficiently different in their physical constitution and resulting contextual applicability. It is for that reason that I dubbed it “operational”, whereby I refer to the physical operations that scientists perform to determine the magnitude of a particular indicator. In our case, geodesists relied on such diverse practices as triangulations or pendulum networks stretching hundreds of kilometres, equally extensive networks of astrogeodetic control stations, or dozens of telescopes across several observatories. All of these measures involve sophisticated operations (or systems of operations, if you like) that are affected by the earth's figure and constitution in subtly different ways. These physical differences initially motivate the hypothesis that a new procedure might be sensitive to different sources of systematic error (step 1).
Theoretical or ad hoc hypotheses about the specific error sensitivities of the discordant measure are then used to explain the detected discordances (step 3) and anticipate them in further measurements (step 1 in the next iteration). Both numerical convergence intervals and hypotheses about distinct sources of error are justified diachronically, based on their contribution to subsequent iterations of steps 1 to 3.23 Contrary to canonical instances of epistemic iteration (e.g. Chang, 2004, ch. 4), iterative corrections are pursued horizontally between different measurement procedures and associated hypotheses about indicator-specific sources of error, not necessarily involving vertical iterations between procedures and general theoretical models of their target.

How does this map onto geodetic practice? As we have seen, geodesists started to rely strongly on additional measures from around 1880. These efforts guided a gradual convergence between five measurement procedures, which we previously identified by their five different indicators. We can nicely organise geodesists' approach into three different iterations. During the first iteration, Helmert and Harkness intercompared the existing measures with two novel procedures based on lunar perturbations (IM) and the earth's precession (IPC) (step 1). Both indicators were believed to be sensitive to different sources of error than arc and pendulum measurements. Helmert and Harkness identified problems with the earlier IP values around 1/289 and Clarke's IT outcome of 1/294 since they departed significantly from a convergence interval between roughly 1/297 and 1/300 (step 2). In the case of IP, the error was explained by an inferior altitude reduction procedure, while Clarke's IT value was explained by a flawed distribution of his arc data across the globe (step 3).
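
The comparative inference in step 2 of this first iteration can be made concrete with a minimal sketch. It is only an illustration under stated assumptions, not a reconstruction of Helmert's or Harkness's actual calculations: the names `Measure` and `flag_outliers` are hypothetical, ellipticities of the form 1/x are represented by their denominator x, and the third entry is an invented concordant value included solely to show a measure that is not flagged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    label: str
    inverse_flattening: float  # the denominator x in an ellipticity of 1/x


def flag_outliers(measures, lower, upper):
    """Return the labels of measures whose 1/x value lies outside [lower, upper]."""
    return [
        m.label
        for m in measures
        if not (lower <= m.inverse_flattening <= upper)
    ]


# Convergence interval of roughly 1/297 to 1/300, as reported for the first iteration.
LOWER, UPPER = 297.0, 300.0

measures = [
    Measure("IP (earlier pendulum values)", 289.0),      # value reported in the text
    Measure("IT (Clarke)", 294.0),                       # value reported in the text
    Measure("hypothetical concordant measure", 298.0),   # illustrative assumption only
]

# Step 2: pick out the measures that depart from the shared convergence interval.
outliers = flag_outliers(measures, LOWER, UPPER)
print(outliers)
```

The sketch covers nothing beyond the identification of outliers. Which perturbations explain them (step 3) remains a matter of physical inquiry, and the interval itself is open to revision in later iterations.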
In the subsequent iteration, Darwin could further narrow down the convergence interval to 297 ± 0.5, identifying a potential problem with Helmert's IP value of 1/299.6 (step 2). At the end of this step, Helmert explained this remaining discordance by three joint hypotheses about an unrepresentative geographical distribution of stations, island anomalies, and a mathematical shortcoming in the older versions of Clairaut's theorem. After applying these new corrections, he arrived at an IP value of 1/298.3 ± 1.1, whose associated error bar overlapped with Darwin's result. Finally, Hayford further substantiated the earlier corrections in iteration 3, when he introduced the fifth measure with yet another complementary source of error. The error bar of his 1/297.0 ± 0.5 overlapped with Helmert's and Darwin's.

Of course, this methodology for solving hard coordination problems can fail. Similar forms of reasoning and assessment may indicate a mistaken convergence interval and, subsequently, fail to lead to any fruitful hypotheses about the operative source of errors. Likewise, there may be cases in which even a highly diverse group of measures does not indicate any convergence interval to start with. Nonetheless, our findings show that operational pluralism can serve as a powerful heuristic for solving difficult cases of a recurrent epistemic problem in theory-mediated measurement. Keep in mind that incredibly extensive and costly ellipticity measurements had been pursued for more than 200 years before the above consensus was reached.

4.2. New responses to old worries

With this informal reconstruction at hand, we can anticipate some objections. In particular, I will argue that the methodology I distilled from geodetic practice is immune to two pertinent objections against similar methodologies.
Both objections identify the fallible nature of comparing the outcomes inferred from distinct procedures and take this fallibility to undermine their evidential significance or physical meaningfulness. These objections lose their force once we understand individual comparative inferences as parts of a larger diachronic and iterative process. A third and final worry concerns the problems that persistent discordances pose for the methodology of convergent, theory-mediated measurements proposed by George Smith and William Harper. I argue that my proposal can be read as an addition to this methodology, laying out a possible response to persistent discordances.

4.2.1. Measurement robustness and evidential objections

There are clear parallels between my account and discussions of measurement robustness.24 Measurements are robust if “several different procedures yield closely similar results for a certain quantity under measurement” (Basso, 2017, p. 57). While I showed how operational pluralism can resolve hard problems of coordination, philosophers working on robustness point out how converging measures facilitate strong evidence. Donald Campbell's classical work on “multiple determination”, for example, stresses that multiple agreeing measurement indicators increase the “inferential strength” of theories (Campbell & Fiske, 1959). William Wimsatt, similarly, takes measurement robustness to provide evidence of the predictive reliability of the theories (Wimsatt, 1981, p. 67). Recent work on robustness in the philosophy of metrology has focussed on the inverse evidential relation, i.e., on the evidence that robustness provides for measurement reliability. Eran Tal and Alessandra Basso have both argued that measuring the same value for an idealised parameter across multiple measurement procedures provides evidence for the reliability of these procedures (Basso, 2017; Tal, 2017a).
The classical worries about evidential appeals to robustness are so-called “independence objections” (Basso, 2017, p. 3). Independence objections highlight how difficult it is to assess whether different measures of the same parameter are independent enough from each other (Cartwright, 1991, p. 154; Stegenga, 2012, p. 209). Sensitivity to the same perturbational effects could turn out to introduce a common systematic error that we had failed to anticipate. In that case, the use of an additional procedure could reinforce the flaws of some of our initial measures and would not offer additional evidence of any kind.

24 Measurement robustness constitutes a separate problem from derivational robustness, which many take to be a guiding principle of theorising and modelling. The virtues of derivational robustness have been discussed extensively by Richard Levins (1966) and, more recently, by Weisberg (2006), Kuorikoski et al. (2010), and Odenbaugh and Alexandrova (2011). A taxonomy of the different notions of “robustness” in modelling and measurement is provided in Wimsatt (1981).

My case study suggests that the focus of such debates has been too narrow. Robustness arguments and independence objections are usually presented as claims about the reliability of a particular comparative evidence assessment.25 As I presented my case study, geodesists employed multiple indicators in a diachronic and iterative methodology. The aim of such a methodology is not to offer a static evidential criterion of a measurement's reliability but to gradually improve it (Chang, 2004, ch. 5). This significantly raises the burden of argument for advocates of independence objections. Critics need to argue for more than the fallibility of individual convergence intervals. Rather, they need to show that drawing
inferences from convergence intervals to operative sources of error cannot improve one's epistemic situation or might even worsen it. Like any kind of inference, inferences from a supposed convergence interval between multiple measures might sometimes be affected by an unaccounted systematic error, and numerical convergence is no general criterion for “accuracy” or “truth”. However, the self-corrective character of operational pluralism allows such shortcomings to become targets of inquiry themselves in future iterations of the methodology.

25 Bayesian accounts – of which Schupbach's (2018) explanatory account might be the most sophisticated example – go even further and specify how much a successful comparison of “independent” results should increase the evidential support for some hypothesis – where “independent” is specified in terms of probabilistic, confirmational, or explanatory independence (depending on the account). Such projects are much more ambitious than my efforts here. I am only defending the salience of a methodology for improving measurement coordination and do not attempt to quantify this salience according to any particular metric.

Note that my response is much more optimistic than existing arguments presented by William Wimsatt (1987), Jaakko Kuorikoski et al. (2010), Kuorikoski and Marchionni (2016), and Alessandra Basso (2017). Their common idea is that practicing scientists can sufficiently avoid making wrong judgements about independence if they stick to what one might call an “error proviso”. According to that proviso, one should only draw evidential conclusions from a supposedly independent convergence if one knows that the respective measures are subject to different sources of error.26 This claim amounts to a quite rigid precondition, presupposing independent means for individuating and predicting operative sources of error.
I have defined hard problems of coordination as precisely those situations in which scientists lack the theoretical tools for doing so. In operational pluralism, comparative inferences only need to pick out targets for further inquiry. Hence the reference to beliefs about different sources of error (rather than knowledge) in step 1 of my heuristic. Scientists do not need to independently confirm the operative sources of error affecting particular indicators before drawing inferences from their convergence or disagreement. Rather, they can increase the credence they give to a numerical convergence interval and hypotheses about specific sources of error in a complementary and iterative manner: convergence intervals may be justified and adjusted by scientists' ability to explain diverging results, while hypotheses about causes of error are justified based on how well they can be used to increase coherence in future measurements.

4.2.2. Physical meaningfulness and spurious convergences

A second long-standing worry one might raise against operational pluralism can be traced back to Percy Bridgman's Logic of Modern Physics (1927). For Bridgman, contemporary developments in physics had exposed the danger of taking theoretical parameters to be physically meaningful outside of their domain of operational determinability. Quantum and relativity theory had to “save” physics from a crisis caused by generalising quantity terms to situations in which they were never physically measured (e.g. at very high velocities or subatomic scale). Taking inspiration from these events, Bridgman proposed to radically restrict the inferential domain of theoretical parameters to the specific physical operations that are used to determine their magnitude.
If two such operations associated with the same theoretical parameter cannot be employed in the same domain, we have no reason to think agreements or disagreements between them have any physical significance. Hence, we cannot draw justified inferences about the physical world from such comparisons. Bridgman drew quite radical conclusions about the inferential scope of measurements, becoming (in)famous for endorsing claims to the effect that astronomical telescopes and rods do not measure the same attribute (Bridgman, 1927, pp. 17–18).

26 Basso weakens this proviso further to include cases in which the same source of error causes discrepancies of different magnitude across different measures. The difference is not central to my argument here because it also requires antecedent knowledge of the errors uniquely affecting certain measures.

The literature is rife with rebuttals to Bridgman's rigid account of the inferential scope of measurements. Almost every commentator agrees that it is overly restrictive and incongruent with scientific concept formation (Chang, 2017; Feest, 2005; Hempel, 1966, ch. 7). Instead of adding another rebuttal to the list, I want to take operationalism seriously and show that its basic commitment to a notion of physical meaningfulness is compatible with the methodology I defend.27 Even if we accept that the inferential use of quantities cannot be categorically limited to specific operations – as Bridgman himself conceded later (Chang, 2017) – we can rescue a genuine worry about spurious convergences from his account. I understand a convergence between measures of a quantity as spurious iff some of these measures are subject to domain-specific and unaccounted sources of error.
Some remarks in the Logic already indicate that Bridgman primarily intended his notion of physical meaningfulness as a useful tool for identifying spurious convergences, as he noted that the “essential point” of his views is to differentiate “constructs” from “physically meaningful quantities”,28 where the latter “can be defined [i.e. operationalised] in several alternative ways in terms of physically distinct operations” (Bridgman, 1927, pp. 55–56).29

Bridgman's (charitably interpreted) worry about spurious convergences can straightforwardly be applied to our case. Triangulations (IT), pendulum stations (IP), and network-wide deflections of the vertical (IDV) were assessed in different places or regions and involved different theoretical corrections. Bridgman's worry becomes even more acute when we compare any one of those three indicators with observations of the lunar orbit (IM) or the movement of fixed stars (IPC). If his account is correct, my operational pluralism is susceptible to the following objection: It is wrong to first suppose that different indicators permit inferences to the magnitude of the same theoretical parameter and then account for indicator-specific errors afterwards. This supposition of my account makes scientists unnecessarily vulnerable to being led astray by the mathematical structure of their models and to facing confusion once measurement results stop cohering (Bridgman, 1927, p. 42). Consequently, they should have taken their measurement procedures to have different local targets until the different perturbations acting in the specific domains of all indicators were fully understood.

Geodesists' detailed concern about domain-specific perturbations shows that they were acutely aware of the problem of spurious convergences. However, they integrated this awareness into a productive methodology for isolating and predicting the effects of such perturbations on their measurement.
Fleshing out how such awareness guides successful measurement practice is the key to blocking the above objection and rescuing a notion of physical meaningfulness that is compatible with operational pluralism. Pace Bridgman, geodesists did not take the danger of spurious convergence to restrict comparisons between measurements with distinct physical indicators and domains. Rather, it is precisely through diachronic and iterative comparisons that they gradually identified and anticipated the operative sources of measurement error. In such a methodology, physical meaningfulness does not act as a static criterion for avoiding the risk of spurious convergences. Rather, it plays a dynamic role in the diachronic improvement of measurements, in which scientists gradually reduce the risk of spurious convergences. This is reflected in the third step of the iterative heuristic proposed in section 4.1. To successfully apply operational pluralism, scientists should not only aim at achieving convergence, but develop hypotheses to explain outliers in virtue of perturbations uniquely affecting specific measurement indicators.

27 For a similar, constructive approach to Bridgman's operationalism see: Chang (2004, 2017).

28 This conception of physical meaningfulness is not coextensive with the formal definitions of meaningfulness as invariance across unit transformation defended in the Foundations of Measurement or, more recently, by Louis Narens (Krantz et al., 2006; Narens, 2007). This is not the place to discuss their differences in detail, but it suffices to note that the operationalist notion of meaningfulness focusses on invariance across actual physical operations performable by scientists. In contrast, the “units” in the formal definitions refer to ideal atomic operations that still need to be coordinated with physical measuring devices.

These hypotheses may then be
used to predict domain-specific errors and are assessed based on their ability to increase future measurement coherence (cf. 4.2.1).

29 This aspect of Bridgman's thought has also been picked up by Mahmoud Jalloh (draft), who is fleshing out operational invariance in terms of dimensionality.

To sum up, I happily concede that Bridgman's worry about spurious convergences is genuine and that a criterion for the physical meaningfulness of theoretical parameters is practically warranted. However, I think that physical meaningfulness should be conceptualised as a dynamic criterion, gradually realised in the form of scientists' increasing ability to anticipate domain-specific errors.30 For example, we can say that the theoretical parameter “ellipticity” was less physically meaningful with respect to its intended target domain (the physical properties of the earth, such as its attraction, rotation, curvature, or density) before 1880 than it was around 1924. During that period geodesists gradually learned to anticipate the perturbations unique to the subdomains of their particular physical indicators. Conceptualised in this manner, physical meaningfulness offers a useful normative notion that urges scientists to gradually anticipate domain-specific measurement errors and minimise the risk of spurious convergences.

In fact, the subsequent history of geodesy provides a nice example of how operational pluralism guides the pursuit of further discordances and increased physical meaningfulness after scientists achieved an initial convergence. During the twentieth century, geodesists discovered more local deviations from their initial reference model and tied them to a growing body of knowledge about the form and dynamics of the earth's internal strata. Similar to the period investigated in this paper, geodesists discovered and anticipated such deviations by reasoning with multiple measures.
Most notably, they would employ seismological measurements of the varying viscosity in the earth's interior and satellite measurements of the earth's gravity field, which provided new, partially discrepant results. This led to the construction of finer-grained models of the earth's internal constitution and gravity field but also further increased the meaningfulness of the ellipsoid – now backed by various new corrections for the resulting perturbations (Fischer, 1975, pp. 35–42).

4.2.3. Convergent theory-mediated measurement and resistant discrepancies

Before closing, I need to address an influential account of physical measurement, which, among other things, has directly inspired various aspects of this proposal. Over the last decades, George Smith and William Harper have rehabilitated the methodology of Newton's Principia, which reserves a central role for convergent theory-mediated measurements (Harper, 2012; Smith, 2002, 2014). According to that methodology, scientists generate high-quality evidence by (i) inductively generalising observed regularities, (ii) deriving new phenomena from them, and (iii) comparing these phenomena with physical measurements. On the Smith-Harper account, it is not merely the convergence and precision of these measurements that facilitate evidential support. Rather, high-quality evidence is generated if any robust deviations from predicted values can be explained by the initial theory. Thus, pursuing converging theory-mediated measurements produces robust discrepancies, whose subsequent explanation maximises the evidential demand on theory. It comes as no surprise that celestial mechanics is both Smith's and Harper's prime example for gradually converging theory-mediated measurements, arguably constituting one of the most successful research programmes in the history of science.
Newtonian gravity allowed astronomers to explain an incredible number of empirical discrepancies by postulating additional perturbing bodies that conformed with their theoretical expectations – a legacy since picked up by General Relativity (Harper, 2012, pp. 378–84).

Harper and Smith's focus on the paradigm case of celestial mechanics raises the question of how scientists should handle cases of theory-mediated measurements that remain discordant over time (Miyake, 2013, pp. 313–14; Ohnesorge, 2021). I take it that hard problems of coordination are such cases. Here, scientists cannot shield measurement procedures against the physical causes of discordances, nor can they uniquely identify and predict them by relying on background theory. On my account, such problems can sometimes be overcome by increasing the variety of physically distinct measurement operations, while aiming to isolate operative sources of error. Scientists can then correct persistent discordances between measurements in a process of diachronic revision, in which they gradually introduce new, measurement-specific corrections based on their ability to facilitate measurement convergence. This offers an instance of “epistemic iteration” on the horizontal level (Chang, 2004, ch. 5). Hence, I hope that operational pluralism offers a welcome addition to the Smith-Harper account of convergent measurement.

30 Thus, I agree with Teru Miyake and George Smith who define “physically meaningful representations and parameterizations”, as those that “yield deviations that have physically identifiable sources”. I only insist that physical meaningfulness is best conceptualised as a matter of degrees, increasing gradually throughout the process of inquiry (Miyake and Smith 2021, p. 172).

5. Conclusion

Recent work in the philosophy of measurement has highlighted that establishing derived measurements involves problems of coordination.
I have suggested that such problems are not merely a general predicament of derived measurement, but that their difficulty varies according to scientists' predictive and experimental control over perturbations of the measurement process. Such perturbations arise from discrepancies between idealised models and the de facto physical features of the measurement indicators, target, and context. To identify how scientists might respond to hard problems of coordination, I analysed a measurement problem in physical geodesy that had persisted for more than two hundred years after its first discussion in Newton's Principia. This problem persisted so stubbornly because geodesists were unable to theoretically predict or experimentally control the perturbations resulting from the earth's heterogeneous topographic and subterranean density distribution. Notably, they did eventually resolve their measurement problem, indicating that their methodology can provide lessons for resolving similar problems in and beyond physical geoscience. This methodology – which I dubbed operational pluralism – can be pursued by (i) increasing the number of physically distinct measurement indicators at scientists' disposal, (ii) identifying outliers from a shared convergence interval, and (iii) explaining their discordance based on perturbations uniquely affecting measurement with specific physical indicators. In repeatedly following steps 1 to 3, scientists can iteratively adjust previous convergence intervals and develop increasingly specific hypotheses about the sources of measurement error.

I have argued that operational pluralism offers a generalizable and strong methodology. The methodology is generalizable because it provides a response to a common epistemic problem affecting theory-mediated measurements of large and partially inaccessible physical systems.
It is strong because it is immune to canonical objections that have been raised in the literatures on operationalism and measurement robustness. Far from falling prey to these objections, operational pluralism indicates that the focus of the respective debates might have been cast too narrowly. While philosophers have questioned the physical meaningfulness and evidential value of individual comparisons between different measures, such comparisons become methodologically salient when used repeatedly and iteratively.

CRediT author statement

The author confirms being the sole contributor to this work and has approved it for publication.

Funding

This work was supported by the Cambridge Trust [Grant number: 10556463].

Acknowledgements

I thank Hasok Chang for his invaluable guidance in writing this paper and the PhD dissertation that I developed it from. I thank George Smith for his generous help in navigating the history of Newtonian gravity. I thank the members of the 2021 Du Châtelet prize committee (Katherine Brading, Alisa Bokulich, Daniel Mitchell, and Wendy Parker) for detailed and very insightful comments on earlier versions of this paper. Finally, I thank Cristian Larroulet Philippi, Aditya Jha, Ahmad Elabbar, Mahmoud Jalloh, and Aja Watkins for very helpful feedback throughout.

References

Airy, G. B. (1826). On the figure of the earth. Transactions of the Royal Society, 13, 548–579.
Airy, G. B. (1845). Figure of the Earth. In E. Smedley, Hugh J. Rose, & Henry J. Rose (Eds.), Vol. 5. Encyclopaedia Metropolitana (pp. 165–240). London: Fellowes & Rivington.
Airy, G. B. (1861). Corrections of the Elements of the Moon's Orbit, deduced from the Lunar Observations made at the Royal Observatory of Greenwich from 1750 to 1851. Being an extension of a preceding memoir entitled Corrections to the Elements of the Moon's Orbit deduced from the Lunar Observations made at the Royal Observatory of Greenwich from 1750 to 1830. Memoirs of the Royal Astronomical Society, 29, 1.
d'Alembert, J.-B. R. (1749). Recherches sur la précession des équinoxes et sur la nutation de l'axe de la terre dans le système newtonien ([Reprod. en fac-sim.]). https://gallica.bnf.fr/ark:/12148/bpt6k3804r.
Basso, A. (2017). The appeal to robustness in measurement practice. Studies in History and Philosophy of Science, 65–66(December), 57–66. https://doi.org/10.1016/j.shpsa.2017.02.001
Belot, G. (2015). Down to earth underdetermination. Philosophy and Phenomenological Research, 91(2), 456–464. https://doi.org/10.1111/phpr.12096
Bessel, F. W. (1837). Ueber den Einfluss der Unregelmässigkeiten der Figur der Erde auf geodätische Arbeiten und ihre Vergleichung mit den astronomischen Bestimmungen. Astronomische Nachrichten, 14(19–21), 269–312. https://doi.org/10.1002/asna.18370141901
Bessel, F. W. (1838). Gradmessung in Ostpreußen und ihre Verbindung mit Preußischen und Russischen Dreiecksketten. Berlin.
BIPM (International Bureau of Weights and Measures). (2006). The International System of Units (SI) Brochure. 8th edition. http://www.bipm.org/en/si/si_brochure/. (Accessed 3 September 2022).
Bokulich, A. (2018). Using models to correct data: Paleodiversity and the fossil record. Synthese, 198(24), 5919–5940 (2021). https://doi.org/10.1007/s11229-018-1820-x.
Bokulich, A. (2020). Calibration, coherence, and consilience in radiometric measures of geologic time. Philosophy of Science, 87(3), 425–456.
Bokulich, A., & Oreskes, N. (2017). Models in geosciences. In L. Magnani, & T. Bertolotti (Eds.), Springer handbook of model-based science (pp. 891–911). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-30526-4_41.
Boumans, M. (2007). Invariance and calibration. In M. Boumans (Ed.), Measurement in economics: A handbook (pp. 19–40). Amsterdam: Elsevier.
Bradley, J. (1748). A letter to the Right Honourable George Earl of Macclesfield concerning an apparent motion observed in some of the fixed stars; by James Bradley D. D.
Astronomer Royal, and F. R. S. Philosophical Transactions (1683-1775), 45, 1–43. Bridgman, P. W. (1927). The logic of modern physics. New York: Beaufort Books. Bruns, H. (1878). Die Figur der Erde: Ein Beitrag zur Europ€aischen Gradmessung. Berlin. Bürg, J. T. (1825). Bürg Epoche Der Mittleren L€ange Des Mondes Für 1779 J€ahrliche Aenderung Derselben, Gleicjung Der L€ange Etc. Astronomische Nachrichten, 4(March), 9. Callandreau, O. (1889). M�emoire Sur La Th�eorie de La Figure Des Plan�etes. Annales de l'Observatoire Imp�erial de Paris. M�emoires, 19(5), 1–84. Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105. https://doi.org/ 10.1037/h0046016 Cartwright, N. (1991). Replicability, reproducibility, and robustness: Comments on Harry Collins. History of Political Economy, 23(1), 143–155. https://doi.org/10.1215/ 00182702-23-1-143 Chang, H. (2004). Inventing temperature: Measurement and scientific progress. New York, NY: Oxford Univ. Press. Chang, H. (2017). Operationalism: Old lessons and new challenges. In A. Nordmann, & N. M€ossner (Eds.), Reasoning in measurement (pp. 25–38). Routledge. Chapin, S. (1995). The shape of the earth. In The general history of astronomy: Planetary astronomy from the renaissance to the rise of astrophysics (pp. 22–34). Cambridge: Cambridge University Press. Clarke, A. R. (1880). Geodesy. Oxford: Clarendon Press. Clarke, A. R., & James, H. (1858). Ordnance trigonometrical survey of Great Britain and Ireland: Account of the observations and calculations of the principal triangulation, and of the figure, dimensions and mean specific gravity of the earth as derived therefrom. London: Ordnance Survey of Great Britain. Darwin, G. H. (1899). The theory of the figure of the earth carried to the second order of small quantities. Monthly Notices of the Royal Astronomical Society, 60(2), 82–124. 
https://doi.org/10.1093/mnras/60.2.82