Behavior Research Methods (2024) 56:1863–1899
https://doi.org/10.3758/s13428-023-02124-2

The Misinformation Susceptibility Test (MIST): A psychometrically validated measure of news veracity discernment

Rakoen Maertens1 · Friedrich M. Götz2 · Hudson F. Golino3 · Jon Roozenbeek1 · Claudia R. Schneider1 · Yara Kyrychenko1 · John R. Kerr1 · Stefan Stieger4 · William P. McClanahan1,5 · Karly Drabot1 · James He1 · Sander van der Linden1

Accepted: 5 April 2023 / Published online: 29 June 2023
© The Author(s) 2023

Rakoen Maertens and Friedrich M. Götz contributed equally to this work.
* Rakoen Maertens, rm938@cam.ac.uk; Friedrich M. Götz, friedrich.goetz@ubc.ca

1 Department of Psychology, University of Cambridge, Downing Street, CB2 3EB Cambridge, Cambridgeshire, UK
2 Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC V6T 1Z4, Canada
3 University of Virginia, Charlottesville, VA, USA
4 Karl Landsteiner University of Health Sciences, Krems an der Donau, Austria
5 Max Planck Institute for the Study of Crime, Security and Law, Freiburg im Breisgau, Germany

Abstract
Interest in the psychology of misinformation has exploded in recent years. Despite ample research, to date there is no validated framework to measure misinformation susceptibility. Therefore, we introduce Verification done, a nuanced interpretation schema and assessment tool that simultaneously considers Veracity discernment, and its distinct, measurable abilities (real/fake news detection), and biases (distrust/naïvité; negative/positive judgment bias). We then conduct three studies with seven independent samples (Ntotal = 8504) to show how to develop, validate, and apply the Misinformation Susceptibility Test (MIST). In Study 1 (N = 409) we use a neural network language model to generate items, and use three psychometric methods (factor analysis, item response theory, and exploratory graph analysis) to create the MIST-20 (20 items; completion time < 2 minutes), the MIST-16 (16 items; < 2 minutes), and the MIST-8 (8 items; < 1 minute). In Study 2 (N = 7674) we confirm the internal and predictive validity of the MIST in five national quota samples (US, UK), across 2 years, from three different sampling platforms: Respondi, CloudResearch, and Prolific. We also explore the MIST’s nomological net and generate age-, region-, and country-specific norm tables. In Study 3 (N = 421) we demonstrate how the MIST, in conjunction with Verification done, can provide novel insights on existing psychological interventions, thereby advancing theory development. Finally, we outline the versatile implementations of the MIST as a screening tool, covariate, and intervention evaluation framework. As all methods are transparently reported and detailed, this work will allow other researchers to create similar scales or adapt them for any population of interest.

Keywords Misinformation susceptibility · Automated item generation · Fake news · Neural networks · Psychometrics

The global spread of misinformation has had a palpable negative impact on society. For instance, conspiracy theories about the coronavirus disease 2019 (COVID-19) vaccines have been linked to increased vaccine hesitancy and a decline in vaccination intentions (Hotez et al., 2021; Loomba et al., 2021; Roozenbeek et al., 2020). Misinformation about the impact of 5G has led to the vandalization of cell phone masts (Jolley & Paterson, 2020), and misinformation about climate change has been associated with a reduction in perceptions of scientific consensus (Maertens et al., 2020; van der Linden et al., 2017). With false and moral-emotional media spreading faster and deeper than more accurate and nuanced content (Brady et al., 2017; Vosoughi et al., 2018), the importance of information veracity has become a central debate for scholars and policymakers (Lewandowsky et al., 2017, 2020).1

1 It should be noted that recent research also provides evidence for an alternative perspective, namely that the spread of misinformation could be driven more by an emotional dimension than a veracity dimension (Cinelli et al., 2020).

Accordingly, across disciplines, research on the processes behind, impact of, and interventions against misinformation has surged over the past years (for recent reviews, see Pennycook & Rand, 2021; Roozenbeek et al., 2023; Van Bavel, Harris, et al., 2020; van der Linden et al., 2021). Researchers have made progress in designing media and information literacy interventions in the form of educational games (Basol et al., 2021; Roozenbeek & van der Linden, 2019, 2020), “accuracy” primes (Pennycook et al., 2021b; Pennycook et al., 2020), introducing friction (Fazio, 2020), and inoculation messages (Lewandowsky & van der Linden, 2021).
Crucially, however, no theoretical framework exists for a nuanced evaluation of misinformation susceptibility, nor is there a psychometrically validated instrument that provides a reliable measure across studies.

Inconsistent interpretation and the need for a new measurement instrument

Despite the plethora of research papers on the psychology of misinformation, the field has not converged on a standardized way of defining or measuring people’s susceptibility to misinformation. In the absence of such a commonly agreed-upon standard, scholars have been inventive in the way that they employ individually constructed misinformation tests, often with the best intentions to create a good scale, but typically without formal validation (e.g., Pennycook, Epstein, et al., 2021b; Roozenbeek et al., 2021b).

The extent of the problem becomes evident when examining how researchers develop their test items and report the success of their models or interventions. Typically, researchers create (based on commonly used misinformation techniques; e.g., Maertens et al., 2021; Roozenbeek & van der Linden, 2019) or select (from a reliable fact-check database; e.g., Cook et al., 2017; Guess et al., 2020; Pennycook et al., 2020; Pennycook & Rand, 2019; Swire et al., 2017; van der Linden et al., 2017) news headlines or social media posts. Participants then rate the reliability, accuracy, or manipulativeness of these items, or their intention to share them, on a Likert or binary (e.g., true vs. false) scale; for an extensive discussion, see Roozenbeek et al. (2022). Sometimes the news items are presented as plain-text statements (e.g., Roozenbeek et al., 2020), while in other studies researchers present headlines together with an image, source, and lede sentence (e.g., Pennycook & Rand, 2019).
The true-to-false ratio also differs: some studies present only false news items (e.g., Roozenbeek et al., 2020), whereas others use an unbalanced (e.g., Roozenbeek et al., 2021b) or balanced (e.g., Pennycook & Rand, 2019) ratio of true and false items. Often an index score is created by taking the average of all item ratings (an index score reflecting general belief in false or true news items; e.g., Maertens et al., 2021), or by calculating the difference between ratings of true items and false items (veracity discernment; e.g., Pennycook, McPhetres, et al., 2020). Finally, an effect size is calculated, and a claim is made with respect to the effectiveness of the intervention, based on a change in false news ratings (e.g., Roozenbeek & van der Linden, 2019), a combined change in true news ratings and false news ratings (e.g., Guess et al., 2020), or even a change in true news ratings only (Pennycook, McPhetres, et al., 2020).

This wide variation in methodologies makes it hard to compare studies or generalize conclusions beyond the studies themselves. Little is known about the psychometric properties of these ad hoc scales and whether or not they measure a latent trait. It is a widespread practice in misinformation research for scholars to assume, rather than know, that they are measuring the same construct. If this bold assumption turned out to be untrue, we would be at risk of obscuring underlying phenomena by incorrectly labeling them as the same mechanism, thereby engaging in an illusory essence bias (Brick et al., 2022) and/or falling prey to jingle fallacies (Block, 1995; Condon et al., 2020).
As misinformation is a complex issue, the responses on one item set may be a result of motivational factors, while responses on another scale may be more reflective of critical thinking skills, instead of both measuring the same “discernment skill.” We currently do not know how different misinformation susceptibility scales are related, how the true-to-false ratios influence their outcome (Aird et al., 2018), or how much of the effects found are due to response biases rather than changes in skill (Batailler et al., 2022). The few studies that do look at the issue of scale-specific effects show significant item effects, indicating a risk of skewed conclusions about intervention effect sizes (e.g., Roozenbeek, Maertens et al., 2021b).2 Relatedly, whether the sampling of test items, their presentation, and response modes have a high ecological validity is often not discussed (Dhami et al., 2004; Roozenbeek et al., 2022), and little is known about the nomological net and reliability of the indices used. In other words, it is difficult to disentangle whether differences between studies are due to differences in the interpretation schema, the measurement instrument, or actual differences in misinformation susceptibility. This indicates a clear need for a unified theoretical framework in conjunction with a standardized instrument with strong internal and external validity.

2 While there are models that take into account the baseline plausibility of each item, they still do not reveal what construct each item is measuring. In other words, there may still be unexplained variability even when controlling for baseline plausibility, such as issues with item stability, and different effect sizes between item sets in intervention studies.
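To make the comparability problem concrete, the index scores discussed in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not code from any of the cited studies: the function names and the 1–7 ratings are hypothetical, and the ratings are invented solely to show how the choice of outcome changes the apparent effect.

```python
from statistics import mean


def belief_index(ratings):
    """Average rating across all presented items (general belief index)."""
    return mean(ratings)


def discernment_index(true_ratings, false_ratings):
    """Mean rating of true items minus mean rating of false items."""
    return mean(true_ratings) - mean(false_ratings)


# Hypothetical 1-7 accuracy ratings from one participant, before and after
# an intervention. These numbers are invented for illustration only.
pre_true, pre_false = [5, 6, 5, 6], [4, 5, 4, 5]
post_true, post_false = [4, 5, 4, 5], [2, 3, 2, 3]

# Three outcomes a study might report for the same data:
change_in_false = belief_index(post_false) - belief_index(pre_false)
change_in_true = belief_index(post_true) - belief_index(pre_true)
change_in_discernment = (
    discernment_index(post_true, post_false)
    - discernment_index(pre_true, pre_false)
)

print(change_in_false, change_in_true, change_in_discernment)
```

In this invented case, a study reporting only false news ratings would see a large improvement, a study reporting only true news ratings would see a deterioration, and a study reporting discernment would see a moderate improvement, even though all three describe the same responses.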
The present research

Towards a universal conceptualization and measurement: The Verification done framework

Here, we set out to create a theoretical interpretation schema as well as a first psychometrically validated measurement instrument that, in conjunction, resolve the issues mentioned above and offer utility for a wide range of scholars. We extend the current literature by providing the first psychometrically integrated conceptualization of misinformation susceptibility that allows for a reliable holistic measurement through the Verification done framework: we can only fully interpret misinformation susceptibility, or the impact of an intervention, by capturing news veracity discernment (V, ability to accurately distinguish real news from fake news) as a general factor, the specific facets real news detection ability (r, ability to correctly identify real news) and fake news detection ability (f, ability to correctly identify fake news), distrust (d; negative judgment bias, being overly skeptical), and naïvité (n; positive judgment bias, being overly gullible), and comparing V, r, f, d, and n alongside each other. A visualization of the Verification done model can be found in Fig. 1. For example, two different interventions may increase discernment ability V to a similar extent, but intervention A might do so by increasing detection ability r, while intervention B may accomplish the same by increasing detection ability f. Similarly, two people with the same discernment ability V may have opposite r and f abilities. Changes in detection abilities r or f after an intervention have to be interpreted together with changes in judgment biases d and n to determine whether the intervention has done more than just increase a judgment bias.
Existing interventions often look at a limited subset of these five dimensions; for example, the creators of the Bad News Game intervention (Roozenbeek & van der Linden, 2019) originally focused on fake news detection, including only a few real news items. Meanwhile, the accuracy nudge intervention seems to work mainly by addressing real news detection (Pennycook, McPhetres, et al., 2020), although its effects on the judgment biases remain unclear. Another media literacy intervention was found to increase general distrust, but showed improvement on veracity discernment nevertheless (Guess et al., 2020). In order to be able to compare these scores and gain insights into the complete picture, we need to employ the Verification done framework, but also make sure that each scale has high validity and comparability.

To accomplish this, through a series of three studies and using a novel neural-network-based item generation approach, we develop the Misinformation Susceptibility Test (MIST): a psychometrically validated (based on classical test theory and item response theory, as well as exploratory graph analysis) measurement instrument. The MIST was developed to be the first truly balanced misinformation susceptibility measure with an equal emphasis on discernment, real news detection, fake news detection, and judgment bias. In addition, to put the results into perspective, all scores should be interpreted along with national norm tables. In the present study, we describe how we developed and validated the MIST to accomplish these goals, evaluate each of these dimensions, and investigate the practical utility of the MIST for researchers and practitioners in the field.

Fig. 1 Visualization of the Verification done model

The Misinformation Susceptibility Test

We conduct three studies to develop, validate, and apply the MIST.
In Study 1 (N = 409), we employ a multitude of exploratory factor analysis (EFA)- and item response theory (IRT)-based selection criteria to create a 20-item MIST full-scale and an 8-item MIST short-scale from a larger item pool. This pool was built using a combination of advanced language-based neural network algorithms and real news headline extraction from reliable and unbiased media outlets, and then pre-filtered through multiple iterations of expert review. The resultant MIST scales are balanced (50% real, 50% fake), binary (real/fake), cumulatively scored instruments that ask participants to rate presented news headlines as either true or false, with higher MIST scores indicating greater discernment ability.3 We also present a new, alternative method to EFA and IRT, namely exploratory graph analysis (EGA; Golino & Epskamp, 2017; Golino et al., 2021), to show how modern psychometrics may lead to other robust item selections.

We acknowledge that the typical news consumption diet in real life includes more real news than fake news (e.g., Guess et al., 2020). However, as misinformation has the potential to spread faster (Brady et al., 2017; Vosoughi et al., 2018), and we aim to accurately measure a general discernment ability as well as both real news detection and fake news detection, in creating the MIST we have given equal representation to both facets. This allows us to generalize across the board, independent of an individual’s news consumption ratio. Meanwhile, to capture any biases related to overly positive or negative responses (to news in general), we have later added a method to calculate response biases d and n (these were not part of the original scale development protocol).
As such, the MIST exhibits a psychometrically validated higher-order structure, with two validated first-order factors r and f (i.e., real news detection, fake news detection) and one general ability second-order factor V (i.e., veracity discernment), as well as a method to calculate response biases d (i.e., distrust) and n (i.e., naïvité).4

In Study 2 (N = 7674), we employ confirmatory factor analyses (CFA), as well as EGA, to replicate the MIST’s structure across five national quota samples from the UK and the US, establish construct validity via a large, preregistered nomological network, and derive norm tables for the general populations of the UK and US and demographic and geographical subgroups.

In Study 3 (N = 421), we provide an example of how to implement Verification done and the MIST in the field by applying it in the naturalistic setting of a well-replicated media literacy intervention, the Bad News Game (https://www.getbadnews.com/). Whereas ample prior studies have attested to the theoretical mechanisms and effects that contribute to the Bad News Game’s effectiveness in reducing misinformation susceptibility (see, e.g., Maertens et al., 2021; Roozenbeek & van der Linden, 2019), within-subject repeated-measures analyses of the MIST-8 for pre- and post-game tests in conjunction with the Verification done framework reveal important new insights about how the intervention affects people across different evaluative dimensions. This paper demonstrates the benefits of integrated theory and assessment development, resulting in a framework providing nuanced, multifaceted insights that can be gained from a short, versatile, psychometrically sound, and easy-to-administer new measure.

Table 1 offers a comprehensive summary of all samples used, detailing their size, demographic breakdowns, included measures, country of origin, recruitment platform, and whether or not they (a) used nationally representative quotas and (b) were preregistered.
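The five Verification done scores (V, r, f, d, n) described above can be illustrated with a short scoring sketch. It is a minimal sketch under stated assumptions, not the authors' scoring code: the function name `score_mist`, the "real"/"fake" labels, and the example answer key are hypothetical. The scores V, r, and f follow the description of a balanced, binary, cumulatively scored test. The formulas for d and n are an assumption (the number of "fake" or "real" judgments beyond half the items, floored at zero), because this section states only that a method for calculating the biases was added later.

```python
def score_mist(responses, answer_key):
    """Score a balanced real/fake headline test.

    responses, answer_key: sequences of "real" or "fake", one per headline.
    Returns V (discernment), r (real news detection), f (fake news
    detection), d (distrust), and n (naivite).
    """
    responses = list(responses)
    answer_key = list(answer_key)
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")

    pairs = list(zip(responses, answer_key))
    r = sum(1 for resp, key in pairs if key == "real" and resp == "real")
    f = sum(1 for resp, key in pairs if key == "fake" and resp == "fake")

    # ASSUMPTION: bias = judgments of one kind in excess of the 50% that a
    # balanced test would warrant, floored at zero. Not taken from the text.
    half = len(answer_key) // 2
    d = max(0, responses.count("fake") - half)
    n = max(0, responses.count("real") - half)

    return {"V": r + f, "r": r, "f": f, "d": d, "n": n}


# Hypothetical 8-item key (4 real, 4 fake), standing in for a short form.
key = ["real"] * 4 + ["fake"] * 4
sceptic = score_mist(["fake"] * 8, key)   # calls every headline fake
believer = score_mist(["real"] * 8, key)  # calls every headline real
print(sceptic)
print(believer)
```

Under these assumptions, the two hypothetical respondents obtain the same V but opposite r and f, and their d and n scores show that neither is discerning at all. This is the kind of pattern that a single discernment score would hide.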
3 We chose the binary coding approach (i.e., true versus false headline) because it allows us to create a straightforward and easy-to-interpret structure with either a correct or an incorrect response for each item, which is also easy to implement and analyze in a performance-based IRT model, without compromising on quality (e.g., in Studies 1–2 we validated the MIST with items that are administered with Likert scales, providing evidence for its broader predictive validity).

4 Note that distrust and naïvité were not included in the psychometric scale development protocol, but only added later on as a post hoc calculation. The factor structure used for the scale development using EFA/IRT analyses can be found in Fig. 9, and the structure used for the EGA-based scale can be found in Fig. 7.

Table 1 Summary of samples

Sample | Study | N | Country of origin | Nationally representative quota | Recruitment platform | Preregistration
1A | Study 1: Development | 409 | USA | No | Prolific | Yes
2A | Study 2: Validation | 3479 | USA | Yes | Respondi | No
2B | Study 2: Validation | 510 | USA | Yes | CloudResearch | Yes
2C | Study 2: Validation | 1227 | UK | Yes | Respondi | No
2D | Study 2: Validation | 1245 | UK | Yes | Prolific | No
2E | Study 2: Validation | 1213 | USA | Yes | Respondi | No
3 | Study 3: Application | 421 | USA | No | Bad News Game | No

Demographic composition: age and gender

Sample | Age | Gender
1A | M = 33.20, SD = 11.85 | 55.50% female, 42.30% male, 2.20% other/non-binary
2A | M = 45.10, SD = 16.16 | 51.11% female, 48.84% male, 0.06% other/non-binary
2B | M = 49.25, SD = 16.96 | 55.88% female, 43.53% male, 0.59% other/non-binary
2C | M = 45.34, SD = 16.52 | 51.67% female, 48.33% male, 0.00% other/non-binary
2D | M = 44.66, SD = 15.65 | 52.53% female, 47.07% male, 0.40% other/non-binary
2E | M = 45.21, SD = 17.35 | 54.00% female, 44.19% male, 1.81% non-binary
3 | 55.58% [18, 29], 32.30% [30, 49], 12.11% [50, 99] | 52.02% female, 41.09% male, 6.89% other/non-binary

Demographic composition: ethnicity

1A: not reported
2A: 76.89% White, Caucasian, Anglo, or European American; 8.39% Asian or Asian American; 6.00% Hispanic or Latino; 5.98% Black or African American; 1.12% Native American or Alaskan Native; 0.54% Middle Eastern; 0.30% Hawaiian or Pacific Islander; 0.77% Other/Prefer not to answer
2B: 68.81% White, Caucasian, Anglo, or European American; 4.28% Asian or Asian American; 11.05% Hispanic or Latino; 12.12% Black or African American; 2.50% Native American or Alaskan Native; 0.18% Middle Eastern; 1.07% Other/Prefer not to answer
2C: 87.33% White; 6.95% Asian; 2.45% Black; 0.08% Arab; 2.13% Mixed; 1.06% Other
2D: 86.10% White; 7.47% Asian; 3.53% Black; 0.16% Arab; 1.61% Mixed; 1.12% Other
2E: not reported
3: not reported

Demographic composition: education

1A: 1.47% Less than high school degree; 9.29% High school graduate; 31.30% Some college but no degree; 38.88% Bachelor's degree in college; 1.96% Professional degree; 13.45% Master's degree; 3.67% Doctoral degree
2A: 1.74% Did not complete high school; 34.98% High school degree or equivalent; 15.08% Associate degree; 31.84% Degree (bachelor’s) or equivalent; 15.11% Degree (master’s) or other postgraduate qualification; 1.25% Doctorate; 0.97% Other/Prefer not to say
2B: 2.55% Less than high school degree; 25.10% High school graduate; 27.45% Some college but no degree; 26.08% Bachelor's degree in college; 1.57% Professional degree; 13.92% Master's degree; 3.33% Doctoral degree
2C: 11.03% No formal education above age 16; 16.18% Professional or technical qualifications above age 16; 27.12% School education up to age 18; 31.94% Degree (bachelor’s) or equivalent; 12.09% Degree (master’s) or other postgraduate qualification; 1.63% Doctorate
2D: 6.27% No formal education above age 16; 10.68% Professional or technical qualifications above age 16; 25.22% School education up to age 18; 38.63% Degree (bachelor’s) or equivalent; 16.87% Degree (master’s) or other postgraduate qualification; 2.33% Doctorate
2E: 2.72% Less than high school degree; 24.73% High school graduate or equivalent; 19.79% Some college, but no degree; 11.54% Associate degree in college, 2-year; 25.72% Bachelor’s degree in college, 4-year; 12.45% Master’s degree; 2.14% Professional degree, JD, MD; 0.91% Doctoral degree
3: 14.49% High school or less; 36.10% Some college; 49.41% Higher degree

Measured constructs

1A: MIST-100; BSR; CMQ; COVID-19 compliance; CRT; DEPICT; CV19 fact-check
2A: MIST-20 (incl. MIST-8); AOT; Anti-vaccination attitudes; COVID-19 misinformation beliefs; CRT; Numeracy; Political ideology; Trust (in scientists, journalists, politicians, the government)
2B: MIST-20 (incl. MIST-8); BSR; BFI-2-S; CMQ; EDO; DEPICT SF; Go Viral!; MFQ20; SD4; SDO; SINS; SISES; SIRIS; SSPC; Trust (in medical personnel, scientists, politicians, journalists, the government, scientific knowledge, civil servants, mainstream media)
2C: MIST-20 (incl. MIST-8); Numeracy; Political ideology; Trust (in medical personnel, scientists, politicians, journalists, the government, scientific knowledge, civil servants, mainstream media)
2D: MIST-20 (incl. MIST-8); Numeracy; Political ideology; Trust (in medical personnel, scientists, politicians, journalists, the government, scientific knowledge, civil servants, mainstream media)
2E: MIST-16
3: MIST-8; BN

Note. AOT = Actively Open-minded Thinking (Baron, 2019); BFI-2-S = Big-Five Inventory 2 Short-Form (Soto & John, 2017); BN = Bad News Game (Roozenbeek & van der Linden, 2019); BSR = Bullshit Receptivity scale (Pennycook et al., 2015); CMQ = Conspiracy Mentality Questionnaire (Bruder et al., 2013); CRT = Cognitive Reflection Test (Frederick, 2005); DEPICT = Discrediting-Emotion-Polarization-Impersonation-Conspiracy-Trolling deceptive headlines inventory (Maertens et al., 2021); DEPICT SF = DEPICT Balanced Short Form (Maertens et al., 2021); EDO = Ecological Dominance Orientation (Uenal et al., 2022); CV19 fact-check = COVID-19 fact-check task (Pennycook, McPhetres, et al., 2020); Go Viral! = Go Viral! Balanced Item Set (Basol et al., 2021); MFQ20 = Moral Foundations Questionnaire 20-Item Short Form (Graham et al., 2011); Numeracy = combination of Schwartz Numeracy Test (Schwartz et al., 1997) and Berlin Numeracy Test (Cokely et al., 2012); SD4 = Short Dark Tetrad (Paulhus et al., 2020); SDO = Social Dominance Orientation (Ho et al., 2015); SINS = the Single-Item Narcissism Scale (Konrath et al., 2014); SISES = Single-Item Self-Esteem Scale (Robins et al., 2001); SIRIS = Single-Item Religious Identification Scale (Norenzayan & Hansen, 2006); SSPC = Short Scale of Political Cynicism (Aichholzer & Kritzinger, 2016)

Study 1: Development – Scale construction, exploratory analyses, and psychometric properties

Following classic (Clark & Watson, 1995; Loevinger, 1957) and recent (Boateng et al., 2018; Rosellini & Brown, 2021; Zickar, 2020) psychometrics guidelines, and taking into account insights from misinformation scholars (Pennycook et al., 2021a; Roozenbeek et al., 2021b), we devised a four-stage, preregistered scale development protocol (i.e., (1) item generation, (2) expert filtering, (3) quality control, and (4) data-driven selection), shown in Fig. 2.

Fig. 2 Development protocol of the Misinformation Susceptibility Test

Method

Preparatory steps

Phase 1: Item generation

Fake news
There is a debate in the literature on whether the misinformation items administered in misinformation studies should be actual news items circulating in society, or news items created by experts that are fictional but feature common misinformation techniques. The former approach arguably provides better ecological validity (Pennycook, Binnendyk, et al., 2021), while the latter provides a cleaner and less confounded measure since it is less influenced by memory and identity effects (van der Linden & Roozenbeek, 2020). Considering these two approaches and reflecting on representative stimulus sampling (Dhami et al., 2004), we opted for a novel approach that combines the best of both worlds. We employed the generative pretrained transformer 2 (GPT-2), a neural-network-based artificial intelligence developed by OpenAI (Radford et al., 2019), to generate fake news items (cf. Götz et al., 2022; Hommel et al., 2022). The GPT-2 is one of the most powerful open-source text generation tools currently available for free use by researchers.
It was trained on eight million text pages, has 1.5 billion parameters, and is able to write coherent and credible articles based on just one or a few words of input.5 We did this by asking the GPT-2 to generate a list of fake news items inspired by a smaller set of items. This smaller set contained items from any of five different scales that encompass a wide range of misinformation properties: the Belief in Conspiracy Theories Inventory (BCTI; Swami et al., 2010), the Generic Conspiracist Beliefs scale (GCB; Brotherton et al., 2013), specific Conspiracy Beliefs scales (van Prooijen et al., 2015), the Bullshit Receptivity scale (BSR; Pennycook et al., 2015), and the Discrediting-Emotion-Polarization-Impersonation-Conspiracy-Trolling deceptive headlines inventory (DEPICT; Maertens et al., 2021; Roozenbeek & van der Linden, 2019). We set out to generate 100 items of good quality, but as this is a new approach, we opted for the generation of at least 300 items. More specifically, we let GPT-2 generate thousands of fake news headlines, and discarded any duplicates and clearly irrelevant items (see Supplement S1 for a full overview of all items generated and those that have been removed).

Real news
For the real news items, we decided to include items that met each of the following three selection criteria: (1) the news items are actual news items (i.e., they circulated as real news), (2) the news source is the most factually correct (i.e., accurate), and (3) the news source is the least biased (i.e., nonpartisan or politically centrist). To do this, we used the Media Bias/Fact Check database (MBFC; https://mediabiasfactcheck.com/) to select news sources marked as least biased and scoring very high on factual reporting.6 The news sources we chose were Pew Research (https://www.pewresearch.org/), Gallup (https://www.gallup.com/), MapLight (https://maplight.org/), Associated Press (https://www.ap.org/), and World Press Review (http://worldpress.org/).
We also diversified the selection by including the non-US outlets Reuters (https:// www. reute rs. com/), Africa Check (https:// afric acheck. org/), and JStor Daily (https:// daily. jstor. org/). All outlets received the maximum MBFC score at the time of item selection.7 A full list of the real news items selected can be found in Supplement S1. Overall, this item-generation process resulted in an initial pool of 413 items. The full list of items we produced and methods through which each of them was obtained can be found in Supplement S1. Phase 2: Item condensation To reduce the number of head- lines generated in Phase 1, we followed previous scale devel- opment research and practices (Carpenter, 2018; Haynes et al., 1995; Simms, 2008) and established an expert com- mittee with misinformation researchers from four different cultural backgrounds: Canada, Germany, the Netherlands, and the United States. Each expert conducted an independ- ent review and classified each of the 413 items generated in Phase 1 as either fake news or real news. All items with a three-fourths expert consensus and matching the correct answer key (i.e., the source veracity category)—a total of 289 items—were selected for the next phase.8 A full list of the expert judgments and inter-rater agreement can be found in Supplement S1. Phase 3: Quality control As a final quality control before continuing to the psychometrics study, the two-person item generation committee in combination with an extra third expert—who had not been previously exposed to any of the items—made a final selection of items from Phase 2. 
Applying a two-thirds expert consensus as cutoff, we selected 100 items (44 fake news, 56 real news) out of the 289 from the previous stage (i.e., we cut 189 items), thus creating a fairly balanced item pool for empirical probing that hosted five times as many items as the final scale that we aimed to construct, in keeping with conservative guidelines (Boateng et al., 2018; Weiner et al., 2012). A full list of the item sets selected per expert and expert agreement can be found in Supplement S1.

5 For a step-by-step guide on how to set up the GPT-2 to use as a psychometric item generator, see the tutorial paper by Götz et al. (2023), as well as the useful blog posts by Woolf (2019), Nasser (2020), and Curley (2020).

6 MBFC is an independent fact-checking platform that rates media sources on factual reliability as well as ideological bias. At the time of writing, the MBFC database lists over 3700 media outlets and its classifications are frequently used in scientific research (e.g., Bovet & Makse, 2019; Chołoniewski et al., 2020; Cinelli et al., 2021).

7 Three out of six no longer receive the maximum score, and are now considered to have a center-left bias, and score between mostly factual and highly factual reporting: World Press Review (mostly factual, center-left), MapLight (highly factual, center-left), and JStor Daily (highly factual, center-left). This reflects both the dynamic nature of news media and the limits of the classification methodology used.

8 We used three-fourths as a criterion instead of 100% consensus because, as experts, we may be biased ourselves, and therefore we also accepted items where only one expert did not agree. If fewer than 120 items had remained, the Phase 1 item generation process would have been restarted.

Implementation

Participants In line with widespread recommendations to assess at least 300 respondents during initial scale implementation (Boateng et al., 2018; Clark & Watson, 1995, 2019; Comrey & Lee, 1992; Guadagnoli & Velicer, 1988), we recruited a community sample of 452 US residents (for a comprehensive sample description see Table 1). The study was carried out on Prolific Academic (https://www.prolific.co/), an established crowd-working platform which provides competitive data quality (Palan & Schitter, 2018; Peer et al., 2017). Based on the exclusion criteria laid out in the preregistration, we removed incomplete cases, participants who took either an unreasonably short or long time to complete the study (less than 8 minutes or more than 2 hours), participants who failed an attention check, underage participants, and participants who did not live in the United States, retaining 409 cases for data analysis.9 Of these, 225 participants (i.e., 55.01%) participated in the follow-up data collection eight months later (T2).10 Participants received a set remuneration of 1.67 GBP (equivalent to US$ 2.32) for participating in the T1 questionnaire and 1.10 GBP (equivalent to US$ 1.53) for T2.

Procedure, measures, transparency, and openness The preregistrations for T1 and T2 are available on AsPredicted (https://aspredicted.org/m7vb3.pdf; https://aspredicted.org/js2jz.pdf; any deviations can be found in Supplement S2).
The supplement, raw and clean datasets, and all analysis scripts in R can be found in the OSF repository (https://osf.io/r7phc/).

Participants took part in a preregistered online survey. After providing informed consent, participants had to categorize the 100 news headlines from Phase 3 (i.e., the items that were retained after the previous three phases) into two categories: Fake/Deceptive and Real/Factual.11 Participants were told that each headline had only one correct answer. See the preregistration or the Qualtrics files on the OSF repository for the exact survey framing (https://osf.io/r7phc/).

After completing the 100-item categorization task, participants completed the 21 items from the DEPICT inventory (a misleading social media post reliability judgment task; Maertens et al., 2021), a 30-item COVID-19 fact-check task (a classical true/false headline evaluation task; Pennycook, McPhetres, et al., 2020), the Bullshit Receptivity scale (BSR; Pennycook et al., 2015), the Conspiracy Mentality Questionnaire (CMQ; Bruder et al., 2013), the Cognitive Reflection Test (CRT; Frederick, 2005), a COVID-19 compliance index (sample item: “I kept a distance of at least two meters to other people”: 1 – does not apply at all, 4 – applies very much), and a demographics questionnaire (see Table 1 for an overview). Finally, participants were debriefed.

Eight months later, the participants were recruited again for a test-retest follow-up survey.12 In the follow-up survey, after participants provided informed consent to participate, the final 20-item MIST was administered, the same COVID-19 fact-check task (Pennycook, McPhetres, et al., 2020) and CMQ (Bruder et al., 2013) were repeated, a new COVID-19 compliance index was administered, and finally a full debrief was presented. The complete surveys are available in the OSF repository: https://osf.io/r7phc/.
The full study received institutional review board (IRB) approval from the Psychology Research Ethics Committee of the University of Cambridge (PRE.2019.108).

Analytical strategy 1: Exploratory factor analysis (EFA) and item response theory (IRT)

To extract the final MIST-20 and MIST-8 scales from the pre-filtered MIST-100 item pool, we followed an item selection decision tree, which can be found in Supplement S3. Specifically, after ascertaining the general suitability of the data for such procedures, the following EFA- and IRT-based exclusion criteria were employed: (1) factor loadings below .40 (Clark & Watson, 2019; Ford et al., 1986; Hair et al., 2010; Rosellini & Brown, 2021); (2) cross-loadings above .30 (Boateng et al., 2018; Costello & Osborne, 2005); (3) communalities below .40 (Carpenter, 2018; Fabrigar et al., 1999; Worthington & Whittaker, 2006); (4) Cronbach’s α reliability analysis; (5) differential item functioning (DIF) analysis (Holland & Wainer, 1993; Nguyen et al., 2014; Reise et al., 1993); (6) item information function (IIF) analysis.

Finally, we sought to establish initial evidence for construct validity (Cronbach & Meehl, 1955). To do this, we investigated the associations between the MIST scales and the DEPICT deceptive headline recognition task (Maertens et al., 2021) and COVID-19 fact-check (Pennycook, McPhetres, et al., 2020; concurrent validity). We further examined additional predictive accuracy of the MIST in accounting for variance in DEPICT and fact-check scores above and beyond the CMQ (Bruder et al., 2013), BSR (Pennycook et al., 2015), and CRT (Frederick, 2005; incremental validity).

9 We preregistered that we would split the sample in half for exploratory analyses and confirmatory analyses. However, we used the full Study 1 sample for exploratory analyses instead and conducted a new study with a fresh sample (Study 2) for the confirmatory analyses. This more rigorous and more conservative approach was chosen to boost power and increase the quality of the initial item selection.

10 We looked at the difference in demographics between T1 and T2 Prolific users. While we found no noteworthy differences in age (M_T1 = 33.20, M_T2 = 35.76) or educational attainment rates (T1: 38.88% with bachelor’s degree, T2: 41.52% with bachelor’s degree), the percentage of female participants rose somewhat during the follow-up (T1: 55.50% male, T2: 39.72% male).

11 All headlines can be found in Supplement S1.

12 We chose to have a follow-up to be able to measure changes in the MIST score over the medium to long term. We found a period of eight months fitting for this purpose.

Analytical strategy 2: Exploratory graph analysis (EGA)

In this section we explore an alternative method of scale development, based on the new field of exploratory graph analysis (Golino & Epskamp, 2017), rooted in network methods. Network methods in psychology gained momentum with the publication of the mutualism model of intelligence (Van Der Maas et al., 2006) and network perspective on psychopathology (Borsboom, 2008; Borsboom et al., 2011; Cramer et al., 2010), giving rise to a new subfield of quantitative psychology called network psychometrics (Epskamp et al., 2017; Epskamp et al., 2018). Network models are used to estimate the relationship between multiple variables, typically using the Gaussian graphical model (GGM; Lauritzen, 1996), where nodes (e.g., test items) are connected by edges (or links) that indicate the strength of the association between the variables (Epskamp & Fried, 2018), forming a system of mutually reinforcing elements (Christensen et al., 2020b; Cramer, 2012).
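For readers unfamiliar with the GGM, the edge weights are commonly taken to be partial correlations derived from the inverse covariance (precision) matrix. The expression below is the standard textbook form rather than anything specific to this study, and the symbols (K for the precision matrix with entries κ, w for edge weights) are our notation, not the authors'.

```latex
% Edge weight between nodes i and j in a Gaussian graphical model:
% the partial correlation of variables i and j given all other variables.
\[
  w_{ij} \;=\; -\,\frac{\kappa_{ij}}{\sqrt{\kappa_{ii}\,\kappa_{jj}}},
  \qquad
  \mathbf{K} \;=\; \boldsymbol{\Sigma}^{-1}, \quad i \neq j .
\]
```

An edge weight of zero corresponds to conditional independence of the two variables given all others, which is why a sparse precision matrix yields a sparse network.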
Network and latent variable models have been shown to be closely related, and can produce model parameters that are consistent with one another (Boker, 2018; Christensen & Golino, 2021c; Epskamp et al., 2017; Golino et al., 2021; Golino & Epskamp, 2017; Marsman et al., 2018). These statistical similarities can be used as a way to explore the dimensionality structure of measurement instruments in a new framework termed exploratory graph analysis (Christensen et al., 2019; Golino & Demetriou, 2017; Golino & Epskamp, 2017; Golino et al., 2020a, 2020b).

In network psychometrics (Christensen et al., 2019; Epskamp et al., 2018; Epskamp et al., 2017; Golino & Demetriou, 2017; Golino & Epskamp, 2017; Golino et al., 2020a, 2020b), networks are typically estimated using the Gaussian graphical model (Lauritzen, 1996) with the EBICglasso approach (Epskamp & Fried, 2018). The EBICglasso approach operates by minimizing a penalized log-likelihood function and selecting the best model fit (i.e., the optimum level of sparsity in a network) using the extended Bayesian information criterion (EBIC; Chen & Chen, 2008). As Golino et al. (2022) argue, the use of weighted network models in psychology opened the door to applying network science methods developed in other areas of science to psychological problems such as dimensionality (e.g., factor analysis).

Exploratory graph analysis was originally proposed by Golino and Epskamp (2017), who showed that the GGM combined with a clustering algorithm for weighted networks (Walktrap; Pons & Latapy, 2005) could accurately recover the number of simulated factors, presenting higher accuracy than traditional factor analytic-based methods. Later, Golino, Shi, et al.
(2020b) compared EGA with different types of factor analytic methods (including two types of parallel analysis), finding that EGA achieves the highest overall accuracy (87.91%) in estimating the number of simulated factors, followed by the traditional parallel analysis with principal components of Horn (1965; 83.01%), and parallel analysis using principal axis factoring proposed by Humphreys and Ilgen (1969; 81.88%).

Golino et al. (2022) summarized the advantages of the EGA framework over more traditional methods (Golino, Shi, et al., 2020b): (1) unlike exploratory factor analysis (EFA) methods, EGA does not require a rotation method to interpret the estimated first-order factors (although rotations are rarely discussed in the validation literature, they have significant consequences for validation, e.g., estimation of factor loadings; Sass & Schmitt, 2010); (2) EGA automatically places items into factors without the researcher’s direction, which contrasts with exploratory factor analysis, where researchers must decipher a factor loading matrix (such a placement opens the door for dimension and item stability methods, which are presented next); and (3) the network representation depicts how items relate within and between dimensions.

Over the past couple of years, the EGA framework has expanded into several important areas of psychometrics. Christensen and Golino (2021c) developed a new metric termed network loadings, computed by standardizing node strength (the sum of the edges a node is connected to) split between dimensions identified by EGA. Christensen and Golino (2021c) showed in their simulation study that network loadings are akin to factor loadings, but with different reference values. Network loadings of .15, .25, and .35 are equivalent to low (.40), moderate (.55), and high (.70) factor loadings, respectively (Christensen & Golino, 2021c).
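To make the idea of "node strength split between dimensions" concrete, the following Python sketch computes loadings for a toy weighted network. It is an illustration under stated assumptions, not the EGAnet implementation: the function name `network_loadings` is hypothetical, the standardization used (dividing by the square root of the summed strengths within a dimension) is one formulation we assume here, and signs are ignored. The EGAnet documentation should be consulted for the exact definition.

```python
import math

def network_loadings(weights, membership):
    """Split each node's strength by dimension, then standardize.

    weights:    symmetric matrix (list of lists) of edge weights
    membership: dimension label of each node
    Returns one dict per node mapping dimension -> loading.
    """
    n = len(weights)
    dims = sorted(set(membership))
    # Unstandardized loading: sum of absolute edge weights that connect
    # node i to the nodes assigned to dimension d.
    raw = [
        {d: sum(abs(weights[i][j]) for j in range(n)
                if j != i and membership[j] == d) for d in dims}
        for i in range(n)
    ]
    # Assumed standardization: divide by the square root of the total
    # strength accumulated in that dimension across all nodes.
    totals = {d: sum(raw[i][d] for i in range(n)) for d in dims}
    return [
        {d: (raw[i][d] / math.sqrt(totals[d]) if totals[d] > 0 else 0.0)
         for d in dims}
        for i in range(n)
    ]

# Toy network (assumed values): nodes 0-1 form dimension "A", nodes 2-3 "B".
weights = [
    [0.0, 0.4, 0.1, 0.0],
    [0.4, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.3],
    [0.0, 0.0, 0.3, 0.0],
]
membership = ["A", "A", "B", "B"]
loadings = network_loadings(weights, membership)
print(loadings[0])  # main loading on "A", small cross-loading on "B"
```

In this toy case node 0 loads mainly on its own dimension and only weakly on the other, which is the pattern the main/cross-loading ratio mentioned later in the text is meant to capture.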
The development of network loadings opened new lines of research, such as the development of metric invariance using EGA and permutation tests in a network perspective (Jamison et al., 2022), and determining whether data are generated from a factor or network model (Christensen & Golino, 2021b).

Based on the automated item placement of EGA, Christensen and Golino (2021a) developed a bootstrap approach to investigate the stability of items and dimensions estimated by EGA, termed bootstrap exploratory graph analysis, and proposed two new metrics of psychometric quality: item stability and structural consistency. Item stability indicates how often an item replicates in its designated EGA dimension, with values lower than .75 (i.e., items that are estimated in their original dimension in fewer than 75% of the bootstrapped samples) indicating problematic (or unstable) items. Structural consistency, in turn, indicates how often an EGA dimension exactly replicates and can be used to verify configural (or structural) invariance and determine poor-functioning items (Golino et al., 2022). A complementary approach, called unique variable analysis, was developed to identify redundant items and can be used to identify the reason why some items function poorly (Christensen, Garrido, & Golino, 2020a).

The fit of a dimensionality structure estimated using EGA to the data can be verified using an innovative fit index termed total entropy fit index (TEFI; Golino, Moulder, et al., 2020a), developed as an alternative to traditional fit measures used in factor analysis and structural equation modeling (SEM). In a comprehensive simulation study, the TEFI demonstrated higher accuracy in correctly identifying the number of simulated factors than the comparative fit index (CFI), the root mean square error of approximation (RMSEA), and other indices used in SEM (Golino, Moulder, et al., 2020a).
The TEFI is based on the Von Neumann entropy (Von Neumann, 1927), a measure developed to quantify both the amount of disorder in a system and the entanglement between two subsystems (Preskill, 2018). The TEFI is a relative measure of fit that can be used to compare two or more dimensionality structures. The dimensionality structure with the lowest TEFI value indicates the best fit for the data.

Another recent development within the EGA framework is the hierarchical EGA (hierEGA) technique by Jimenez et al. (2022). In their work, Jimenez et al. (2022) proposed an alternative variation to a popular clustering algorithm called Louvain (Blondel et al., 2008) to detect lower- and higher-order factors in data, and showed that this new technique is more effective than traditional factor analytic techniques to estimate the structure of first- and second-order factors in generalized bifactor structures.

All the EGA-based techniques/metrics mentioned above use the free and open-source R package EGAnet (Golino & Christensen, 2019), which has become one of the main software programs in network psychometrics. In the current paper, version 1.2.4 of the EGAnet package (Golino & Christensen, 2019) was used, and several strategies were implemented. The first strategy aimed at estimating the dimensionality structure of the 100 MIST items. Then, redundant items were identified using unique variable analysis (Christensen et al., 2020a), and for every group or pair of redundant items the one with the higher ratio of main network loadings to cross-loadings was kept in the analysis. The stability of the items and the structural consistency of the dimensions were obtained via bootstrap exploratory graph analysis (Christensen & Golino, 2021a) with 500 iterations (using parametric bootstrapping), and items with stability lower than 75% and network loadings lower than .15 were removed from subsequent steps.
Once a subset of stable items with at least low to moderate network loadings was found, a subset of the best items per dimension (i.e., with moderate to high network loadings, meaning a network loading of at least .23) was identified, and further item stability and structural consistency metrics were computed until all items were highly stable (with item stability greater than 90%). The metric invariance of the final pool of best items per dimension (moderate to high network loadings and high item stability) was investigated using the EGA permutation test developed by Jamison et al. (2022), having as reference groups sex, age (above or below the median birth year), and education (above or below the median level of formal education received). The fit of the EGA-estimated dimensions to the data was computed using the total entropy fit index (Golino, Moulder, et al., 2020a) and compared to the two-factor structure of real and fake news items identified using EFA. CFI and RMSEA computed after fitting a confirmatory factor model to the EGA-estimated dimensions were also obtained, and compared to the CFI and RMSEA of the two-factor structure. Additionally, the Satorra (2000) scaled difference test was implemented to verify the structure with the best fit to the data.
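The item stability criterion used in this strategy can be illustrated with a short Python sketch. It is a simplified illustration, not the bootstrap EGA routine in EGAnet: the function names are hypothetical, the bootstrap memberships are toy values, and we assume the dimension labels have already been aligned across bootstrap replicates (in practice, matching labels between replicates is a separate step).

```python
def item_stability(original, bootstrap_memberships):
    """Proportion of bootstrap replicates in which each item is assigned
    to the same dimension as in the original solution.

    Assumes dimension labels are already aligned across replicates.
    """
    n_boot = len(bootstrap_memberships)
    return [
        sum(1 for m in bootstrap_memberships if m[i] == original[i]) / n_boot
        for i in range(len(original))
    ]

def unstable_items(stability, cutoff=0.75):
    """Indices of items whose stability falls below the cutoff."""
    return [i for i, s in enumerate(stability) if s < cutoff]

# Toy example (assumed values): four items, four bootstrap replicates.
original = [1, 1, 2, 2]
boots = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 2, 2, 2],
    [1, 2, 2, 1],
]
stability = item_stability(original, boots)
flagged = unstable_items(stability)
print(stability, flagged)
```

With the .75 cutoff described in the text, an item at exactly .75 is retained and only items below it are flagged; the stricter final criterion (stability greater than 90%) can be checked by passing a different `cutoff`.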
Results

EFA/IRT results

Item selection Using parallel analysis with the psych package (Revelle, 2021), we aimed to select a parsimonious factor structure, with each factor reflecting eigenvalues above the 95th percentile of corresponding eigenvalues from 500 simulated random datasets.13 Parallel analysis (with 500 iterations) suggested a total of six factors, but only five factors (eigenvalues: F1 = 10.89, F2 = 7.82, F3 = 1.89, F4 = 1.42, F5 = 1.23, F6 = 0.98) matched our criteria and were above the 95th percentile of corresponding eigenvalues from the 500 simulated random datasets (eigenvalue 95th percentile = 0.99).14 Two factors explained most of the variance, which is in line with our theoretical model of two main factors (fake news detection and real news detection). An EFA using the tetrachoric correlation matrix with unweighted least squares (ULS) estimation without rotation using the EFAtools package (Steiner & Grieder, 2020) indicated that for both the two-factor structure and the five-factor structure, the first two factors were specifically linked to the real news items and the fake news items, respectively, while the other three factors did not show a pattern easy to interpret and in general showed low factor loadings (< .30).15 See Supplement S4 for a pattern matrix.

13 The factorability of the data was tested via the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy and Bartlett’s test of sphericity using R and the EFAtools package (Steiner & Grieder, 2020). Both tests indicated excellent data suitability (Bartlett’s χ² = 12,896.84, df = 4950, p < .001; KMO = .831) according to established guidelines (Carpenter, 2018; Tabachnick & Fidell, 2007).

14 These five factors are in line with the criteria set out in the preregistration, as they have both (i) an eigenvalue > 1 and (ii) an eigenvalue larger than the simulated value (above the line of randomly generated data).
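The retention rule described above (an eigenvalue above 1 and above the 95th percentile of eigenvalues from simulated random data) can be sketched in Python with NumPy. This is a simplified stand-in and not the authors' analysis, which was run in R with the psych package on binary item data: the sketch below assumes continuous variables, Pearson correlations, and principal-component eigenvalues, and the function name `parallel_analysis` and the toy data are ours.

```python
import numpy as np

def parallel_analysis(data, n_sims=500, percentile=95, seed=0):
    """Count leading eigenvalues that exceed both 1 and the chosen
    percentile of eigenvalues obtained from random normal data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    simulated = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.standard_normal((n, p))
        simulated[s] = np.sort(
            np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)
    n_factors = 0
    for obs, thr in zip(observed, threshold):
        if obs > thr and obs > 1.0:
            n_factors += 1
        else:
            break  # stop at the first eigenvalue that fails the criteria
    return n_factors, observed, threshold

# Toy data (assumed): 400 cases, 10 variables driven by two latent factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((400, 2))
pattern = np.zeros((2, 10))
pattern[0, :5] = 0.8
pattern[1, 5:] = 0.8
data = factors @ pattern + 0.6 * rng.standard_normal((400, 10))

n_factors, observed, threshold = parallel_analysis(data, n_sims=200)
print(n_factors)
```

For dichotomous items such as the MIST headlines, a tetrachoric correlation matrix would normally replace the Pearson correlations used in this sketch.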
As we set out to create a measurement instrument for two distinct abilities, real news detection and fake news detection, we continued with a two-factor EFA, employing principal axis factoring and varimax rotation using the psych package (Revelle, 2021).16 Theoretically we would expect a balancing out of positive and negative correlations between the two factors: positive because of the underlying veracity discernment ability, and negative because of the response biases. In line with this, we chose an orthogonal rotation instead of an oblique rotation to separate out fake news detection and real news detection as cleanly as possible. Three iterations were needed to remove all items with a factor loading under .40 (43 items were removed). After this pruning, no items showed cross-loadings larger than .30. Communality analysis using the three-parameter logistic model function in the mirt package (Chalmers, 2012) with 50% guessing chance (c = .50) indicated two items with communality lower than .40 after one iteration. These items were removed. No further iterations yielded any additional removals. A final list of the communalities can be found in Supplement S5. Cronbach’s α reliability analysis with the psych package was used to remove all items that had negative effects (∆α > .001) on the overall reliability of the test (Revelle, 2021). No items had to be removed based on this analysis.17 Differential item functioning using the mirt package was used to explore whether differences in gender or ideology would alter the functioning of the items (Chalmers, 2012). None of the items showed differential functioning for gender or ideology.

Finally, using the three-parameter logistic model IRT functions in the mirt package (Chalmers, 2012), we selected the 20 best items (10 fake, 10 real) and the 8 best items (4 fake, 4 real), resulting in the MIST-20 and the MIST-8, respectively.
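As a reminder of what the three-parameter logistic (3PL) model with a fixed guessing parameter implies, the Python sketch below evaluates the conventional 3PL item response function. It is an illustration only: the function name `p_correct` is ours, we assume the textbook unidimensional parameterization with discrimination a, difficulty b, and guessing c, and the mirt package may parameterize its models differently internally.

```python
import math

def p_correct(theta, a, b, c=0.5):
    """Conventional 3PL item response function (assumed parameterization):
    probability of a correct response at ability level theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# With guessing fixed at c = .50, a respondent whose ability equals the
# item difficulty answers correctly with probability .75.
at_difficulty = p_correct(theta=0.0, a=2.0, b=0.0)

# A steeper item (larger a) separates respondents just below and just
# above the difficulty more sharply than a flatter item.
steep_gap = p_correct(0.5, a=3.0, b=0.0) - p_correct(-0.5, a=3.0, b=0.0)
flat_gap = p_correct(0.5, a=1.0, b=0.0) - p_correct(-0.5, a=1.0, b=0.0)
print(at_difficulty, steep_gap > flat_gap)
```

Because c is fixed at .50 for a two-option task, the curve runs from .50 (pure guessing) to 1.00, and the difficulty b marks the ability level at which the probability of a correct answer is halfway between the two.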
These items were selected based on their discrimination and difficulty values, where we aimed to select a diverse set of items that have high discrimination (a ≥ 2.00 for the MIST-20, a ≥ 3.00 for the MIST-8) yet have a wide range of difficulties (b = [−0.50, 0.50], for each ability), while keeping the guessing parameter at 50% chance (c = .50). We also took into account the topics to ensure both that we covered a wide range of news areas and that there was no repetition of content (Flake et al., 2017). A list of the IRT coefficients and plots can be found in Supplement S1 and Supplement S6, respectively. See Fig. 3 for a MIST-20 item trace line plot, and Fig. 4 for a multidimensional plot of the MIST-20 IRT model predictions. The final items that make up the MIST-20 and MIST-8 are shown in Table 2.18 An overview of different candidate sets and how they performed, as well as the full analysis scripts and the supplement, can be found in the OSF repository: https://osf.io/r7phc/.

Reliability Inter-item correlations show good internal consistency for both the MIST-8 (IICmin = .20, IICmax = .27) and the MIST-20 (IICmin = .22, IICmax = .29). Item-total correlations also show good reliability for both the MIST-8 (ITCmin = .44, ITCmax = .53) and the MIST-20 (ITCmin = .31, ITCmax = .54).

Looking further into the MIST-20, we analyze the reliability of veracity discernment (V; M = 15.71, SD = 3.35), real news detection (r; M = 7.62, SD = 2.43), and fake news detection (f; M = 8.09, SD = 2.10). In line with the guidelines by Revelle and Condon (2019), we calculate a two-factor McDonald’s ω (McDonald, 1999) as a measure of internal consistency using the psych package (Revelle, 2021), and find good reliability for the general scale and the two facet scales (ωg = 0.79, ωF1 = 0.78, ωF2 = 0.75).
Also using the psych package (Revelle, 2021), we calculate the variance decomposition metrics as a measure of stability, finding that F1 explains 14% of the total variance and F2 explains 12% of the total variance. Of all variance explained, 53% comes from F1 (r) and 47% comes from F2 (f), demonstrating a good balance between the two factors.

15 When using EFA with a promax rotation, there is some evidence for two factors for the fake news items and two factors for the real news items, bringing up a total of four factors, but their pattern and meaning are unclear. This alternative structure will be further explored in the EGA section.

16 While we chose to adhere to the more traditional methods for estimating and rotating factors in EFA, we acknowledge that recent research provides arguments for the use of ML estimation and oblique rotations (Goretzko et al., 2021), and specifically ULS estimation (using the tetrachoric correlation matrix) for dichotomous variables (see Shi et al., 2018). We provide an alternative, modern approach to item selection based on EGA in the section below.

17 We note that some researchers argue that the focus on reliability can reduce the content validity of the scale, as there may be relevant items with weaker loadings (e.g., Flake et al., 2017). However, as no items were removed, this is not a concern for this study.

18 As can be glimpsed from the final set, the misinformation items contain certain words and topics that are more often linked to manipulative content, such as “control/manipulate/cause,” “vaccine/virus,” and “government.” These topics were already present in the sample items given to the GPT-2, which led to more of these topics being present in the original fake news item pool than in the real news item pool. This thus represents a feature that was present since the first phase of the development and is not just a consequence of a later selection by the experts or elimination based on factor loadings.
Finally, test–retest reliability analysis indicates that MIST scores are moderately positively correlated over a period of eight to nine months (r_T1,T2 = 0.58).19

Validity To assess initial validity, we examined the associations between the MIST scales and two scales that have been used regularly in previous misinformation research, namely the COVID-19 fact-check by Pennycook, McPhetres, et al. (2020) and the DEPICT task by Maertens et al. (2021), expecting high correlations (r > .50; concurrent validity) and additional variance explained as compared to the existing CMQ, BSR, and CRT scales (incremental validity; Clark & Watson, 2019; Meehl, 1978). As can be seen in Table 3, we found that the MIST-8 displays a medium to high correlation with the fact-check (r = .49) and the DEPICT task (r = .45), while the MIST-20 shows a large positive correlation with both the fact-check (r = .58) and the DEPICT task (r = .50). Using a linear model, we found that the MIST-20 by itself explains 33% of the variance in fact-check scores (adjusted R²). The CMQ, BSR, and CRT combined account for 19%. Adding the MIST-20 on top provides an incremental 18% of explained variance (adjusted R² = 0.37). The MIST-20 is the strongest predictor in the combined model (t(404) = 10.82, p < .001, β = 0.49, 95% CI [0.40, 0.57]). For the DEPICT task we found that the CMQ, BSR, and CRT combined explain 12% of variance in deceptive headline recognition and 26% when the MIST-20 is added (∆R² = 0.14), while the MIST-20 alone explains 25%. For the DEPICT task we found the MIST-20 to be the only significant predictor in the combined model (t(404) = 8.94, p < .001, β = 0.43, 95% CI [0.34, 0.53]).20

EGA results

In this section we re-analyze the pool of 100 MIST items using EGA. EGA estimated four dimensions (see Fig.
5), which can be identified as two dimensions of real news headlines and two of fake news headlines. Dimension 1 (red nodes on Fig. 5) is a combination of US and international real news headlines, with items such as MIST 96 (US Hispanic Population Reached New High in 2018, But Growth Has Slowed), MIST 92 (Taiwan Seeks to Join Fight Against Global Warming), and MIST 60 (Hyatt Will Remove Small Bottles from Hotel Bathrooms by 2021). Dimension 2 (blue nodes on Fig. 5) has fake news items about science, such as item MIST 8 (Climate Scientists’ Work Is “Unreliable”, a “Deceptive Method of Communication”), and false statements against people with a liberal world view, such as items MIST 16 (Left-Wingers Are More Likely to Lie to Get a Good Grade) and MIST 20 (New Study: Left-Wingers Are More Likely to Lie to Get a Higher Salary). The third dimension (green nodes on Fig. 5) has real news items related to politically charged topics in the US, such as items MIST 70 (Majority in US Still Want Abortion Legal, with Limits), MIST 74 (Most Americans Say It’s OK for Professional Athletes to Speak out Publicly about Politics), and MIST 94 (United Nations Gets Mostly Positive Marks from People Around the World). Dimension 4 (orange nodes on Fig. 5) has fake news items related to general conspiracy beliefs, such as item MIST 1 (A Small Group of People Control the World Economy by Manipulating the Price of Gold and Oil), and conspiracies related to the government, such as items MIST 31 (The Government Is Actively Destroying Evidence Related to the JFK Assassination) and MIST 32 (The Government Is Conducting a Massive Cover-Up of Their Involvement in 9/11).

19 It must be noted that at T2, participants only completed the 20-item MIST, while at T1 participants had to categorize 100 items, with slightly different question and response framings (see full Qualtrics layouts and question framings in the OSF repository: https://osf.io/r7phc/). We expect the actual test–retest correlation to be higher.

20 Full model output for the MIST-8 and MIST-20 linear models can be found in Supplement S8. Full analysis scripts can be found in the OSF repository: https://osf.io/r7phc/.

Fig. 3 Item trace lines for MIST-20 items, for the fake news items in Panel A and real news items in Panel B. The items in the legend are ordered according to their difficulty level

The unique variable analysis technique (Christensen et al., 2020a) identified two redundant items: MIST 43 (UN: New Report Shows Shark Fin Soup as ‘the Most Important Source of Protein’ for World’s Poor) and MIST 17 (New Data Show Shark Fins Are the ‘Most Important Source of Protein’ for the World’s Poor). The ratio of network loadings (main/cross-loadings) for these items (8.47 and 6.9, respectively) suggested that item MIST 43 should be kept in the subsequent analyses. A bootstrap exploratory graph analysis with 500 iterations (parametric bootstrapping) identified a median of four dimensions (95% CI: 2.11, 5.89) but with very low structural consistency for each dimension (0.09, 0.14, 0.07, and 0.43 for dimensions 1, 2, 3, and 4, respectively). The item stability metric (Christensen & Golino, 2021a) varied from 23% to 98%, with 40% of items presenting inadequate or moderate stability (i.e., lower than 75%, see Fig. 6).

Removing the items with item stability lower than 75% and repeating the parametric bootstrap EGA technique with 500 iterations showed that the stability improved considerably, leading to structural consistency between 0.61 (dimension 2) and 0.96 (dimension 4), and mean item stability of 93%.
From the 59 items selected in the steps above, a subset with network loadings equal to or higher than .155 were selected from each dimension estimated via EGA, resulting in 34 items. A parametric bootstrap EGA with 500 iterations followed by item stability analysis was implemented once again, and items with stability lower than 75% were removed, resulting in 32 items.

The final selection of items was implemented using the following strategy. Out of the 32 items selected in the previous steps, only those with relatively high network loadings (≥ .23 or ≥ .235) were used in the subsequent bootEGA and item stability analysis, which identified 16 highly stable items (see Fig. 7). Exploratory graph analysis identified the same four dimensions described in the first paragraph of this section, but now they presented very high structural consistency ranging from .982 to 1, and very high item stability (ranging from 98 to 100%). The network loadings of the final MIST-16 EGA items are presented in Table 4.

Fig. 4 Multidimensional IRT plot representing the final MIST-20 test

A metric invariance analysis for EGA using permutation tests (Jamison et al., 2022) was conducted using sex, mean age, and mean education as grouping variables. None of the items exhibited a significant (p < .05) difference in network loadings across the tested groups, suggesting that the 16 items selected using the EGA framework work similarly irrespective of sex, age, and education (see Supplement S19 for an overview).

The fit of the four-dimensional structure estimated via EGA was compared to the fit of the two-factor structure of real and fake news items using the total entropy fit index (Golino, Moulder, et al., 2020a), and two traditional factor-analytic fit measures (CFI and RMSEA). To compute the traditional factor-analytic fit indices, a confirmatory factor analysis was implemented using the WLSMV estimator for each structure (see Fig.
8). Table 5 shows that the EGA four-factor struc- ture presented the lowest TEFI and RMSEA, and the highest CFI, suggesting that the four-factor first-order dimensions estimated via EGA fit the data better than the theoretical two- factor structure, although the two-factor structure also has an acceptable fit. The Satorra (Satorra, 2000; Table 6) scaled difference test also showed that the EGA four-factor structure is preferable to the theoretical two first-order factor structure. Two different traditions were used to select a subset of items, one relying on traditional techniques (EFA and IRT) and another relying on modern network psychometric methods (EGA). Looking at the item stability and structural consistency of the dimensions between the two, we found that the MIST- 16 EGA items are stable and consistent, indicating that the four dimensions estimated using exploratory graph analysis are robust and likely to be identified in independent samples. The 20 items selected using EFA/IRT were less robust in terms of stability (see Supplement S19: EGA Metric Invariance Tests). The low stability for some of the items of MIST-20 might indicate that there are a higher or lower number of dimensions underlying the data. The parametric bootstrap EGA analysis (with 500 iterations) of the MIST-20 items indicates that two dimensions are estimated in 21.0% of the bootstrapped samples, three dimensions in 68.2%, and four dimensions in 10.0%. The item stability of the most common structure (three Table 2 Final items selected for MIST-20 and MIST-8 Items in bold are items included in the short version of the test (MIST-8). a = discrimination parameter. b = difficulty parameter Item no. 
a b Content Fake news MIST_14 3.50 0.53 Government Officials Have Manipulated Stock Prices to Hide Scandals MIST_28 2.69 0.06 The Corporate Media Is Controlled by the Military-industrial Complex: The Major Oil Compa- nies Own the Media and Control Their Agenda MIST_20 3.26 −0.20 New Study: Left-Wingers Are More Likely to Lie to Get a Higher Salary MIST_34 3.42 −0.25 The Government Is Manipulating the Public's Perception of Genetic Engineering in Order to Make People More Accepting of Such Techniques MIST_15 2.34 −0.40 Left-Wing Extremism Causes 'More Damage' to World Than Terrorism, Says UN Report MIST_7 2.57 −0.45 Certain Vaccines Are Loaded with Dangerous Chemicals and Toxins MIST_19 2.00 −0.55 New Study: Clear Relationship Between Eye Color and Intelligence MIST_33 5.60 −0.76 The Government Is Knowingly Spreading Disease Through the Airwaves and Food Supply MIST_10 2.64 −1.02 Ebola Virus 'Caused by US Nuclear Weapons Testing', New Study Says MIST_13 2.86 −1.30 Government Officials Have Illegally Manipulated the Weather to Cause Devastating Storms Real news MIST_50 3.12 0.38 Attitudes Toward EU Are Largely Positive, Both Within Europe and Outside It MIST_82 2.22 0.31 One-in-Three Worldwide Lack Confidence in NGOs MIST_87 2.25 0.14 Reflecting a Demographic Shift, 109 US Counties Have Become Majority Nonwhite Since 2000 MIST_65 2.36 −0.03 International Relations Experts and US Public Agree: America Is Less Respected Globally MIST_60 3.39 −0.09 Hyatt Will Remove Small Bottles from Hotel Bathrooms by 2021 MIST_73 2.43 −0.14 Morocco’s King Appoints Committee Chief to Fight Poverty and Inequality MIST_88 2.79 −0.31 Republicans Divided in Views of Trump’s Conduct, Democrats Are Broadly Critical MIST_53 2.12 −0.37 Democrats More Supportive than Republicans of Federal Spending for Scientific Research MIST_58 8.59 −0.60 Global Warming Age Gap: Younger Americans Most Worried MIST_99 2.26 −0.83 US Support for Legal Marijuana Steady in Past Year 1878 Behavior Research Methods 
(2024) 56:1863–1899 1 3 dimensions, see Supplement S20) reveals that the items are relatively stable, but still not as stable as the MIST-16 EGA items. A comparison of the three-dimensional structure esti- mated using EGA in the MIST-20 items with the theoretical two-factor structure (see Table 7) shows that the three-factor solution performs slightly better, since it presents lower TEFI and RMSEA, and higher CFI. Discussion In Study 1, we generated 413 news items using GPT-2 auto- mated item generation for fake news, and trusted sources for real news. Through two independent expert committees, we reduced the item pool to 100 items (44 fake and 56 real). We then com- bined item response theory with factor analysis to reduce the item set to the 20 best items for the MIST-20 and the 8 best items for the MIST-8. We found that the final items demonstrate good reliability. In an initial test of validity, we found strong concurrent validity for both the MIST-8 and the MIST-20 as evidenced by their strong associations with the COVID-19 fact- check (a headline evaluation task) and the DEPICT deceptive headline recognition task (a social media post reliability judg- ment task). Moreover, we found that both the MIST-20 and the MIST-8 outperformed the combined model of the CMQ, BSR, and CRT, when explaining variance in fact-check and DEPICT scores, evidencing incremental validity. This study provides the first indication that both the MIST-20 and MIST-8 are psycho- metrically sound, and can explain and test misinformation sus- ceptibility above and beyond the existing scales. 
Finally, we also presented an alternative approach to item selection, namely one based on EGA that uses network psychometrics to identify the best partition of the multidimensional space, combined with a bootstrap analysis of item and dimensional stability (structural consistency), to identify a set of highly stable items with moderate or high network loadings, leading to the selection of 16 items measuring four dimensions of misinformation susceptibility.
Study 2: Validation – Confirmatory analyses, nomological net, and national norms
Study 2 sought to consolidate and extensively test the psychometric soundness of the newly developed MIST-20, MIST-16, and MIST-8 scales. Across five large samples with nationally representative quotas from two countries (US, UK) and three different recruitment platforms (CloudResearch, Prolific, and Respondi) we pursued three goals. First, we used structural equation modeling and reliability analyses to probe the structural stability, model fit, and internal consistency of the MIST across different empirical settings. Second, we built an extensive nomological network and examined both the correlation patterns and the predictive power of the MIST to demonstrate convergent, discriminant, and incremental validity. Third, we capitalized on the representativeness of our samples to derive national norms for the general population (UK, US) and specific demographic (UK, US) and geographical subgroups (US).
Method: MIST-20/MIST-8
Participants
As part of our EFA/IRT validation study, we collected data from four samples with nationally representative quota (Ntotal = 8310, Nclean = 6461).²¹ Sample 2A was a US sample (N = 3692) with interlocking age and gender quota (i.e., each category contains a representative relative proportion of the other category) accessed through Respondi, an International Organization for Standardization (ISO)-certified international organization for market and social science research (for previous applications see, e.g., Dür & Schlipphak, 2021; Heinsohn et al., 2019; Roozenbeek, Freeman, et al., 2021a). After excluding incomplete cases and participants outside of the quota, 3479 participants were considered for analysis. Sample 2B was a US sample with nationally representative age, ethnicity, and gender quota (N = 856) recruited through CloudResearch (formerly TurkPrime), an online research platform similar to MTurk but with additional validity checks and more intense participant pool controls (Buhrmester et al., 2018; Litman et al., 2017). After excluding all participants who failed an attention check, were underage, did not reside in the United States, did not complete the entire study, completed the study in ≤ 10 minutes, or were a second-time participant, 510 participants remained.²² Sample 2C was a UK sample (N = 2517) based on nationally representative interlocking age and gender quota recruited through Respondi. After excluding incomplete cases and participants outside of our quota criteria, 1227 participants were retained. Lastly, Sample 2D was a UK sample (N = 1396) with nationally representative age and gender quota recruited through Prolific. Excluding all entries that fell outside of our quota criteria and all incomplete entries resulted in an analysis sample of 1245 participants.
21 Surveys 2A, 2C, and 2D were designed as part of a separate research project which featured the MIST-20 as an add-on. Survey 2B was designed specifically for this project.
Table 3 Incremental validity of MIST-8 and MIST-20 with existing measures
Model | r | Adjusted R² | ΔR²
CV19 fact-check ~
MIST-8 | .49 | .24 |
MIST-20 | .58 | .33 |
CMQ + BSR + CRT | | .19 |
CMQ + BSR + CRT + MIST-8 | | .30 | .11***
CMQ + BSR + CRT | | .19 |
CMQ + BSR + CRT + MIST-20 | | .37 | .18***
DEPICT ~
MIST-8 | .45 | .20 |
MIST-20 | .50 | .25 |
CMQ + BSR + CRT | | .12 |
CMQ + BSR + CRT + MIST-8 | | .22 | .11***
CMQ + BSR + CRT | | .12 |
CMQ + BSR + CRT + MIST-20 | | .26 | .14***
* p < .05, ** p < .01, *** p < .001
Samples 2A, 2C, and 2D exceed both the minimum of 300 participants per sample recommended by best practices for scale development (Boateng et al., 2018; Clark & Watson, 1995, 2019; Comrey & Lee, 1992; Guadagnoli & Velicer, 1988) and the sample size needed to be highly powered (power = .90, α = .05) to detect the smallest effect size of interest (r = .10, needed N = 1046; Anvari & Lakens, 2021; Funder & Ozer, 2019; Götz, Gosling, et al., 2022). Sample 2B was highly powered (power = .90, α = .05) to detect effect sizes r of .15 (needed N = 463). Power analyses were completed using the pwr package in R (Champely et al., 2021). Detailed demographic breakdowns of all samples are shown in Table 1.
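The required sample sizes quoted above can be approximated with a short Python sketch based on the Fisher z transformation. This is an illustration under stated assumptions, not the routine used by the pwr package: the function name is hypothetical, the test is taken to be two-sided, and the approximation can differ from the reported values by a participant or so.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, power=0.90, alpha=0.05):
    """Approximate N for a two-sided test of H0: rho = 0, using the Fisher z transformation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


n_small = n_for_correlation(0.10)   # smallest effect size of interest
n_medium = n_for_correlation(0.15)  # effect size Sample 2B was powered for
```

Both results land close to the reported requirements of 1046 and 463 participants, which is the behavior one would expect if the reported figures come from a closely related normal-approximation method.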
Procedure and measures
All participants were invited to take part in an online survey through the respective research platforms. After providing informed consent, all participants provided basic demographic information and completed the MIST-20 and, depending on their sample group, a select set of additional psychological measures (for a detailed description of all constructs assessed in each sample group, see Table 1). All participants received financial compensation in accordance with platform-specific remuneration standards and guidelines on ethical payment at the University of Cambridge. Participants in Samples 2A, 2B, and 2C were paid by the sampling platform directly, while participants in Sample 2D received 2.79 GBP for a 25-minute survey (6.70 GBP per hour). All data collections were approved by the Psychology Research Ethics Committee of the University of Cambridge (PRE.2019.108, PRE.2020.034, PRE.2020.086, PRE.2020.120).
22 This is a slight deviation from the preregistration, as we added incomplete entries, second entries, participants that completed the survey in ≤ 10 minutes, and participants who failed any attention check (instead of both) to the exclusion criteria, thus adopting a more rigorous and conservative exclusion approach than we had preregistered. These additional exclusions were to ensure high-quality data.
Fig. 5 Structure of the 100 MIST items estimated using exploratory graph analysis
Fig. 6 Item stability metric of the MIST-100 items in Study 1
Table 4 Network loadings per item and dimension estimated via EGA. Network loadings of .15, .25, and .35 are equivalent to low (.40), moderate (.55), and high (.70) factor loadings, respectively (Christensen & Golino, 2021c)
Item | Dim 1 | Dim 2 | Dim 3 | Dim 4 | Dim | Headline
MIST_73 | 0.35 | 0.04 | −0.01 | 0.11 | 1 | Morocco’s King Appoints Committee Chief to Fight Poverty and Inequality
MIST_96 | 0.33 | −0.12 | −0.06 | 0.10 | 1 | US Hispanic Population Reached New High in 2018, But Growth Has Slowed
MIST_60 | 0.28 | 0.03 | 0.07 | 0.10 | 1 | Hyatt Will Remove Small Bottles from Hotel Bathrooms by 2021
MIST_92 | 0.24 | 0.11 | 0.08 | 0.09 | 1 | Taiwan Seeks to Join Fight Against Global Warming
MIST_47 | 0.24 | 0.06 | −0.03 | 0.00 | 1 | About a Quarter of Large US Newspapers Laid off Staff in 2018
MIST_33 | 0.16 | 0.40 | 0.06 | 0.00 | 2 | The Government Is Knowingly Spreading Disease Through the Airwaves and Food Supply
MIST_31 | 0.00 | 0.40 | 0.00 | 0.01 | 2 | The Government Is Actively Destroying Evidence Related to the JFK Assassination
MIST_14 | −0.05 | 0.26 | 0.06 | −0.04 | 2 | Government Officials Have Manipulated Stock Prices to Hide Scandals
MIST_1 | −0.06 | 0.22 | 0.05 | −0.02 | 2 | A Small Group of People Control the World Economy by Manipulating the Price of Gold and Oil
MIST_32 | −0.10 | 0.31 | 0.13 | 0.00 | 2 | The Government Is Conducting a Massive Cover-Up of Their Involvement in 9/11
MIST_20 | 0.09 | 0.05 | 0.44 | 0.01 | 3 | New Study: Left-Wingers Are More Likely to Lie to Get a Higher Salary
MIST_8 | 0.08 | 0.09 | 0.26 | 0.00 | 3 | Climate Scientists' Work Is 'Unreliable', a 'Deceptive Method of Communication'
MIST_16 | 0.01 | 0.10 | 0.39 | 0.05 | 3 | Left-Wingers Are More Likely to Lie to Get a Good Grade
MIST_70 | 0.14 | −0.04 | 0.00 | 0.38 | 4 | Majority in US Still Want Abortion Legal, with Limits
MIST_74 | 0.08 | 0.00 | 0.04 | 0.32 | 4 | Most Americans Say It’s OK for Professional Athletes to Speak out Publicly about Politics
MIST_94 | 0.06 | 0.02 | 0.02 | 0.30 | 4 | United Nations Gets Mostly Positive Marks from People Around the World
Fig. 7 Final structure of the MIST-16 EGA items (left) and their stability indices (right) estimated using parametric bootstrap EGA with 500 iterations
Analytical strategy
We adopted a three-pronged analytical strategy. First, we computed reliability estimates and conducted confirmatory factor analyses for each subsample, seeking to reproduce, consolidate, and evaluate the higher-order model derived in Study 1. Second, in an effort to establish construct validity (Cronbach & Meehl, 1955; Strauss & Smith, 2009), we pooled the constructs assessed across our four validation samples to build a comprehensive, theory-driven, and preregistered (Sample 2B) nomological network. To this end, we cast a wide net and included (1) concepts that should be meaningfully positively correlated with MIST scores (convergent validity; i.e., DEPICT Balanced Short Form; Maertens et al., 2021; Go Viral! Balanced Item Set; Basol et al., 2021), expecting a high positive Pearson r correlation ([0.50, 0.80]), (2) concepts that should be clearly distinct from the MIST (discriminant validity; i.e., Bullshit Receptivity Scale; BSR; Pennycook et al., 2015; Conspiracy Mentality Questionnaire; CMQ; Bruder et al., 2013), expecting a low to medium negative correlation with the MIST (Pearson r = [−0.50, −0.20]), and (3) an array of prominent psychological constructs of general interest (i.e., personality traits, attitudes, and cognitions including the Big Five, Dark Tetrad, Moral Foundations, Social Dominance Orientation, Ecological Dominance Orientation, religiosity, self-esteem, political cynicism, numeracy, and trust in various public institutions and social agents) for which no a priori expectations were formulated. Third, we leveraged the size and representativeness of our samples to establish norm tables for the US and UK general populations as well as specific demographic and geographical subgroups.
Method: MIST-16
Participants
We also collected a new dataset (Sample 2E; November 2022) with the best items per dimension that were identified using the EGA approach (the MIST-16). The dataset was collected using Respondi/Bilendi, in a nationally representative quota sample (N = 1213) of adults from the US. The sample composition was as follows: 54% identifying as female (44% male, 2% nonbinary), 33% between 18 and 34 years, 31% between 35 and 54 years, and 36% between 55 and 75 years; 24% of the participants reported coming from the Midwest (Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin), 17% from the Northeast (Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont, New Jersey, New York, and Pennsylvania), 40% from the South (Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, West Virginia, Delaware, Alabama, Kentucky, Mississippi, Tennessee, Arkansas, Louisiana, Oklahoma, and Texas), and 20% from the West (Montana, Wyoming, Colorado, New Mexico, Idaho, Utah, Arizona, Nevada, Washington, Oregon, California, Alaska, and Hawaii) of the country.
Fig. 8 Plot of the confirmatory factor model estimated using the EGA four-factor structure (left) and the theoretical two-factor structure (right)
Table 5 Comparison of fit indices of the EGA four-factor model and the theoretical two-factor model
Structure | TEFI | CFI | RMSEA
EGA four-factor | −14.27 | 0.97 | 0.03
Theoretical two-factor | −11.77 | 0.91 | 0.05
Table 6 The Satorra scaled difference test comparing the EGA four-factor structure to the theoretical two first-order factor structure
Structure | Df | Chisq | ChisqDiff | DfDiff | p
EGA four-factor | 98 | 112.32 | | |
Theoretical two-factor | 103 | 203.49 | 29.73 | 5 | < .001
Analytical strategy
Exploratory graph analysis, as well as hierarchical EGA (Jimenez et al., 2022), was applied to the MIST items.
The advantage of using hierarchical EGA (Jimenez et al., 2022) on the US representative quota sample collected (using the best MIST items identified in the first stage of EGA analysis) is that as the sample size increases, there is a realistic chance of EGA estimating a structure reflecting general factors instead of first-order factors, if the dimensions are hierarchical or form a generalized bifactor structure. Therefore, the item stability and structural consistency of the first-order factors were computed using a hierarchical EGA (Jimenez et al., 2022) version of bootstrap exploratory graph analysis (Christensen & Golino, 2021a).
We would like to note that the MIST-16 was developed and validated after the samples from the other validation (Studies 2A–2D) and application (Study 3) studies were collected, due to the emergence of new psychometric methods. As the MIST-16 is not a subset of the MIST-20, we do not have the same nomological net and intervention evaluation data available for the MIST-16. However, as the correlation (in Study 1) between the MIST-20 and MIST-16 item sets is large, r = .81, 95% CI [.77, .84], p < .001, we can expect the MIST-20 results to be a close approximation.
Results: MIST-20/MIST-8
Internal consistency
For each sample, we employed SEM to assess model fit, examining both a basic first-order model with two distinct factors (i.e., real news detection, fake news detection; without allowing the factors to correlate) and a theoretically derived higher-order model (Markon, 2019; Thurstone, 1944; which establishes a relationship between the two factors) in which both first-order factors load onto a general second-order veracity discernment factor. We then calculated reliability estimates using internal consistency measures (inter-item correlations, item-total correlations, and McDonald’s ω). We used the lavaan package for SEM in R (Rosseel, 2012).
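As a rough illustration of the McDonald’s ω reliability coefficient mentioned above, the Python sketch below computes ω for the simplest case: a single factor with standardized loadings and uncorrelated residuals. This is an assumption made for illustration only. The ω values reported for the MIST come from the fitted higher-order model, and the loadings used here are hypothetical numbers, not estimates from the data.

```python
def omega_total(loadings):
    """McDonald's omega for a one-factor model with standardized loadings."""
    common = sum(loadings) ** 2                    # variance explained by the factor
    unique = sum(1 - l ** 2 for l in loadings)     # summed residual variances
    return common / (common + unique)


# Hypothetical standardized loadings for a four-item scale (illustration only).
hypothetical_loadings = [0.5, 0.6, 0.7, 0.4]
omega = omega_total(hypothetical_loadings)
```

With a fixed loading size, ω increases as items are added, which is one reason a short form such as the MIST-8 can be expected to show lower ω than the MIST-20.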
In keeping with our theoretical conceptualization of the MIST (a general ability factor of veracity discernment, and two subordinate factors capturing real news and fake news detection, respectively), we fitted a higher-order model (Markon, 2019; Thurstone, 1944) in which both first-order factors load onto a general second-order veracity discernment factor (see Fig. 9). We first did this with Sample 2A (US quota sample from Respondi). Consistent with conventional guidelines (RMSEA/SRMR < .10 = acceptable; < .06 = excellent; CFI/TLI > .90 = acceptable; > .95 = excellent; Clark & Watson, 2019; Finch & West, 1997; Hu & Bentler, 1999; Pituch & Stevens, 2015; Schumacker et al., 2015), the model fits the data adequately (MIST-20: CFI = .90, TLI = .89, RMSEA = .041, SRMR = .040; MIST-8: CFI = .97, TLI = .95, RMSEA = .030, SRMR = .025).²³ We note that the χ² goodness-of-fit test was significant, signaling lack of fit (MIST-20: χ² = 1021.86, p < .001; MIST-8: χ² = 72.74, p < .001). However, this should be interpreted with caution, as the χ² is a test of perfect fit and very sensitive to sample size. As such, as sample sizes approach 500, χ² is usually significant even if the differences between the observed and model-implied covariance matrices are trivial (Bentler & Bonett, 1980; Curran et al., 2003; Rosellini & Brown, 2021). Taken together, the findings thus suggest an adequate model fit for the theoretically derived higher-order model.
Importantly, this model also yielded better fit than a traditional basic first-order model (with two distinct fake news and real news factors; MIST-20: χ² = 1027.17, p < .001, CFI = 0.90, TLI = 0.89, RMSEA = 0.041, SRMR = 0.041; MIST-8: χ² = 99.46, p < .001, CFI = 0.95, TLI = 0.93, RMSEA = 0.035, SRMR = 0.035).
A likelihood-ratio test of the higher-order model versus the first-order model (which did not include a correlation between the two factors) was significant for both the MIST-20 and the MIST-8 (MIST-20: Δχ² = 5.35, p = .021; MIST-8: Δχ² = 26.29, p < .001), indicating a better fit for the higher-order model.
23 We acknowledge that there is a discussion in the literature on defining new (dynamic) fit values depending on the specific model tested (see McNeish & Wolf, 2021). For example, simulations using the ezCutoffs (Schmalbach et al., 2019) package indicate we would need a CFI and a TLI of larger than 0.99 for excellent fit, in conjunction with an RMSEA of smaller than 0.04/0.03 (MIST-8/MIST-20) and an SRMR smaller than 0.03. However, as the new cutoff values are still under consideration and not well established, we focused on the conventional and, in this case also, preregistered cutoff values for our evaluation.
Table 7 Fit of the three- and two-dimensional structures of the MIST-20 items
Structure | TEFI | CFI | RMSEA
MIST-20 EGA three-factor | −20.70 | 0.963 | 0.029
MIST-20 Theoretical two-factor | −16.93 | 0.955 | 0.032
Sample comparison
Across all four samples, we successfully reproduced the original higher-order model, with parameters indicating good fit, as well as good internal consistency in all four samples (see Table 8 for a complete overview).²⁴ A similar fit is found between the US Respondi and UK Respondi samples, indicating that the MIST works similarly in the UK as it does in the US.²⁵ Meanwhile, larger differences are found between the US Respondi and the US CloudResearch samples, and between the UK Respondi and the UK Prolific samples, indicating that sampling platform plays a larger role than nationality when administering the MIST, even when using representative quota sampling.
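The p values of the likelihood-ratio comparison between the higher-order and first-order models reported above can be checked with a few lines of Python. The sketch assumes the two nested models differ by a single degree of freedom (the relationship between the two factors), which is not stated explicitly in the text; under that assumption the upper-tail chi-square probability reduces to a complementary error function. The function name is hypothetical.

```python
import math


def chi2_sf_df1(delta_chi2):
    """Upper-tail p value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(delta_chi2 / 2))


p_mist20 = chi2_sf_df1(5.35)   # reported chi-square difference for the MIST-20
p_mist8 = chi2_sf_df1(26.29)   # reported chi-square difference for the MIST-8
```

Under the one-degree-of-freedom assumption, both values agree with the reported p = .021 and p < .001.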
24 Supplement S9 includes model plots for both the MIST-20 and MIST-8 for all samples.
25 We would like to stress that this does not imply measurement invariance and would like to caution researchers against comparing results directly between countries. The current data indicate that the MIST works in the US and the UK and likely measures the same latent construct, but it does not mean that the results are directly comparable. We recommend researchers and practitioners keep the focus on comparisons within instead of between countries. For a detailed discussion about cross-cultural generalizability please see Deffner et al. (2022).
Fig. 9 Plot of higher order MIST-8 SEM model in Sample 2A (N = 3479)
Nomological network²⁶
Convergent validity
As preregistered, in Sample 2B²⁷ (the sample we primarily relied on in constructing the nomological network, as it offered the widest coverage of psychological constructs among our validation samples) the correlation between the general MIST-20 score and the DEPICT Balanced Short Form measure (Maertens et al., 2021) was found to be positive and medium to large, with a significant Pearson correlation of .54 (95% CI [.48, .60], p < .001).²⁸ The MIST-20 correlation with the Go Viral! inventory (Basol et al., 2021) was lower than predicted but still significant, with a Pearson correlation of .26 (95% CI [.18, .34], p < .001). Similarly, regarding incremental validity, the additional explained variance in the DEPICT Balanced Short Form measure above and beyond the CMQ and the BSR is at the upper side of our prediction, with an additional 20% of variance explained, whereas with 3% it is under the predicted value for the Go Viral! inventory.²⁹ For a more detailed account, see Supplement S13. In addition, in Sample 2A, we measured belief in COVID-19 myths, which was significantly negatively correlated with the MIST-20 and within the preregistered strength of convergent validity measures (r = −.51, 95% CI [−.55, −.47], p < .001).
26 This section focuses on the nomological network of the general ability factor (veracity discernment) of the MIST-20. However, we have also constructed nomological networks for the subcomponents of the MIST as well as the MIST-8. For parsimony’s sake, these are reported in Supplements S10-S12.
27 Some variables were only analyzed in specific samples, as not all variables were present in all datasets.
28 See https://aspredicted.org/nx7xu.pdf for the preregistration (Sample 2B).
Discriminant validity
As preregistered for Sample 2B, the MIST-20 was moderately negatively correlated with the BSR (r = −.21, 95% CI [−.29, −.13], p < .001) and the CMQ (r = −.38, 95% CI [−.45, −.30], p < .001). Overall, the correlational pattern of our nomological network supports the construct validity of the MIST, with the MIST being more strongly correlated with the convergent measures than with the discriminant measures (Campbell & Fiske, 1959; Rosellini & Brown, 2021).
CRT (Sample 2A)
In line with other studies finding a role for the CRT in misinformation detection (e.g., Pennycook & Rand, 2019), we found a significant correlation between the MIST score and the cognitive reflection test, or CRT (r = .29, 95% CI [.26, .32], p < .001).
AOT (Sample 2A)
We found an even larger significant correlation between the MIST score and actively open-minded thinking, or AOT (r = .49, 95% CI [.46, .51], p < .001).
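The confidence intervals attached to the correlations above can be approximated with the Fisher z transformation, as in the following Python sketch. It assumes each correlation was computed on the full analysis sample (510 for Sample 2B, 3479 for Sample 2A); the actual number of cases per analysis may be lower because of missing data, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Confidence interval for a Pearson correlation via the Fisher z transformation."""
    z = math.atanh(r)                    # transform r to the z scale
    se = 1 / math.sqrt(n - 3)            # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# MIST-20 with DEPICT (Sample 2B) and with the CRT (Sample 2A); N values are assumptions.
lo_depict, hi_depict = pearson_ci(0.54, 510)
lo_crt, hi_crt = pearson_ci(0.29, 3479)
```

Rounded to two decimals, these intervals are consistent with the reported [.48, .60] and [.26, .32]. The much narrower interval for the CRT reflects the larger size of Sample 2A.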
29 It must be noted that the Go Viral! inventory is not a validated measurement instrument. Results should be interpreted in light of this.
Table 8 Model fit overview
MIST-20
Samp. | Plat. | Pop. | χ² | p | CFI | TLI | RMSEA | 95% CI LL | 95% CI UL | SRMR | ωtot | 3F
2A | R | US | 1021.86 | < .001 | 0.90 | 0.89 | 0.041 | 0.039 | 0.044 | 0.040 | 0.76 | *
2B | C | US | 264.66 | < .001 | 0.92 | 0.91 | 0.035 | 0.027 | 0.043 | 0.051 | 0.75 | ⚬
2C | R | UK | 473.56 | < .001 | 0.91 | 0.90 | 0.041 | 0.037 | 0.046 | 0.049 | 0.81 | ***
2D | P | UK | 432.12 | < .001 | 0.86 | 0.85 | 0.038 | 0.034 | 0.042 | 0.045 | 0.70 | ***
MIST-8
Samp. | Plat. | Pop. | χ² | p | CFI | TLI | RMSEA | 95% CI LL | 95% CI UL | SRMR | ωtot | 3F
2A | R | US | 72.74 | < .001 | 0.97 | 0.95 | 0.030 | 0.023 | 0.037 | 0.025 | 0.57 | ***
2B | C | US | 30.32 | .048 | 0.96 | 0.94 | 0.036 | 0.003 | 0.058 | 0.040 | 0.58 | *
2C | R | UK | 64.13 | < .001 | 0.94 | 0.91 | 0.045 | 0.033 | 0.058 | 0.040 | 0.62 | ***
2D | P | UK | 46.91 | < .001 | 0.93 | 0.90 | 0.037 | 0.023 | 0.050 | 0.035 | 0.55 | ***
Total N = 6461. Samp = sample. Plat = sampling platform. Pop = sample population. CI = confidence interval; LL = lower limit; UL = upper limit. R = Respondi. C = CloudResearch. P = Prolific. ωtot = McDonald’s Omega. 3F reflects whether the three-factor (higher-order) model provided better fit than the two-factor (first-order) model. ⚬ = descriptively better fit but not significant; * p < .05, ** p < .01, *** p < .001
BFI (Sample 2B)
Contrary to our preregistered exploratory hypotheses, in Sample 2B the MIST-20 score was not significantly correlated with openness, r = .02, 95% CI [−.06, .11], p = .594, and agreeableness was not negatively correlated with distrust d, r = .05, 95% CI [−.04, .14], p = .255.³⁰ The MIST-20 score was also not significantly correlated with agreeableness (r = .05, 95% CI [−.04, .14], p = .271) or extraversion (r = −.07, 95% CI [−.15, .02], p = .141), but did significantly correlate with conscientiousness (r = .10, 95% CI [.02, .19], p = .020) and neuroticism (r = −.14, 95% CI [−.23, −.06], p = .001).
30 The lack of a significant correlation between the MIST score and openness is somewhat surprising given the strong correlation between the MIST and the AOT score, indicating that openness as measured in the Big Five is not the same as open-minded thinking as measured by the AOT.
DT (Sample 2B)
The MIST-20 score was negatively correlated with each of the four Dark Tetrad traits: Machiavellianism (r = −.09, 95% CI [−.17, −.00], p = .047), narcissism (r = −.26, 95% CI [−.34, −.18], p < .001), psychopathy (r = −.30, 95% CI [−.37, −.22], p < .001), and sadism (r = −.22, 95% CI [−.30, −.12], p < .001). However, contrary to our preregistered exploratory hypothesis, Machiavellianism was not negatively correlated with naïvité n, r = .16, 95% CI [.07, .24], p < .001.
Trust measures (Sample 2B)
In line with our preregistered exploratory hypotheses, we found that the MIST score was correlated with trust in science, r = .33, 95% CI [.25, .41], p < .001, scientists, r = .36, 95% CI [.28, .43], p < .001, and mainstream media, r = .18, 95% CI [.09, .26], p < .001. In addition, we found that trust in doctors, r = .36, 95% CI [.28, .43], p < .001, journalists, r = .19, 95% CI [.11, .27], p < .001, and officials, r = .09, 95% CI [.00, .17], p = .049, was significantly positively correlated, while trust in the government, r = −.11, 95% CI [−.20, −.02], p = .012, was significantly negatively correlated with the MIST-20. We found no significant correlation for either of the two trust-in-politicians scales, ra = −.06, 95% CI [−.14, .03], p = .210, rb = .07, 95% CI [−.02, .15], p = .131.
Additional associations
For a summary and discussion of the exploratory analyses of MFQ, SDO, EDO, numeracy, anti-vaccination attitudes, self-esteem, religiosity, trust, ideology, and demographics, please see Supplement S14.
Detailed summary figures separated by outcome category are available in Supplements S10-S12.
National norms
We used the Respondi samples for each country (i.e., Sample 2A for the US and Sample 2C for the UK) to generate norm tables for general veracity discernment as well as fake news and real news detection.³¹ As can be gleaned from Table 9, the norms for the two countries were very similar, with minor deviations of single score points, further corroborating evidence for the cross-cultural validity of the MIST. Table 10 exhibits norms for the general US population.
Full norm tables for the US and the UK, including specific norms based on age (US, UK) and geography (US; i.e., 9 census divisions, 4 census regions), as well as means and standard deviations per item, including a per-item comparison between Democrats (US)/liberals (UK) and Republicans (US)/conservatives (UK), are available in Supplement S15.
Results: MIST-16
Exploratory graph analysis was applied to the MIST-16 items, as well as hierarchical EGA (Jimenez et al., 2022).³² The item stability and structural consistency of the first-order factors were computed using a hierarchical EGA (Jimenez et al., 2022) version of bootstrap exploratory graph analysis (Christensen & Golino, 2021a).³³ The traditional EGA technique indeed identified only two dimensions (real and fake news items, see Fig. 10). The hierarchical EGA technique, on the other hand, identified the original four-dimensional (first-order) structure and two general factors (real and fake news items, see Fig. 11).
A parametric bootstrap EGA using the hierarchical EGA method (Jimenez et al., 2022) showed that the four dimensions are very stable, being estimated in 90.8% of the 500 bootstrapped samples. In terms of item stability, the MIST-16 EGA items presented very high stability, except for item MIST 73, which was estimated on its empirical hierarchical EGA first-order dimension in 73% of the bootstrapped samples (see Fig. 12).
31 We chose to create the norm tables based on the Respondi samples instead of pooling all samples, as through recent projects we found some evidence indicating that Respondi samples provide more representative levels of numeracy, education, and ideology than Prolific, and our experience with CloudResearch is limited.

32 Due to an error in the Qualtrics system, only 15 items were presented to the participants. Item MIST 16 (Left-Wingers Are More Likely to Lie to Get a Good Grade) was left out of the data collection system.

33 As pointed out earlier, the advantage of using hierarchical EGA (Jimenez et al., 2022) on the US representative quota sample (collected using the MIST-16 EGA items) is that as the sample size increases, there is a meaningful chance that the EGA estimates a structure reflecting general factors instead of first-order factors, if the dimensions are hierarchical or form a generalized bifactor structure.

Discussion
In Study 2, we consolidated and expanded the psychometric properties of the MIST. First, we conducted confirmatory factor analyses across four samples with representative quota from the US and the UK, consistently replicating the higher-order structure yielding good model fit and internal consistency for both the MIST-8 and the MIST-20.

Next, we constructed an extensive nomological network of the MIST to assess construct validity (Cronbach & Meehl, 1955). As preregistered, and similar to Study 1, in Sample 2B we found a high correlation between the MIST score and the DEPICT misinformation inventory, supporting convergent validity. Similarly, in Sample 2A we found a medium to high negative correlation between the MIST-20 and a COVID-19 misinformation beliefs inventory, further attesting to the measure’s convergent validity. In addition, we demonstrated that both
the MIST-8 and the MIST-20 explain considerable extra variance above the existing CMQ and BSR scales (MIST-20: ΔR² = 20%, MIST-8: ΔR² = 14%), indicating substantial incremental validity (Clark & Watson, 2019). Surprisingly, however, the correlations of each of the MIST, CMQ, and the BSR with the Go Viral! items were all low (r < .30). Nevertheless, the MIST-20 remained the single best predictor for the Go Viral! items, significantly improving the variance explained in a combined model on top of the CMQ and BSR measures (ΔR² = .03).

In terms of discriminant validity, as preregistered, in Sample 2B we observed moderate negative associations between the MIST-20 and the BSR as well as the CMQ. In Sample 2A, we also found preliminary evidence for the role of actively open-minded thinking (AOT) as a potential vehicle for better distinction between fake and real news. This aligns with previous research showing that AOT is related to more critical information source evaluation (Baron, 2019) and decreased susceptibility to fake news (Pennycook & Rand, 2020, 2021).

Table 9 MIST norm score comparison between US and UK samples

Scale    Sample  Minimum  1st Quartile  Median  Mean  3rd Quartile  Maximum
MIST-8   US      0        4             6       6     7             8
MIST-8   UK      0        4             5       5     7             8
MIST-20  US      4        11            14      14    17            20
MIST-20  UK      4        11            13      13    16            20

Within the realm of trait measures we found relatively small correlations with the core personality traits. Contrary to our expectations, openness, extraversion, and agreeableness were not significantly related to the MIST-20. Meanwhile, conscientiousness exhibited a small positive association. This dovetails well with previous research finding that individuals high in conscientiousness are more likely to read news offline (rather than relying solely on social media; Sindermann et al., 2020) and less likely to share fake news (Lawson & Kakkar, 2021) and engage in conspiracist ideation (Brotherton et al., 2013).
We also found a small negative association with neuroticism. As neuroticism is widely understood as a stable predisposition to experience anxiety and fear (Eysenck, 1967; Hofstee et al., 1992; Soto & John, 2017), this is consistent with previous work identifying fear and trait anxiety as positive predictors of conspiracy beliefs (Grzesiak-Feldman, 2013; Swami et al., 2016) as well as other studies finding that those high in neuroticism tend to rely on social media news feeds and are thus more likely to get caught in filter bubbles and echo chambers (Sindermann et al., 2020). Larger correlations were found with the Dark Tetrad personality traits, which were all negatively related to the MIST-20 score. While the links with Machiavellianism, psychopathy, and sadism are novel, the negative association with narcissism dovetails well with previous work demonstrating narcissists’ greater susceptibility to conspiracies (Cichocka et al., 2016; Kumareswaran, 2014).

Meanwhile, in Sample 2E, we successfully validated the psychometric strength of the EGA-based MIST-16, which also showed evidence for two general factors, fake news detection and real news detection, as well as two facets for each.
While EGA uses an entirely different approach for item analysis and selection, the convergent outcome of two general factors and the overlap in the item sets between the two methods show that it is possible—using a variety of methodologies—to develop a psychometrically validated misinformation susceptibility test with congruent results. Meanwhile, the EGA data show that EGA is a useful new method psychologists can use to design misinformation detection scales (or indeed, any scale), enlarging the toolkit available for scale development.

All in all, the nomological network largely confirmed the preregistered relationship patterns—thus corroborating the MIST’s construct validity—while at the same time demonstrating new insights that can be gained by using the MIST-20 measure, which may stimulate further research. Finally, we leveraged the large size and national representativeness of our validation samples to produce norm tables for the UK and US general populations as well as distinct demographic subgroups in the UK and the US and geographical subgroups in the US.

Fig. 10 Structure estimated via EGA using the validation sample

Table 10 MIST-20 general population norms for the United States (N = 3479)

Percentile  V (Veracity discernment)  f (Fake news detection)  r (Real news detection)
0%          4                         0                        0
5%          8                         3                        2
10%         9                         4                        3
15%         10                        5                        4
20%         10                        5                        4
25%         11                        6                        5
30%         12                        7                        5
35%         12                        7                        6
40%         13                        7                        6
45%         14                        8                        7
50%         14                        8                        7
55%         15                        8                        7
60%         15                        9                        7
65%         16                        9                        8
70%         16                        9                        8
75%         17                        9                        8
80%         17                        10                       9
85%         18                        10                       9
90%         19                        10                       10
95%         19                        10                       10
100%        20                        10                       10
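To illustrate how a norm table such as Table 10 might be used to contextualize an individual MIST-20 score, the following minimal Python sketch looks up the percentile band for a veracity discernment (V) score. The cut-off values are copied from Table 10; the function name `percentile_band` and the convention of reporting the highest tabulated percentile whose score does not exceed the raw score are assumptions made for this illustration and are not taken from the authors' scoring materials.

```python
# V (veracity discernment) norms for the US general population, copied from
# Table 10: percentile -> MIST-20 score at that percentile.
V_NORMS_US = {
    0: 4, 5: 8, 10: 9, 15: 10, 20: 10, 25: 11, 30: 12, 35: 12, 40: 13,
    45: 14, 50: 14, 55: 15, 60: 15, 65: 16, 70: 16, 75: 17, 80: 17,
    85: 18, 90: 19, 95: 19, 100: 20,
}


def percentile_band(score, norms):
    """Return the highest tabulated percentile whose score is <= `score`.

    Assumed convention for this sketch only: scores below the lowest
    tabulated value are reported as percentile 0.
    """
    reached = [pct for pct, cutoff in norms.items() if cutoff <= score]
    return max(reached) if reached else 0


if __name__ == "__main__":
    for raw in (11, 14, 18):
        print(f"V = {raw}: percentile band {percentile_band(raw, V_NORMS_US)}%")
```

Under this convention, a V score of 14 falls in the 50% band, which matches the US median reported in Table 9. Other conventions (for example, interpolating between adjacent percentiles) are equally defensible.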
Study 3: Application—A nuanced effectiveness evaluation of a popular media literacy intervention

In Study 3, we demonstrate how the MIST can be used in conjunction with the Verification done framework and norm tables.34 We employ the MIST-8 in a simple within-groups pretest/post-test design with the Bad News Game, a major media literacy intervention played by over a million people (Roozenbeek & van der Linden, 2019). The Bad News Game is based on inoculation theory (van der Linden & Roozenbeek, 2020), and both its theoretical mechanisms and its effects have been replicated multiple times (see, e.g., Maertens et al., 2021; Roozenbeek, Maertens et al., 2021), making it a well-established intervention in the literature as a tool to reduce misinformation susceptibility. We therefore hypothesized that the intervention would improve veracity discernment (ability to accurately distinguish real news from fake news), real news detection (ability to correctly flag real news), and fake news detection (ability to correctly tag fake news). In addition, we hypothesized that the Bad News Game would decrease both distrust (negative judgment bias or being hyper-skeptical) and naïvité (positive judgment bias or believing everything). We used norm tables to establish where the baseline MIST scores of our convenience sample lay.

Method

Participants
We collected data from an online community sample of 4024 participants who played the Bad News Game (www.getbadnews.com) between 7 May 2020 and 29 July 2020 and who agreed to participate in the in-game survey.
After filtering out participants who did not complete the full study, had prior experience with the game, were underage, entered the study multiple times, or lived outside of the United States, 421 participants remained.35 Based on earlier studies evaluating the Bad News Game (Maertens et al., 2021; Roozenbeek, Maertens et al., 2021), we aimed to be highly powered (power = .90, α = .05) to detect a Cohen’s d effect size of 0.250, which required a sample size of 338, which we exceeded in this sample. The power was calculated using the R pwr package (Champely et al., 2021). On average, participants were young (55.58% 18–29 years, 32.30% 30–49, 12.11% over 50), 52.02% identified as female (41.09% male, 6.89% other), and 86% had either a higher education degree or some college experience (see Table 1 for a complete demographics overview). The median ideology on a scale from 1 (liberal) to 7 (conservative) was 3 (M = 2.88, SD = 1.39), indicating a slightly left-leaning audience.

34 A MIST implementation guide explaining how researchers and practitioners can set up the MIST in their studies as well as how to calculate the Verification done (Vrf dn) scores can be found in Supplement S17. An example Qualtrics survey and a score calculation R script are available in the OSF repository: https://osf.io/r7phc/.

35 We restricted our sample to US residents, as we did not have a UK filter and have not yet validated the MIST in any other country.

Fig. 11 Structure estimated via hierarchical EGA using the validation sample

Procedure and measures
Individuals who played the Bad News Game (Roozenbeek & van der Linden, 2019) were invited to participate in the study.
The Bad News Game (www.getbadnews.com) is a free online browser game in which players learn about six common misinformation techniques over the course of 15 minutes in a simulated social media environment (see Roozenbeek & van der Linden, 2019, for a detailed discussion). In the current study, after providing informed consent, individuals completed the MIST-8 both before and after playing the Bad News Game. Participation was completely voluntary, and no rewards, monetary or otherwise, were offered. This study was approved by the Psychology Research Ethics Committee of the University of Cambridge (PRE.2020.120, PRE.2020.136).

Analytical strategy
After contextualizing our findings by juxtaposing the sample’s baseline findings with the US general national norms derived in Study 2, we conducted repeated-measures t-tests for veracity discernment (M = 6.23, SD = 1.53) and for the four subcomponents of the MIST—fake news detection (M = 3.19, SD = 0.92), real news detection (M = 3.04, SD = 0.95), distrust (M = 0.31, SD = 0.63), and naïvité (M = 0.46, SD = 0.69).

Results

Baseline
We found that our US convenience sample scored higher on the MIST than the US population average for veracity discernment (see Study 2; 1st Quartile (population) = 4, 1st Quartile (sample) = 6).36

36 We found similar results when looking at fake news detection (1st Quartile (population) = 2, 1st Quartile (sample) = 3) and real news detection (1st Quartile (population) = 2, 1st Quartile (sample) = 3).

Fig. 12 Item stability of the hierarchical EGA first-order structure in the validation sample

Hypothesis tests

V—Veracity discernment
Contrary to our expectations, we did not find a significant change in veracity discernment post-intervention relative to pre-intervention (Mdiff = 0.11, 95% CI [−0.01, 0.23], t(420) = 1.80, p = .072, d = 0.088, 95% CI [−0.103, 0.279]). See Fig. 13, Panel A for a bar plot.
r—Real news detection
While we found an effect of the intervention on real news detection, the effect was in the opposite direction of our prediction (Mdiff = −0.17, 95% CI [−0.26, −0.08], t(420) = −3.72, p < .001, d = −0.181, 95% CI [−0.373, 0.011]). See Fig. 13, Panel B, for a bar plot.

f—Fake news detection
In line with our expectations, we did find a positive effect of the intervention on fake news detection (Mdiff = 0.28, 95% CI [0.20, 0.36], t(420) = 6.81, p < .001, d = 0.332, 95% CI [0.138, 0.525]). See Fig. 13, Panel C for a bar plot.

d—Distrust
Contrary to our hypothesis, we observed an increase in distrust (Mdiff = 0.31, 95% CI [0.22, 0.40], t(420) = 6.94, p < .001, d = 0.338, 95% CI [0.144, 0.532]). See Fig. 13, Panel D for a bar plot.

n—Naïvité
As hypothesized, we did find a significant reduction in naïvité after intervention (Mdiff = −0.14, 95% CI [−0.20, −0.07], t(420) = −4.12, p < .001, d = −0.201, 95% CI [−0.392, −0.008]). See Fig. 13, Panel E for a bar plot.

See Supplement S16 for a detailed summary table with variable descriptive statistics and difference scores.

Discussion
Traditionally, evaluators of the Bad News Game (e.g., Roozenbeek & van der Linden, 2019) only looked at a small number of (ad hoc-created) real news items and focused on participants’ reliability ratings of a large set of fake news items. Study 3 showed that using the MIST in conjunction with the Verification done framework provided novel insights contrary to our expectations. Although trending towards an effect in the expected direction, participants did not become significantly better at general news veracity discernment after playing the Bad News Game (p = .072). Looking at the MIST facet scales, we did find significant differences in both fake news detection and real news detection. More specifically, we observed that while people improved in the detection of fake news, they also became worse at the detection of real news.
Looking further at response biases, we can also see that the Bad News Game might increase general distrust in news headlines while also diminishing naïvité. At first sight, these results seem to indicate that the intervention does decrease people’s susceptibility to fake news and reduces general naïvité, but at a potential cost of increased general distrust (hyper-skepticism). Whether this means the intervention works depends on the aim: to decrease susceptibility to misinformation, or to increase the ability to accurately discern real news from fake news. The Verification done framework allows interventionists to start differentiating these important questions both theoretically and empirically, and we encourage researchers and practitioners to use the framework independently of the misinformation susceptibility measure used.

Fig. 13 Plot of Verification done variables applied to the Bad News Game (N = 421). T1 = pretest. T2 = post-test

One possible reason for the pattern in the subordinate factors is that the Bad News Game focuses mainly on detecting misinformation and warning people about the threats of misinformation, and is less focused on recognizing real news (Roozenbeek & van der Linden, 2019). In addition, the evidence suggests that there may be counteracting effects (increased distrust but also improved fake news detection). The lack of a significant effect for the general factor (the discernment variable) may therefore be due to these counteracting effects, resulting in an effect that is too small to detect with our sample (N = 421), especially in the context of a short 15-minute intervention in combination with an 8-item scale. Finally, it is also possible that the intervention may simply not be sufficient to make a large enough impact on a general susceptibility factor.
In addition, as recommended by our framework, these results need to be interpreted in conjunction with the norm tables. The general sample that was recruited was already highly media-literate. The first quartile of the pretest MIST scores was higher than the population average (veracity discernment: 1st Quartile (population) = 50% accuracy, 1st Quartile (sample) = 75% accuracy). Effects of the intervention might therefore be different with a more representative sample, or for people performing worse during the pretest phase.

The results of this study come with two caveats. First, the MIST-8 was used instead of the MIST-20. As is common for short scales (Rammstedt et al., 2021; Thalmayer et al., 2011)—while maintaining high psychometric quality—the parsimonious MIST-8 is less precise and less reliable than the MIST-20. Since the MIST-20 only takes about 2 minutes to complete, we recommend researchers use the MIST-20 whenever possible. Second, while we were sufficiently powered to detect effect sizes similar to the original evaluation of the intervention (Roozenbeek & van der Linden, 2019), with a sample of 421 participants—as is also reflected in the rather large confidence intervals—we did not have sufficient statistical power to detect smaller nuances (Anvari & Lakens, 2021; Funder & Ozer, 2019; Götz, Gosling, et al., 2022).

The results of this study indicate the importance of looking at misinformation susceptibility in a more holistic way. Applying the Verification done framework, we discovered key new theoretical dimensions that previous research had overlooked. Evaluators of this intervention, and other interventions, can now disentangle and accurately measure the five dimensions of misinformation susceptibility, thereby expanding our understanding of both the underlying mechanisms and the intervention’s practical impact.
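The relationship between the reported test statistics, effect sizes, and the power caveat can be made concrete with a short Python sketch. It assumes the common paired-design convention d = t / √N (often written d_z); the text does not state which formula was used, although this convention happens to reproduce the five reported d values from the reported t statistics with N = 421. The second function uses a normal approximation to estimate the smallest standardized paired difference detectable with a given power; this is a rough illustration only, not a reconstruction of the authors' pwr calculation, and t-based calculations give slightly larger values. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_dz_from_t(t_value, n_pairs):
    """Standardized paired difference from a paired t statistic (assumed d = t / sqrt(N))."""
    return t_value / math.sqrt(n_pairs)


def min_detectable_dz(n_pairs, alpha=0.05, power=0.90):
    """Smallest d_z detectable in a two-sided paired test (normal approximation)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / math.sqrt(n_pairs)


# t statistics as reported in the hypothesis tests of Study 3 (df = 420, N = 421).
REPORTED_T = {"V": 1.80, "r": -3.72, "f": 6.81, "d": 6.94, "n": -4.12}
EFFECT_SIZES = {
    name: round(cohens_dz_from_t(t, 421), 3) for name, t in REPORTED_T.items()
}

if __name__ == "__main__":
    for name, d in EFFECT_SIZES.items():
        print(f"{name}: d = {d:+.3f}")
    print(f"approx. minimum detectable d at N = 421: {min_detectable_dz(421):.3f}")
```

Under these assumptions, the minimum detectable effect at 90% power is roughly 0.16, which is larger than the observed effect for veracity discernment (d = 0.088). This is in line with the second caveat above.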
General discussion

We explained the necessity of having a multifaceted measurement of misinformation susceptibility, and based on theoretical insights from previous research, developed the Verification done framework. Then, in three studies and seven samples from two countries, we developed, validated, and applied the Misinformation Susceptibility Test (MIST): a holistic test which allows the assessment of veracity discernment ability, its facets fake news detection ability and real news detection ability, and judgment biases distrust and naïvité.

In Study 1, we derived a development protocol, generated a set of fake news headlines using the GPT-2 neural network—an advanced language-based machine learning algorithm—and extracted a list of real news headlines from neutral and well-trusted sources. Through psychometric analysis using factor analysis and item response theory, we developed the MIST-8, MIST-16, and the MIST-20 tests.

In Study 2, we recruited five samples with nationally representative quota, three for the US and two for the UK, from three different recruitment platforms, and followed a multifaceted validation strategy with the aim of gaining insights into the measure’s validity and replicability. First, confirmatory factor analyses consistently favored the higher-order structure and yielded satisfactory properties that suggest high validity and good reliability of both the MIST-8 and the MIST-20. Second, adopting a wide-net approach, we constructed an extensive nomological network.
We found the MIST-8 and MIST-20 to be consistently highly correlated with various fact-check tests— the “COVID-19 fact-check” headline evaluation task (Pennycook, McPhetres, et al., 2020) and the “DEPICT” social media post reliability judgment task (Maertens et al., 2021)—thus signaling convergent validity—while being clearly distinct from the existing Conspiracy Mentality Questionnaire (CMQ) and the Bullshit Receptivity Scale (BSR), hence providing evidence for discriminant validity. The correlation with ad hoc headline evaluation tasks is strong enough to show that they are measures of a similar construct, but it is also weak enough to demonstrate that they are sufficiently distinct. The MIST offers a reliable, standardized, and validated alternative to these ad hoc tests, with high predictive validity for a wide set of scales, as well as norm tables. However, due to the high stability of the MIST, it is possible that the MIST may turn out to be particularly useful for subgroup analyses, and may be less sensitive for the measurement of (small) intervention effects. In addition, the MIST aims to measure generalized susceptibility to misinformation, which is not tailored to the skills trained in specific interventions. Therefore, the MIST is not meant to replace ad hoc measures, but can exist in conjunction with them, depending on the outcome variable of interest. Moreover, we presented MIST-20 and MIST-8 norm tables for both the UK and the US based on our large samples with nationally representative quota, which can be used to contextualize effects.

Using a new, modern, psychometric method, namely exploratory graph analysis (EGA; Golino & Epskamp, 2017), we showed a proof of concept of how EGA can be used to help with establishing the factor structure, the item selection, and the validation of scales such as the MIST.
In both Study 1 and Study 2 we show how EGA can lead to potentially more stable item selection than when using the traditional EFA and IRT methods, and present an alternative version of the MIST: the MIST-16. Meanwhile, further analyses reveal that EGA can help to detect extra dimensions as facets of the general factors. Interestingly, the validation sample (Sample 2E) showed that a structure with two generalized factors and four facets had the best fit, potentially informing misinformation theorists on further dimensions to explore when researching the nature of misinformation. Meanwhile, it also provided further evidence that misinformation susceptibility can be viewed through the lens of two general factors (real news detection, fake news detection), and robustly measured as such. This congruence between these two very different psychometric methods shows the robustness of our psychometric toolkit and the ability for it to produce reliable scales to measure psychological constructs.

In the third and last study, we demonstrated how Verification done and the MIST can be employed in naturalistic settings, in this case to evaluate the general effects of a highly popular inoculation intervention. Employing a validated measure to evaluate interventions in combination with the norm tables—which have not been used in this field before—we were able to uncover new mechanisms behind a well-known media literacy intervention, the Bad News Game (Maertens et al., 2021; Roozenbeek & van der Linden, 2019), and highlighted both weaknesses and strengths of this intervention that had not been detected before using the classical methods. For example, while the intervention is typically evaluated by looking at fake news reliability ratings (e.g., Roozenbeek & van der Linden, 2019) without an evaluation framework or norm tables, we were now able to unveil important dynamics between fake news, distrust, and real news detection.
Moreover, our approach allowed us to establish that the average participant who chose to participate in the intervention already scored above the norm when completing the pretest. In addition, for the first time, we were able to disentangle the five dimensions of misinformation susceptibility using a validated and standardized item set, finding unexpected changes in judgment biases as well as in real news detection (which other research does not necessarily find; see Roozenbeek & van der Linden, 2019), which can inspire further research and theoretical development. Nevertheless, we must emphasize that the MIST is a generalized measure of susceptibility, relevant for measuring an overarching skill, which is not the sole focus of the Bad News Game intervention. For example, there is a wide range of evidence that shows that the Bad News Game is effective at improving the detection of specific manipulation techniques that typically underlie misinformation that the participant was trained on (e.g., appeal to emotion, polarizing language; Roozenbeek & van der Linden, 2019; Lewandowsky & van der Linden, 2021). Improvements in those specific skills can be best identified with a tailored measurement instrument rather than a “general” measure such as the MIST.

Overall, these studies show that it is feasible to develop a psychometrically validated measurement instrument for misinformation susceptibility. Moreover, the evidence discussed in the studies, and in particular the analyses of Table 3, Supplement S13, and Supplement S18, provide clear evidence for the utility—or indeed superiority—of the new measure compared to other measures in terms of predicting outcomes.

Implementation
An overview of the MIST-20, MIST-16, and MIST-8 item sets can be found in Supplement S21. For an implementation and scoring guide, please see Supplement S17. The supplements can be found on the OSF repository at https://osf.io/r7phc/.
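As a rough illustration of what a Verification done scoring routine could look like, the Python sketch below computes the five scores from a list of binary real/fake judgments. Veracity discernment (V) is taken as the sum of real news detection (r) and fake news detection (f), which is consistent with the Study 3 descriptives (3.04 + 3.19 = 6.23). The rules used for distrust (d) and naïvité (n), namely the surplus of "fake" or "real" judgments over the number of fake or real items with a floor of zero, are an assumption made for this sketch. The authoritative definitions are in the scoring guide (Supplement S17) and the R script on the OSF repository. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VerificationDoneScores:
    V: int  # veracity discernment: total number of correct judgments
    r: int  # real news detection: real headlines judged as real
    f: int  # fake news detection: fake headlines judged as fake
    d: int  # distrust: surplus of "fake" judgments (assumed rule)
    n: int  # naivite: surplus of "real" judgments (assumed rule)


def score_mist(is_real, judged_real):
    """Score one respondent.

    is_real[i]     -- True if headline i is a real news item
    judged_real[i] -- True if the respondent judged headline i to be real
    """
    if len(is_real) != len(judged_real):
        raise ValueError("is_real and judged_real must have the same length")

    pairs = list(zip(is_real, judged_real))
    r = sum(1 for truth, answer in pairs if truth and answer)
    f = sum(1 for truth, answer in pairs if not truth and not answer)

    n_real_items = sum(is_real)
    n_fake_items = len(is_real) - n_real_items
    judged_real_count = sum(judged_real)
    judged_fake_count = len(judged_real) - judged_real_count

    # Assumed bias rules: count only judgments beyond the balanced split.
    d = max(0, judged_fake_count - n_fake_items)
    n = max(0, judged_real_count - n_real_items)
    return VerificationDoneScores(V=r + f, r=r, f=f, d=d, n=n)


if __name__ == "__main__":
    # Hypothetical 8-item example: four real items followed by four fake items.
    key = [True, True, True, True, False, False, False, False]
    answers = [True, True, True, False, False, False, True, True]
    print(score_mist(key, answers))
```

In the hypothetical example, the respondent judges five of eight headlines to be real, one more than the four real items, so the sketch reports n = 1 and d = 0 alongside V = 5.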
Open‑Source web application
To facilitate the implementation of the MIST, we programmed an open-source, user-friendly, online version of the MIST-20, called YourMIST: an interactive self-assessment tool designed for easy accessibility and repurposing by individuals, researchers, and practitioners. Our implementation of the MIST-20 utilizes the Python programming language and the Streamlit web development module to enable a web-based quiz that provides personalized feedback to users. The tool reports scores for each of the components of the Verification done framework, accompanied by detailed explanations and a comparison with the US and UK population scores. Our web app and the source code are publicly accessible for individual use and adaptation on the OSF repository at https://osf.io/r7phc/.

Limitations and future research
While we firmly believe that the MIST and Verification done mark a substantial methodological advance in the field of misinformation research (Bago et al., 2020; Batailler et al., 2022; Roozenbeek, Maertens et al., 2021; Rosellini & Brown, 2021; Zickar, 2020), it is of course not without limitations. An inevitable challenge of doing any type of systematic and methodologically rigorous news headline research lies in the fact that what might be real news at one point in time might be outdated at a later point in time, while—albeit admittedly much less likely—what is fake news at one point in time might become true or more credible at a later point in time. Therefore, similar to an IQ test, it may be necessary to update the MIST over time. Nevertheless, in recent studies, the MIST still shows similar validity as it did 2 years ago. To illustrate, in a recent research project by Said et al.
(2023, in prep), a new US quota sample was collected through Respondi with 547 respondents, and both the MIST-8 and MIST-20 showed good internal and predictive validity similar to the original sample (see Supplement S7). For example, the fit indices of the MIST sample collected in August 2022 (MIST-20: CFI = 0.92, TLI = 0.91, RMSEA = 0.039, SRMR = 0.052) showed similar—and for some indices better—fit relative to the sample collected in September 2020 (MIST-20: CFI = 0.90, TLI = 0.89, RMSEA = 0.041, SRMR = 0.040). Similarly, the MIST-20 was an even better predictor of performance on the DEPICT deceptive headlines recognition task (Maertens, Roozenbeek, et al., 2021) in the August 2022 (r = .64, p < .001) sample than it was in the April 2020 sample (r = .50, p < .001).

Another related limitation concerns the inherent difficulty in the MIST’s cross-cultural application. While we are greatly encouraged by our finding that the MIST appears to be an equally effective measure in the UK as in the US-American cultural context in which it was originally developed, cross-cultural translation poses a challenge. For obvious reasons, a simple and direct translation may not be sufficient. At the same time, while trustworthy news sources from which real news items could be extracted can doubtlessly be identified in any language, at the time the MIST-20 was developed, the GPT-2 (Radford et al., 2019)—the advanced language-based neural network algorithm that we employed to generate fake news items—was mainly trained on English language corpora. Meanwhile, however, an increasing amount of new research and applications has managed to make the GPT-2 work in the context of other languages (see, e.g., de Vries & Nissim, 2020; Guillou, 2020; for promising initial applications in Dutch, Italian, and Portuguese).
Moreover, the recent arrival of GPT-3 and GPT-4, which have support for an increasingly wide range of languages, now enables the field to develop non-English adaptations of the MIST that will empower researchers around the globe to capture the complex and multifaceted reality of misinformation spread—and resistance. Even without the GPT-2, researchers can create a database of their own misinformation items and use the same psychometric techniques as outlined in this paper to come to a valid misinformation susceptibility test in any culture. Therefore, we see this paper as a proof of concept on the feasibility of using psychometrics to develop a comprehensive misinformation susceptibility test in any culture.

One other concern that may be raised is that the MIST may be confounded with general news consumption: those who are more aware of the news may be more likely to score high on the MIST, and controlling for this may reduce the MIST’s predictive validity. A related concern is that misinformation news engagement is often driven by partisan polarization and outgroup derogation (Osmundsen et al., 2021). To investigate these concerns, we looked at data from a separate study that is currently being prepared, which contains the MIST, the CMQ, and a social media misinformation and manipulative posts discernment test (Maertens et al., 2022, in prep). Looking at these data (N = 2220, US quota sample, Respondi), we found that the MIST was the single best predictor for manipulative headline discernment above the CMQ and news consumption (not controlling for news consumption: β = 0.366, p < .001, controlling for news consumption: β = 0.362, p < .001), that general news consumption was only weakly correlated with MIST performance (r = 0.218, p < .001), and that news consumption did not have an impact on the MIST’s predictive validity (see Supplement S18).
In other words, the MIST discernment score does reflect ecologically valid discernment, and is not confounded by news consumption.

Finally—although based on the consistent results across samples and time points it is unlikely that this has confounded the results—it should be noted that in all studies and with all samples, we have excluded participants who did not complete the entire study up to the analysis of interest. This means that in Study 1, the test–retest reliability may be influenced by the type of participants who participated in the follow-up (i.e., long-term Prolific users), in Study 2 it is possible that the construct validity findings were influenced by excluding participants who dropped out during the study, and in Study 3 it is possible that the evaluation was influenced by some participants dropping out between the pretest and post-test.

We can see many more avenues for future studies using Verification done and the MIST. One example is the implementation of the MIST in geo-psychological studies (Ebert et al., 2021; Rentfrow et al., 2013, 2015) to identify misinformation hotspots and covariates with national, regional, and local levels of misinformation susceptibility. Another strand of research may further deepen our conceptual understanding of media literacy. For example, in light of the current findings, it appears that veracity discernment may encompass both a comparatively stable, trait-like component, and a more malleable skill component. Future studies may more clearly identify this distinction and find ways to best use these insights to devise effective interventions that foster better detection of both fake news and real news, and in turn ultimately lead to greater genuine veracity discernment.
Finally, we identify six immediate use cases for the MIST: (1) to prescreen participants for studies, (2) as a covariate to investigate subgroups (e.g., those that are highly susceptible to misinformation), (3) as a control variable in a model, (4) to map geographical regions to identify misinformation susceptibility hotspots, (5) to identify brain regions linked to misinformation susceptibility, and (6) to evaluate interventions. In addition, we would like to encourage the use of the Verification done framework as a general method to look at misinformation susceptibility and intervention effects more holistically, independent of the measure used: indeed, we would encourage practitioners to use the framework with any test.

Conclusion

Researchers lack a unifying conceptualization of misinformation susceptibility and too often use unvalidated measures of misinformation susceptibility. We therefore developed a new overarching, unifying and multifaceted interpretation framework (i.e., Verification done) and a new, thoroughly validated measurement instrument based on this framework (i.e., the Misinformation Susceptibility Test; MIST). The current paper acts as a blueprint of integrated theory and assessment development, and opens the door to standardized and comparative misinformation susceptibility research.

Both researchers and practitioners can now make a thorough evaluation of media literacy interventions by comparing MIST scores using the norm tables and the Verification done framework. The use of our standardized and psychometrically validated instrument allows for a comprehensive evaluation, and also permits holistic comparison studies and tables to be compiled reporting all five Verification done scores. Practitioners in turn can use these scores and comparisons to choose interventions that best fit their needs.
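To make the five Verification done scores concrete, the sketch below computes them from binary real/fake judgments. It is a minimal illustration, not the official scoring script: the scoring rules are an assumed reading of the framework (discernment as the number of correct judgments; distrust and naïvité as the surplus of "fake" or "real" judgments beyond the number of items of that type), and the answer key and responses are hypothetical. The rules should be checked against the scoring materials on the OSF before use.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class VerificationScores:
    veracity_discernment: int  # V: items judged correctly
    real_news_detection: int   # r: real items judged "real"
    fake_news_detection: int   # f: fake items judged "fake"
    distrust: int              # d: surplus of "fake" judgments (assumed rule)
    naivite: int               # n: surplus of "real" judgments (assumed rule)


def score_responses(is_real: Sequence[bool],
                    judged_real: Sequence[bool]) -> VerificationScores:
    """Score one participant given the answer key and their judgments."""
    if len(is_real) != len(judged_real):
        raise ValueError("answer key and responses must have the same length")

    pairs = list(zip(is_real, judged_real))
    r = sum(1 for truth, resp in pairs if truth and resp)
    f = sum(1 for truth, resp in pairs if not truth and not resp)

    n_real_items = sum(is_real)
    n_fake_items = len(is_real) - n_real_items
    n_judged_real = sum(judged_real)
    n_judged_fake = len(judged_real) - n_judged_real

    # Assumed bias rule: only judgments beyond the number of items of that
    # type count towards a bias score, floored at zero.
    d = max(0, n_judged_fake - n_fake_items)
    n = max(0, n_judged_real - n_real_items)
    return VerificationScores(r + f, r, f, d, n)


# Hypothetical 8-item test: four real headlines followed by four fake ones.
key = [True, True, True, True, False, False, False, False]
responses = [True, True, False, False, False, False, False, True]
scores = score_responses(key, responses)
print(scores)
```

Reporting all five values side by side shows whether a change in discernment comes from better detection or merely from a shift in judgment bias.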
Verification done and the MIST can be employed across a range of psychological disciplines, ranging from cognitive neuroscience to social and personality psychology, to reveal the psychological mechanisms behind susceptibility to misinformation or to test the outcome of interventions.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.3758/s13428-023-02124-2.

Author note Parts of the current article were presented at a conference talk given by the first author at the 2021 Annual Convention of the Society for Personality and Social Psychology (SPSP). A preprint of the article was published on PsyArXiv at https://doi.org/10.31234/osf.io/gk68h. The supplements, data, and analysis scripts that support this paper’s findings, including Qualtrics files, analysis code, raw and clean datasets, and all research materials, are openly available on the Open Science Framework (OSF) at https://osf.io/r7phc/. Preregistrations are available on AsPredicted at https://aspredicted.org/m7vb3.pdf (Study 1, T1), https://aspredicted.org/js2jz.pdf (Study 1, T2), and https://aspredicted.org/nx7xu.pdf (Study 2B).

Funding This work was financially supported by the United Kingdom Economic and Social Research Council (ESRC), the Cambridge Trust (CT), the Winton Centre for Risk and Evidence Communication (University of Cambridge), the German Academic Exchange Service (DAAD), and the University of Virginia’s 3 Cavaliers Fund and the Center for Global Inquiry and Innovation.

Declarations

Conflicts of interest/Competing interests The authors have no conflicts of interest to declare.

Ethics approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
The study was reviewed and approved by the Psychology Research Ethics Committee of the University of Cambridge (Study 1: PRE.2019.108; Study 2: PRE.2019.108, PRE.2020.034, PRE.2020.086, PRE.2020.120; Study 3: PRE.2020.120, PRE.2020.136).

Consent to participate Informed consent was obtained from all individual participants included in Study 1, Study 2, and Study 3.

Consent for publication The authors affirm that all research participants provided informed consent for the publication of the anonymized datasets in Study 1 and Study 2. In Study 3, no personal data was collected.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Aichholzer, J., & Kritzinger, S. (2016). Kurzskala politischer Zynismus (KPZ). [Short scale of political cynicism]. Zusammenstellung Sozialwissenschaftlicher Items und Skalen. https://doi.org/10.6102/zis245
Aird, M. J., Ecker, U. K. H., Swire, B., Berinsky, A. J., & Lewandowsky, S. (2018). Does truth matter to voters? The effects of correcting political misinformation in an Australian sample. Royal Society Open Science, 5(12), Article 180593. https://doi.org/10.1098/rsos.180593
Anvari, F., & Lakens, D. (2021).
Using anchor-based methods to determine the smallest effect size of interest. Journal of Experimental Social Psychology. Advance online publication. https://doi.org/10.1016/j.jesp.2021.104159
Bago, B., Rand, D. G., & Pennycook, G. (2020). Fake news, fast and slow: Deliberation reduces belief in false (but not true) news headlines. Journal of Experimental Psychology: General, 149(8), 1608–1613. https://doi.org/10.1037/xge0000729
Baron, J. (2019). Actively open-minded thinking in politics. Cognition, 188, 8–18. https://doi.org/10.1016/j.cognition.2018.10.004
Basol, M., Roozenbeek, J., McClanahan, P., Berriche, M., Uenal, F., & van der Linden, S. (2021). Towards psychological herd immunity: Cross-cultural evidence for two prebunking interventions against COVID-19 misinformation. Big Data & Society, 8(1), 1–18. https://doi.org/10.1177/20539517211013868
Batailler, C., Brannon, S. M., Teas, P. E., & Gawronski, B. (2022). A signal detection approach to understanding the identification of fake news. Perspectives on Psychological Science, 17(1), 78–98. https://doi.org/10.1177/1745691620986135
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88(3), 588–606. https://doi.org/10.1037/0033-2909.88.3.588
Block, J. (1995). A contrarian view of the five-factor approach to personality description. Psychological Bulletin, 117(2), 187–215. https://doi.org/10.1037/0033-2909.117.2.187
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008
Boateng, G. O., Neilands, T. B., Frongillo, E. A., Melgar-Quiñonez, H. R., & Young, S. L. (2018). Best practices for developing and validating scales for health, social, and behavioral research: A primer. Frontiers in Public Health, 6, 149. https://doi.org/10.3389/fpubh.2018.00149
Boker, S. M. (2018). Longitudinal multivariate psychology (E. Ferrer, S. M. Boker, & K. J. Grimm, Eds.). Routledge. https://doi.org/10.4324/9781315160542
Borsboom, D. (2008). Psychometric perspectives on diagnostic systems. Journal of Clinical Psychology, 64(9), 1089–1108. https://doi.org/10.1002/jclp.20503
Borsboom, D., Cramer, A. O., Schmittmann, V. D., Epskamp, S., & Waldorp, L. J. (2011). The small world of psychopathology.
PloS One, 6(11), e27407. https://doi.org/10.1371/journal.pone.0027407
Bovet, A., & Makse, H. A. (2019). Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications, 10(1), 7. https://doi.org/10.1038/s41467-018-07761-2
Brady, W. J., Wills, J. A., Jost, J. T., Tucker, J. A., & Van Bavel, J. J. (2017). Emotion shapes the diffusion of moralized content in social networks. Proceedings of the National Academy of Sciences of the United States of America, 114(28), 7313–7318. https://doi.org/10.1073/pnas.1618923114
Brick, C., Hood, B., Ekroll, V., & de-Wit, L. (2022). Illusory essences: A bias holding back theorizing in psychological science. Perspectives on Psychological Science, 17(2), 491–506. https://doi.org/10.1177/1745691621991838
Brotherton, R., French, C. C., & Pickering, A. D. (2013). Measuring belief in conspiracy theories: The generic conspiracist beliefs scale. Frontiers in Psychology, 4, 1–15. https://doi.org/10.3389/fpsyg.2013.00279
Bruder, M., Haffke, P., Neave, N., Nouripanah, N., & Imhoff, R. (2013). Measuring individual differences in generic beliefs in conspiracy theories across cultures: Conspiracy mentality questionnaire. Frontiers in Psychology, 4(279), 1–15. https://doi.org/10.3389/fpsyg.2013.00225
Buhrmester, M. D., Talaifar, S., & Gosling, S. D. (2018). An evaluation of Amazon’s Mechanical Turk, its rapid rise, and its effective use. Perspectives on Psychological Science, 13(2), 149–154. https://doi.org/10.1177/1745691617706516
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105. https://www.ncbi.nlm.nih.gov/pubmed/13634291
Carpenter, S. (2018). Ten steps in scale development and reporting: A guide for researchers. Communication Methods and Measures, 12(1), 25–44. https://doi.org/10.1080/19312458.2017.1396583
Chalmers, R. P. (2012).
mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
Champely, S., Ekstrom, C., Dalgaard, P., Gill, J., Weibelzahl, S., Anandkumar, A., ... & De Rosario, M. H. (2021). pwr: Basic functions for power analysis. The Comprehensive R Archive Network. https://cran.r-project.org/package=pwr
Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759–771. https://doi.org/10.1093/biomet/asn034
Chołoniewski, J., Sienkiewicz, J., Dretnik, N., Leban, G., Thelwall, M., & Hołyst, J. A. (2020). A calibrated measure to compare fluctuations of different entities across timescales. Scientific Reports, 10(1), Article 20673. https://doi.org/10.1038/s41598-020-77660-4
Christensen, A. P., Cotter, K. N., & Silvia, P. J. (2019). Reopening openness to experience: A network analysis of four openness to experience inventories. Journal of Personality Assessment, 101(6), 574–588. https://doi.org/10.1080/00223891.2018.1467428
Christensen, A. P., Garrido, L. E., & Golino, H. (2020a). Unique variable analysis: A novel approach for detecting redundant variables in multivariate data. PsyArXiv. https://doi.org/10.31234/osf.io/4kra2
Christensen, A. P., Golino, H., & Silvia, P. J. (2020b). A psychometric network perspective on the validity and validation of personality trait questionnaires. European Journal of Personality, 34(6), 1095–1108. https://doi.org/10.1002/per.2265
Christensen, A. P., & Golino, H. (2021a). Estimating the stability of psychological dimensions via bootstrap exploratory graph analysis: A Monte Carlo simulation and tutorial. Psych, 3(3), 479–500. https://doi.org/10.3390/psych3030032
Christensen, A. P., & Golino, H. (2021b). Factor or network model? Predictions from neural networks. Journal of Behavioral Data Science, 1(1), 85–126.
https://doi.org/10.35566/jbds/v1n1/p5
Christensen, A. P., & Golino, H. (2021c). On the equivalency of factor and network loadings. Behavior Research Methods, 53, 1563–1580. https://doi.org/10.3758/s13428-020-01500-6
Cichocka, A., Marchlewska, M., & de Zavala, A. G. (2016). Does self-love or self-hate predict conspiracy beliefs? Narcissism, self-esteem, and the endorsement of conspiracy theories. Social Psychological and Personality Science, 7(2), 157–166. https://doi.org/10.1177/1948550615616170
Cinelli, M., Quattrociocchi, W., Galeazzi, A., Valensise, C. M., Brugnoli, E., Schmidt, A. L., et al. (2020). The COVID-19 social media infodemic. Scientific Reports, 10(1), 1–10. https://doi.org/10.1038/s41598-020-73510-5
Cinelli, M., De Francisci Morales, G., Galeazzi, A., Quattrociocchi, W., & Starnini, M. (2021). The echo chamber effect on social media. Proceedings of the National Academy of Sciences of the United States of America, 118(9), e2023301118. https://doi.org/10.1073/pnas.2023301118
Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7(3), 309–319. https://doi.org/10.1037//1040-3590.7.3.309
Clark, L. A., & Watson, D. (2019). Constructing validity: New developments in creating objective measuring instruments. Psychological Assessment, 31(12), 1412–1427. https://doi.org/10.1037/pas0000626
Cokely, E. T., Galesic, M., Schulz, E., Ghazal, S., & Garcia-Retamero, R. (2012). Measuring risk literacy: The Berlin numeracy test. Judgment and Decision Making, 7(1), 25–47. http://journal.sjdm.org/11/11808/jdm11808.pdf
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Erlbaum Associates.
Condon, D. M., Wood, D., Mõttus, R., Booth, T., Costantini, G., Greiff, S., ... Zimmermann, J. (2020). Bottom up construction of a personality taxonomy. European Journal of Psychological Assessment, 36(6), 923–934.
https://doi.org/10.1027/1015-5759/a000626
Cook, J., Lewandowsky, S., & Ecker, U. K. H. (2017). Neutralizing misinformation through inoculation: Exposing misleading argumentation techniques reduces their influence. PloS One, 12(5), e0175799. https://doi.org/10.1371/journal.pone.0175799
Costello, A. B., & Osborne, J. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research, and Evaluation, 10(1), 7. https://doi.org/10.7275/jyj1-4868
Cramer, A. O. (2012). Why the item “23+ 1” is not in a depression questionnaire: Validity from a network perspective. Measurement: Interdisciplinary Research & Perspective, 10(1-2), 50–54. https://doi.org/10.1080/15366367.2012.681973
Cramer, A., Waldorp, L. J., Van Der Maas, H. L., & Borsboom, D. (2010). Comorbidity: A network perspective. Behavioral and Brain Sciences, 33(2-3), 137–150. https://doi.org/10.1017/S0140525X09991567
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957
Curley, A. (2020). How to use GPT-2 in Google Colab. The Startup. https://medium.com/swlh/how-to-use-gpt-2-in-google-colab-de44f59199c1
Curran, P. J., Bollen, K. A., Chen, F., Paxton, P., & Kirby, J. B. (2003). Finite sampling properties of the point estimates and confidence intervals of the RMSEA. Sociological Methods & Research, 32(2), 208–252. https://doi.org/10.1177/0049124103256130
de Vries, W., & Nissim, M. (2020). As good as new: How to successfully recycle English GPT-2 to make models for other languages. ArXiv. https://arxiv.org/abs/2012.05628. Accessed 10 Dec 2020.
Deffner, D., Rohrer, J. M., & McElreath, R. (2022). A causal framework for cross-cultural generalizability. Advances in Methods and Practices in Psychological Science, 5(3), 1–18. https://doi.org/10.1177/25152459221106366
Dhami, M. K., Hertwig, R., & Hoffrage, U. (2004). The role of representative design in an ecological approach to cognition. Psychological Bulletin, 130(6), 959–988. https://doi.org/10.1037/0033-2909.130.6.959
Dür, A., & Schlipphak, B. (2021). Elite cueing and attitudes towards trade agreements: The case of TTIP. European Political Science Review, 13(1), 41–57. https://doi.org/10.1017/S175577392000034X
Ebert, T., Götz, F. M., Gladstone, J. J., Müller, S. R., & Matz, S. C. (2021). Spending reflects not only who we are but also who we are around: The joint effects of individual and geographic personality on consumption. Journal of Personality and Social Psychology, 121(2), 378–393. https://doi.org/10.1037/pspp0000344
Epskamp, S., & Fried, E. (2018). A tutorial on regularized partial correlation networks. Psychological Methods, 23(4), 617–634. https://doi.org/10.1037/met0000167
Epskamp, S., Maris, G., Waldorp, L. J., & Borsboom, D. (2018). Network psychometrics. In B. Irwing Paul (Ed.), The Wiley handbook of psychometric testing: A multidisciplinary reference on survey, scale and test development (pp. 953–986). John Wiley & Sons Ltd. https://doi.org/10.1002/9781118489772.ch30
Epskamp, S., Rhemtulla, M., & Borsboom, D. (2017). Generalized network psychometrics: Combining network and latent variable models. Psychometrika, 82(4), 904–927. https://doi.org/10.1007/s11336-017-9557-x
Eysenck, H. J. (1967). The biological basis of personality. Thomas.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272–299. https://doi.org/10.1037/1082-989X.4.3.272
Fazio, L. K. (2020). Pausing to consider why a headline is true or false can help reduce the sharing of false news. Harvard Kennedy School Misinformation Review, 1(2), 1–8. https://doi.org/10.37016/mr-2020-009
Finch, J. F., & West, S. G. (1997). The investigation of personality structure: Statistical models. Journal of Research in Personality, 31(4), 439–485. https://doi.org/10.1006/jrpe.1997.2194
Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological and Personality Science, 8(4), 370–378. https://doi.org/10.1177/1948550617693063
Ford, J. K., MacCallum, R. C., & Tait, M. (1986). The application of exploratory factor analysis in applied psychology: A critical review and analysis. Personnel Psychology, 39(2), 291–314. https://doi.org/10.1111/j.1744-6570.1986.tb00583.x
Frederick, S. (2005). Cognitive reflection and decision making. The Journal of Economic Perspectives: A Journal of the American Economic Association, 19(4), 25–42. https://doi.org/10.1257/089533005775196732
Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in psychological research: Sense and nonsense. Advances in Methods and Practices in Psychological Science, 2(2), 156–168. https://doi.org/10.1177/2515245919847202
Golino, H. F., & Demetriou, A. (2017). Estimating the dimensionality of intelligence like data using exploratory graph analysis. Intelligence, 62, 54–70. https://doi.org/10.1016/j.intell.2017.02.007
Golino, H. F., & Epskamp, S. (2017). Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research. PloS One, 12(6), e0174035. https://doi.org/10.1371/journal.pone.0174035
Golino, H. F., & Christensen, A. P. (2019). EGAnet: Exploratory graph analysis: A framework for estimating the number of dimensions in multivariate data using network psychometrics. The Comprehensive R Archive Network. https://cran.r-project.org/package=EGAnet
Golino, H. F., Christensen, A. P., & Garrido, L. E. (2022). Exploratory graph analysis in context.
Revista Psicologia: Teoria e Prática, 24(3), ePTPPA14197. https://doi.org/10.5935/1980-6906/ePTPIC15531.en
Golino, H. F., Lillard, A. S., Becker, I., & Christensen, A. P. (2021). Investigating the structure of the children’s concentration and empathy scale using exploratory graph analysis. Psychological Test Adaptation and Development, 2(1), 35–49. https://doi.org/10.1027/2698-1866/a000008
Golino, H. F., Moulder, R., Shi, D., Christensen, A., Garrido, L., Neto, M., et al. (2020a). Entropy fit indices: New fit measures for assessing the structure and dimensionality of multiple latent variables. Multivariate Behavioral Research, 56(6), 874–902. https://doi.org/10.1080/00273171.2020.1779642
Golino, H. F., Shi, D., Garrido, L. E., Christensen, A. P., Nieto, M. D., Sadana, R., et al. (2020b). Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial. Psychological Methods, 25(3), 292–320. https://doi.org/10.1037/met0000255
Goretzko, D., Pham, T. T. H., & Bühner, M. (2021). Exploratory factor analysis: Current use, methodological developments and recommendations for good practice. Current Psychology, 40(7), 3510–3521. https://doi.org/10.1007/s12144-019-00300-2
Götz, F. M., Maertens, R., Loomba, S., & van der Linden, S. (2023). Let the algorithm speak: How to use neural networks for automatic item generation in psychological scale development. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000540
Götz, F. M., Gosling, S. D., & Rentfrow, P. J. (2022). Small effects: The indispensable foundation for a cumulative psychological science. Perspectives on Psychological Science, 17(1), 205–215. https://doi.org/10.1177/17456916209844
Graham, J., Nosek, B. A., Haidt, J., Iyer, R., Koleva, S., & Ditto, P. H. (2011). Mapping the moral domain.
Journal of Personality and Social Psychology, 101(2), 366–385. https://doi.org/10.1037/a0021847
Guadagnoli, E., & Velicer, W. F. (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103(2), 265–275. https://doi.org/10.1037/0033-2909.103.2.265
Guess, A. M., Lerner, M., Lyons, B., Montgomery, J. M., Nyhan, B., Reifler, J., & Sircar, N. (2020). A digital media literacy intervention increases discernment between mainstream and false news in the United States and India. Proceedings of the National Academy of Sciences of the United States of America, 117(27), 15536–15545. https://doi.org/10.1073/pnas.1920498117
Grzesiak-Feldman, M. (2013). The effect of high-anxiety situations on conspiracy thinking. Current Psychology, 32(1), 100–118. https://doi.org/10.1007/s12144-013-9165-6
Guillou, P. (2020). Faster than training from scratch — Fine-tuning the English GPT-2 in any language with Hugging Face and fastai v2 (practical case with Portuguese). Medium. https://medium.com/@pierre_guillou/faster-than-training-from-scratch-fine-tuning-the-english-gpt-2-in-any-language-with-hugging-f2ec05c98787
Hair, J. F., Anderson, R. E., Babin, B. J., & Black, W. C. (2010). Multivariate data analysis: A global perspective (7th ed.). Pearson.
Haynes, S. N., Richard, D. C. S., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7(3), 238–247. https://doi.org/10.1037/1040-3590.7.3.238
Heinsohn, T., Fatke, M., Israel, J., Marschall, S., & Schultze, M. (2019). Effects of voting advice applications during election campaigns: Evidence from a panel study at the 2014 European elections. Journal of Information Technology & Politics, 16(3), 250–264. https://doi.org/10.1080/19331681.2019.1644265
Ho, A. K., Sidanius, J., Kteily, N., Sheehy-Skeffington, J., Pratto, F., Henkel, K.
E., Foels, R., & Stewart, A. L. (2015). The nature of social dominance orientation: Theorizing and measuring preferences for intergroup inequality using the new SDO7 scale. Journal of Personality and Social Psychology, 109(6), 1003–1028. https://doi.org/10.1037/pspi0000033
Hofstee, W. K., de Raad, B., & Goldberg, L. R. (1992). Integration of the big five and circumplex approaches to trait structure. Journal of Personality and Social Psychology, 63(1), 146–163. https://doi.org/10.1037//0022-3514.63.1.146
Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Lawrence Erlbaum. https://psycnet.apa.org/record/1993-97193-000
Hommel, B. E., Wollang, F. J. M., Kotova, V., Zacher, H., & Schmukle, S. C. (2022). Transformer-based deep neural language modeling for construct-specific automatic item generation. Psychometrika, 87(2), 749–772. https://doi.org/10.1007/s11336-021-09823-9
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185. https://doi.org/10.1007/BF02289447
Hotez, P., Batista, C., Ergonul, O., Figueroa, J. P., Gilbert, S., Gursel, M., Hassanain, M., Kang, G., Kim, J. H., Lall, B., Larson, H., Naniche, D., Sheahan, T., Shoham, S., Wilder-Smith, A., Strub-Wourgaft, N., Yadav, P., & Bottazzi, M. E. (2021). Correcting COVID-19 vaccine misinformation. EClinicalMedicine, 33, Article 100780. https://doi.org/10.1016/j.eclinm.2021.100780
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
Humphreys, L. G., & Ilgen, D. R. (1969). Note on a criterion for the number of common factors. Educational and Psychological Measurement, 29(3), 571–578. https://doi.org/10.1177/001316446902900303
Jamison, L., Golino, H., & Christensen, A. P. (2022). Metric invariance in exploratory graph analysis via permutation testing. PsyArXiv. https://doi.org/10.31234/osf.io/j4rx9
Jimenez, M., Abad, F. J., Garcia-Garzon, E., Golino, H., Christensen, A. P., & Garrido, L. E. (2022). Dimensionality assessment in generalized bi-factor structures: A network psychometrics approach. PsyArXiv. https://doi.org/10.31234/osf.io/2ujdk
Jolley, D., & Paterson, J. L. (2020). Pylons ablaze: Examining the role of 5G COVID-19 conspiracy beliefs and support for violence. British Journal of Social Psychology, 59(3), 628–640. https://doi.org/10.1111/bjso.12394
Konrath, S., Meier, B. P., & Bushman, B. J. (2014). Development and validation of the Single Item Narcissism Scale (SINS). PloS One, 9(8), Article e103469. https://doi.org/10.1371/journal.pone.0103469
Kumareswaran, D. J. (2014). The psychopathological foundations of conspiracy theorists. Victoria University of Wellington. http://hdl.handle.net/10063/3603
Lauritzen, S. L. (1996). Graphical models (Vol. 17). Clarendon Press.
Lawson, A., & Kakkar, H. (2021). Of pandemics, politics, and personality: The role of conscientiousness and political ideology in sharing of fake news. PsyArXiv. https://doi.org/10.31234/osf.io/ves5m
Lewandowsky, S., Ecker, U. K. H., & Cook, J. (2017). Beyond misinformation: Understanding and coping with the “post-truth” era. Journal of Applied Research in Memory and Cognition, 6(4), 353–369. https://doi.org/10.1016/j.jarmac.2017.07.008
Lewandowsky, S., Smillie, L., Garcia, D., Hertwig, R., Weatherall, J., Egidy, S., Robertson, R. E., O’Connor, C., Kozyreva, A., Lorenz-Spreen, P., Blaschke, Y., & Leiser, M. R. (2020). Technology and democracy: Understanding the influence of online technologies on political behaviour and decision-making. Publications Office of the European Union. https://doi.org/10.2760/709177
Lewandowsky, S., & van der Linden, S. (2021). Countering misinformation and fake news through inoculation and prebunking. European Review of Social Psychology, 32(2), 348–384. https://doi.org/10.1080/10463283.2021.1876983
Litman, L., Robinson, J., & Abberbock, T. (2017). TurkPrime.com: A versatile crowdsourcing data acquisition platform for the behavioral sciences. Behavior Research Methods, 49(2), 433–442. https://doi.org/10.3758/s13428-016-0727-z
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635–694. https://doi.org/10.2466/pr0.1957.3.3.635
Loomba, S., de Figueiredo, A., Piatek, S. J., de Graaf, K., & Larson, H. J. (2021). Measuring the impact of COVID-19 vaccine misinformation on vaccination intent in the UK and USA. Nature Human Behaviour, 5(3), 337–348. https://doi.org/10.1038/s41562-021-01056-1
Marsman, M., Borsboom, D., Kruis, J., Epskamp, S., van Bork, R., Waldorp, L., et al. (2018). An introduction to network psychometrics: Relating Ising network models to item response theory models. Multivariate Behavioral Research, 53(1), 15–35.
McNeish, D., & Wolf, M. G. (2021). Dynamic fit index cutoffs for confirmatory factor analysis models. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000425
Maertens, R., Anseel, F., & van der Linden, S. (2020). Combatting climate change misinformation: Evidence for longevity of inoculation and consensus messaging effects. Journal of Environmental Psychology, 70, 101455. https://doi.org/10.1016/j.jenvp.2020.101455
Maertens, R., Roozenbeek, J., Basol, M., & van der Linden, S. (2021). Long-term effectiveness of inoculation against misinformation: Three longitudinal experiments. Journal of Experimental Psychology: Applied, 27(1), 1–16. https://doi.org/10.1037/xap0000315
Maertens, R., Roozenbeek, J., Simons, J., Lewandowsky, S., Maturo, V., Goldberg, B., ... van der Linden, S. (2022). Psychological booster shots targeting memory increase long-term resistance against misinformation. [Manuscript in preparation]
Markon, K. E. (2019). Bifactor and hierarchical models: Specification, inference, and interpretation. Annual Review of Clinical Psychology, 15, 51–69. https://doi.org/10.1146/annurev-clinpsy-050718-095522
McDonald, R. P. (1999). Test theory: A unified treatment. Psychology Press. https://doi.org/10.4324/9781410601087
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834. https://doi.org/10.1037/0022-006X.46.4.806
Nasser, M. A. (2020). Step-by-step guide on how to train GPT-2 on books using Google Colab. Towards Data Science. https://towardsdatascience.com/step-by-step-guide-on-how-to-train-gpt-2-on-books-using-google-colab-b3c6fa15fef0
Nguyen, T. H., Han, H.-R., Kim, M. T., & Chan, K. S. (2014).
An introduction to item response theory for patient-reported outcome measurement. The Patient, 7(1), 23–35. https://doi.org/10.1007/s40271-013-0041-0
Norenzayan, A., & Hansen, I. G. (2006). Belief in supernatural agents in the face of death. Personality & Social Psychology Bulletin, 32(2), 174–187. https://doi.org/10.1177/0146167205280251
Osmundsen, M., Bor, A., Vahlstrup, P. B., Bechmann, A., & Petersen, M. B. (2021). Partisan polarization is the primary psychological motivation behind political fake news sharing on Twitter. American Political Science Review, 115(3), 999–1015. https://doi.org/10.1017/S0003055421000290
Palan, S., & Schitter, C. (2018). Prolific.ac—A subject pool for online experiments. Journal of Behavioral and Experimental Finance, 17, 22–27. https://doi.org/10.1016/j.jbef.2017.12.004
Paulhus, D. L., Buckels, E. E., Trapnell, P. D., & Jones, D. N. (2020). Screening for dark personalities. European Journal of Psychological Assessment, 37(3), 208–222. https://doi.org/10.1027/1015-5759/a000602
Peer, E., Brandimarte, L., Samat, S., & Acquisti, A. (2017). Beyond the Turk: Alternative platforms for crowdsourcing behavioral research. Journal of Experimental Social Psychology, 70, 153–163. https://doi.org/10.1016/j.jesp.2017.01.006
Pennycook, G., Binnendyk, J., Newton, C., & Rand, D. G. (2021a). A practical guide to doing behavioral research on fake news and misinformation. Collabra: Psychology, 7(1), 25293. https://doi.org/10.1525/collabra.25293
Pennycook, G., Cheyne, J. A., Barr, N., Koehler, D. J., & Fugelsang, J. A. (2015). On the reception and detection of pseudo-profound bullshit. Judgment and Decision Making, 10(6), 549–563. http://journal.sjdm.org/15/15923a/jdm15923a.pdf
Pennycook, G., Epstein, Z., Mosleh, M., Arechar, A. A., Eckles, D., & Rand, D. G. (2021b). Shifting attention to accuracy can reduce misinformation online. Nature, 592, 590–595. https://doi.org/10.1038/s41586-021-03344-2
Pennycook, G., McPhetres, J., Zhang, Y., Lu, J. G., & Rand, D. G. (2020). Fighting COVID-19 misinformation on social media: Experimental evidence for a scalable accuracy-nudge intervention. Psychological Science, 31(7), 770–780. https://doi.org/10.1177/0956797620939054
Pennycook, G., & Rand, D. G. (2019). Lazy, not biased: Susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning. Cognition, 188, 39–50. https://doi.org/10.1016/j.cognition.2018.06.011
Pennycook, G., & Rand, D. G. (2020). Who falls for fake news? The roles of bullshit receptivity, overclaiming, familiarity, and analytic thinking. Journal of Personality, 88(2), 185–200. https://doi.org/10.1111/jopy.12476
Pennycook, G., & Rand, D. G. (2021). The psychology of fake news. Trends in Cognitive Sciences, 25(5), 388–402. https://doi.org/10.1016/j.tics.2021.02.007
Pituch, K. A., & Stevens, J. P. (2015). Applied multivariate statistics for the social sciences: Analyses with SAS and IBM’s SPSS. Routledge. https://doi.org/10.4324/9781315814919
Pons, P., & Latapy, M. (2005). Computing communities in large networks using random walks. In P. Yolum, T. Güngör, F. Gürgen, & C. Özturan (Eds.), Computer and information sciences - ISCIS 2005 (pp. 284–293). Berlin, Heidelberg: Springer. https://doi.org/10.1007/11569596_31
Preskill, J. (2018). Quantum Shannon entropy. In J. Preskill (Ed.), Quantum information (p. 94). Cambridge University Press. https://arxiv.org/pdf/1604.07450.pdf
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf
Rammstedt, B., Lechner, C. M., & Danner, D. (2021). Short forms do not fall short: A comparison of three (extra-)short forms of the Big Five. European Journal of Psychological Assessment, 37(1), 23–32. https://doi.org/10.1027/1015-5759/a000574
Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114(3), 552–566. https://doi.org/10.1037/0033-2909.114.3.552
Rentfrow, P. J., Gosling, S. D., Jokela, M., Stillwell, D. J., Kosinski, M., & Potter, J. (2013). Divided we stand: Three psychological regions of the United States and their political, economic, social, and health correlates. Journal of Personality and Social Psychology, 105(6), 996–1012. https://doi.org/10.1037/a0034434
Rentfrow, P. J., Jokela, M., & Lamb, M. E. (2015). Regional personality differences in Great Britain. PloS One, 10(3), e0122245. https://doi.org/10.1371/journal.pone.0122245
Revelle, W. (2021). psych: Procedures for psychological, psychometric, and personality research. The Comprehensive R Archive Network. https://cran.r-project.org/package=psych
Revelle, W., & Condon, D. M. (2019). Reliability from α to ω: A tutorial. Psychological Assessment, 31(12), 1395–1411. https://doi.org/10.1037/pas0000754
Robins, R. W., Hendin, H. M., & Trzesniewski, K. H. (2001). Measuring global self-esteem: Construct validation of a single-item measure and the Rosenberg self-esteem scale. Personality & Social Psychology Bulletin, 27(2), 151–161. https://doi.org/10.1177/0146167201272002
Roozenbeek, J., Culloty, E., & Suiter, J. (2023). Countering misinformation: Evidence, knowledge gaps, and implications of current interventions. European Psychologist. In press. https://doi.org/10.31234/osf.io/b52um
Roozenbeek, J., Freeman, A. L. J., & van der Linden, S. (2021a).
How accurate are accuracy nudges? A pre-registered direct replication of Pennycook et al. (2020). Psychological Science, 32(7), 1169–1178. https://doi.org/10.1177/09567976211024535
Roozenbeek, J., Maertens, R., Herzog, S., Geers, M., Kurvers, R., Sultan, M., & van der Linden, S. (2022). Susceptibility to misinformation is consistent across question framings and response modes and better explained by myside bias and partisanship than analytical thinking. Judgment and Decision Making, 17(3), 547–573. http://journal.sjdm.org/22/220228/jdm220228.pdf
Roozenbeek, J., Maertens, R., McClanahan, W., & van der Linden, S. (2021b). Disentangling item and testing effects in inoculation research on online misinformation: Solomon revisited. Educational and Psychological Measurement, 81(2), 340–362. https://doi.org/10.1177/0013164420940378
Roozenbeek, J., Schneider, C. R., Dryhurst, S., Kerr, J., Freeman, A. L. J., Recchia, G., van der Bles, A. M., & van der Linden, S. (2020). Susceptibility to misinformation about COVID-19 around the world. Royal Society Open Science, 7(10), 201199. https://doi.org/10.1098/rsos.201199
Roozenbeek, J., & van der Linden, S. (2019). Fake news game confers psychological resistance against online misinformation. Palgrave Communications, 5(1), 65. https://doi.org/10.1057/s41599-019-0279-9
Roozenbeek, J., & van der Linden, S. (2020). Breaking Harmony Square: A game that “inoculates” against political misinformation. Harvard Kennedy School Misinformation Review, 1(8), 1–26. https://doi.org/10.37016/mr-2020-47
Rosellini, A. J., & Brown, T. A. (2021). Developing and validating clinical questionnaires. Annual Review of Clinical Psychology, 17, 55–81. https://doi.org/10.1146/annurev-clinpsy-081219-115343
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling and more. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02
Said, N., Maertens, R., Jürgen, B., & Roozenbeek, J. (2023). The Manipulative Online Content Recognition Inventory (MOCRI). [Manuscript in preparation]
Sass, D. A., & Schmitt, T. A. (2010). A comparative investigation of rotation criteria within exploratory factor analysis. Multivariate Behavioral Research, 45(1), 73–103. https://doi.org/10.1080/00273170903504810
Satorra, A. (2000). Scaled and adjusted restricted tests in multi-sample analysis of moment structures. In Innovations in multivariate statistical analysis (pp. 233–247). Springer. https://doi.org/10.1007/978-1-4615-4603-0_17
Schmalbach, B., Irmer, J. P., & Schultze, M. (2019). ezCutoffs: Fit measure cutoffs in SEM. The Comprehensive R Archive Network. https://cran.r-project.org/package=ezCutoffs
Schumacker, R. E., Lomax, R. G., & Schumacker, R. (2015). A beginner’s guide to structural equation modeling (4th ed.). Routledge. https://www.routledge.com/A-Beginners-Guide-to-Structural-Equation-Modeling-Fourth-Edition/Schumacker-Lomax-Schumacker-Lomax/p/book/9781138811935. Accessed 10 Dec 2020.
Schwartz, L. M., Woloshin, S., Black, W. C., & Welch, H. G. (1997). The role of numeracy in understanding the benefit of screening mammography. Annals of Internal Medicine, 127(11), 966–972. https://doi.org/10.7326/0003-4819-127-11-199712010-00003
Shi, D., DiStefano, C., McDaniel, H. L., & Jiang, Z. (2018). Examining chi-square test statistics under conditions of large model size and ordinal data. Structural Equation Modeling: A Multidisciplinary Journal, 25(6), 924–945. https://doi.org/10.1080/10705511.2018.1449653
Simms, L. J. (2008).
Classical and modern methods of psychological scale construction. Social and Personality Psychology Compass, 2(1), 414–433. https://doi.org/10.1111/j.1751-9004.2007.00044.x
Sindermann, C., Elhai, J. D., Moshagen, M., & Montag, C. (2020). Age, gender, personality, ideological attitudes and individual differences in a person’s news spectrum: How many and who might be prone to “filter bubbles” and “echo chambers” online? Heliyon, 6(1), Article e03214. https://doi.org/10.1016/j.heliyon.2020.e03214
Soto, C. J., & John, O. P. (2017). Short and extra-short forms of the Big Five Inventory–2: The BFI-2-S and BFI-2-XS. Journal of Research in Personality, 68, 69–81. https://doi.org/10.1016/j.jrp.2017.02.004
Steiner, M., & Grieder, S. (2020). EFAtools: An R package with fast and flexible implementations of exploratory factor analysis tools. Journal of Open Source Software, 5(53), 2521. https://doi.org/10.21105/joss.02521
Strauss, M. E., & Smith, G. T. (2009). Construct validity: Advances in theory and methodology. Annual Review of Clinical Psychology, 5, 1–25. https://doi.org/10.1146/annurev.clinpsy.032408.153639
Swami, V., Chamorro-Premuzic, T., & Furnham, A. (2010). Unanswered questions: A preliminary investigation of personality and individual difference predictors of 9/11 conspiracist beliefs. Applied Cognitive Psychology, 24(6), 749–761. https://doi.org/10.1002/acp.1583
Swami, V., Furnham, A., Smyth, N., Weis, L., Lay, A., & Clow, A. (2016). Putting the stress on conspiracy theories: Examining associations between psychological stress, anxiety, and belief in conspiracy theories. Personality and Individual Differences, 99, 72–76. https://doi.org/10.1016/j.paid.2016.04.084
Swire, B., Berinsky, A. J., Lewandowsky, S., & Ecker, U. K. H. (2017). Processing political misinformation: Comprehending the Trump phenomenon. Royal Society Open Science, 4(3), 160802. https://doi.org/10.1098/rsos.160802
Tabachnick, B.
G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Pearson. https://psycnet.apa.org/record/2006-03883-000
Thalmayer, A. G., Saucier, G., & Eigenhuis, A. (2011). Comparative validity of brief to medium-length Big Five and Big Six Personality Questionnaires. Psychological Assessment, 23(4), 995–1009. https://doi.org/10.1037/a0024165
Thurstone, L. L. (1944). Second-order factors. Psychometrika, 9(2), 71–100. https://doi.org/10.1007/BF02288715
Uenal, F., Sidanius, J., Maertens, R., Hudson, S. K. T., Davis, G., & Ghani, A. (2022). The roots of ecological dominance orientation: Assessing individual preferences for an anthropocentric and hierarchically organized world. Journal of Environmental Psychology, 81, 101783. https://doi.org/10.1016/j.jenvp.2022.101783
Van Bavel, J. J., Harris, E. A., Pärnamets, P., Rathje, S., Doell, K., & Tucker, J. A. (2020). Political psychology in the digital (mis)information age: A model of news belief and sharing. PsyArXiv. https://doi.org/10.31234/osf.io/u5yts
van der Linden, S., Leiserowitz, A., Rosenthal, S., & Maibach, E. (2017). Inoculating the public against misinformation about climate change. Global Challenges, 1(2), 1600008. https://doi.org/10.1002/gch2.201600008
van der Linden, S., & Roozenbeek, J. (2020). Psychological inoculation against fake news. In R. Greifeneder, M. Jaffé, E. J. Newman, & N. Schwarz (Eds.), The psychology of fake news: Accepting, sharing, and correcting misinformation. Routledge. https://www.routledge.com/p/book/9780367271831
van der Linden, S., Roozenbeek, J., Maertens, R., Basol, M., Kácha, O., Rathje, S., & Traberg, C. S. (2021). How can psychological science help counter the spread of fake news? The Spanish Journal of Psychology, 24, e25. https://doi.org/10.1017/SJP.2021.23
Van Der Maas, H. L., Dolan, C. V., Grasman, R. P., Wicherts, J. M., Huizenga, H. M., & Raijmakers, M. E. (2006).
A dynamical model of general intelligence: The positive manifold of intelligence by mutualism. Psychological Review, 113(4), 842–861. https://doi.org/10.1037/0033-295X.113.4.842
van Prooijen, J.-W., Krouwel, A. P. M., & Pollet, T. V. (2015). Political extremism predicts belief in conspiracy theories. Social Psychological and Personality Science, 6(5), 570–578. https://doi.org/10.1177/1948550614567356
Von Neumann, J. (1927). Wahrscheinlichkeitstheoretischer Aufbau der Quantenmechanik. Nachrichten von Der Gesellschaft Der Wissenschaften Zu Göttingen, Mathematisch-Physikalische Klasse, 1927, 245–272. http://eudml.org/doc/59230
Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146–1151. https://doi.org/10.1126/science.aap9559
Weiner, I. B., Schinka, J. A., & Velicer, W. F. (2012). Handbook of psychology: Research methods in psychology (2nd ed., Vol. 2). John Wiley & Sons.
Woolf, M. (2019). How to make custom AI-generated text with GPT-2. Max Woolf’s Blog. https://minimaxir.com/2019/09/howto-gpt2/
Worthington, R. L., & Whittaker, T. A. (2006). Scale development research: A content analysis and recommendations for best practices. The Counseling Psychologist, 34(6), 806–838. https://doi.org/10.1177/0011000006288127
Zickar, M. J. (2020). Measurement development and evaluation. Annual Review of Organizational Psychology and Organizational Behavior, 7, 213–232. https://doi.org/10.1146/annurev-orgpsych-012119-044957

Open Practices Statement: Availability of data, code materials (data transparency)
The supplements, data, and analysis scripts that support this paper’s findings, including Qualtrics files, analysis code, raw and clean datasets, and all research materials, are openly available on the Open Science Framework (OSF) at https://osf.io/r7phc/. Preregistrations are available on AsPredicted at https://aspredicted.org/m7vb3.pdf (Study 1, T1), https://aspredicted.org/js2jz.pdf (Study 1, T2), and https://aspredicted.org/nx7xu.pdf (Study 2B).

Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.