
Behavior Research Methods (2024) 56:1863–1899 
https://doi.org/10.3758/s13428-023-02124-2

The Misinformation Susceptibility Test (MIST): A psychometrically 
validated measure of news veracity discernment

Rakoen Maertens1  · Friedrich M. Götz2  · Hudson F. Golino3  · Jon Roozenbeek1  · Claudia R. Schneider1  · 
Yara Kyrychenko1  · John R. Kerr1  · Stefan Stieger4  · William P. McClanahan1,5  · Karly Drabot1  · James He1  · 
Sander van der Linden1 

Accepted: 5 April 2023 / Published online: 29 June 2023 
© The Author(s) 2023

Abstract
Interest in the psychology of misinformation has exploded in recent years. Despite ample research, to date there is no validated 
framework to measure misinformation susceptibility. Therefore, we introduce Verification done, a nuanced interpretation schema 
and assessment tool that simultaneously considers Veracity discernment, and its distinct, measurable abilities (real/fake news 
detection), and biases (distrust/naïvité—negative/positive judgment bias). We then conduct three studies with seven independent 
samples (Ntotal = 8504) to show how to develop, validate, and apply the Misinformation Susceptibility Test (MIST). In Study 1 
(N = 409) we use a neural network language model to generate items, and use three psychometric methods—factor analysis, item 
response theory, and exploratory graph analysis—to create the MIST-20 (20 items; completion time < 2 minutes), the MIST-16 
(16 items; < 2 minutes), and the MIST-8 (8 items; < 1 minute). In Study 2 (N = 7674) we confirm the internal and predictive 
validity of the MIST in five national quota samples (US, UK), across 2 years, from three different sampling platforms—Respondi, 
CloudResearch, and Prolific. We also explore the MIST’s nomological net and generate age-, region-, and country-specific norm 
tables. In Study 3 (N = 421) we demonstrate how the MIST—in conjunction with Verification done—can provide novel insights 
on existing psychological interventions, thereby advancing theory development. Finally, we outline the versatile implementations 
of the MIST as a screening tool, covariate, and intervention evaluation framework. As all methods are transparently reported and 
detailed, this work will allow other researchers to create similar scales or adapt them for any population of interest.

Keywords Misinformation susceptibility · Automated item generation · Fake news · Neural networks · Psychometrics

The global spread of misinformation has had a palpable negative impact on society. For instance, conspiracy theories about the coronavirus disease 2019 (COVID-19) vaccines have been linked to increased vaccine hesitancy and a decline in vaccination intentions (Hotez et al., 2021; Loomba et al., 2021; Roozenbeek et al., 2020). Misinformation about the impact of 5G has led to the vandalization of cell phone masts (Jolley & Paterson, 2020), and misinformation about climate change has been associated with a reduction in perceptions of scientific consensus (Maertens et al., 2020; van der Linden et al., 2017). With false and moral-emotional media spreading faster and deeper than more accurate and nuanced content (Brady et al., 2017; Vosoughi et al., 2018), the importance of information veracity has become a central debate for scholars and policymakers (Lewandowsky et al., 2017, 2020).1

Rakoen Maertens and Friedrich M. Götz contributed equally to this work.

 * Rakoen Maertens 
 rm938@cam.ac.uk

 Friedrich M. Götz 
 friedrich.goetz@ubc.ca

1 Department of Psychology, University of Cambridge, 
Downing Street, CB2 3EB Cambridge, Cambridgeshire, UK

2 Department of Psychology, University of British Columbia, 
2136 West Mall, Vancouver, BC V6T 1Z4, Canada

3 University of Virginia, Charlottesville, VA, USA
4 Karl Landsteiner University of Health Sciences, Krems an der Donau, Austria

5 Max Planck Institute for the Study of Crime, Security and Law, Freiburg im Breisgau, Germany

1 It should be noted that recent research also provides evidence for 
an alternative perspective, namely that the spread of misinforma-
tion could be driven more by an emotional dimension than a veracity 
dimension (Cinelli et al., 2020).


Accordingly, across disciplines, research on the pro-
cesses behind, impact of, and interventions against misin-
formation has surged over the past years (for recent reviews, 
see Pennycook & Rand, 2021; Roozenbeek et al., 2023; Van 
Bavel, Harris, et al., 2020; van der Linden et al., 2021). 
Researchers have made progress in designing media and 
information literacy interventions in the form of educational 
games (Basol et al., 2021; Roozenbeek & van der Linden, 
2019, 2020), “accuracy” primes (Pennycook et al., 2021b; 
Pennycook et al., 2020), introducing friction (Fazio, 2020), 
and inoculation messages (Lewandowsky & van der Linden, 
2021). Crucially, however, no theoretical framework exists 
for a nuanced evaluation of misinformation susceptibility, 
nor a psychometrically validated measurement that provides 
a reliable measure across studies.

Inconsistent interpretation and the need 
for a new measurement instrument

Despite the plethora of research papers on the psychology of 
misinformation, the field has not converged on a standard-
ized way of defining or measuring people’s susceptibility to 
misinformation. In the absence of such a commonly agreed-
upon standard, scholars have been inventive in the way that 
they employ individually constructed misinformation tests, 
often with the best intentions to create a good scale, but typi-
cally without formal validation (e.g., Pennycook, Epstein, 
et al., 2021b; Roozenbeek et al., 2021b).

The extent of the problem becomes evident when exam-
ining how researchers develop their test items and report 
the success of their models or interventions. Typically, 
researchers create (based on commonly used misinforma-
tion techniques; e.g., Maertens et al., 2021; Roozenbeek 
& van der Linden, 2019) or select (from a reliable fact-
check database; e.g., Cook et al., 2017; Guess et al., 2020; 
Pennycook et al., 2020; Pennycook & Rand, 2019; Swire 
et al., 2017; van der Linden et al., 2017) news headlines 
or social media posts, where participants rate the relia-
bility, intentions to share, accuracy, or manipulativeness 
of these items on a Likert or binary (e.g., true vs. false) 
scale; for an extensive discussion, see Roozenbeek et al. 
(2022). Sometimes the news items are presented as plain-
text statements (e.g., Roozenbeek et al., 2020), while in 
other studies researchers present headlines together with an 
image, source, and lede sentence (e.g., Pennycook & Rand, 
2019). The true-to-false ratio also varies: some studies present only false news items (e.g., Roozenbeek et al., 2020), whereas others use an unbalanced (e.g., Roozenbeek et al., 2021b) or balanced (e.g., Pennycook & Rand, 2019) ratio of true and false items. Often an index score is created by taking the average of all item ratings (an index score reflecting general belief in false or true news items; e.g., Maertens et al., 2021), or by calculating the difference between ratings of true items and false items (veracity discernment; e.g., Pennycook, McPhetres, et al., 2020). Finally, an effect size is calculated, and a claim is made with respect to the effectiveness of the intervention, based on a change in false news ratings (e.g., Roozenbeek & van der Linden, 2019), a combined change in true news ratings and false news ratings (e.g., Guess et al., 2020), or even a change in true news ratings only (Pennycook, McPhetres, et al., 2020).
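The two index scores described above can be made concrete with a minimal Python sketch. The function names and the 1-7 rating values are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
from statistics import mean


def belief_index(ratings):
    """Average rating across all items shown (general belief in the items)."""
    return mean(ratings)


def discernment_index(true_ratings, false_ratings):
    """Mean rating of true items minus mean rating of false items."""
    return mean(true_ratings) - mean(false_ratings)


# Hypothetical accuracy ratings (1-7 scale) from a single participant.
true_ratings = [6, 5, 7, 6]
false_ratings = [3, 2, 4, 3]

overall_belief = belief_index(true_ratings + false_ratings)
discernment = discernment_index(true_ratings, false_ratings)
print(overall_belief, discernment)
```

The sketch shows why the two indices are not interchangeable: the average moves whenever a participant rates everything higher or lower, whereas the difference score only moves when true and false items are rated differently from each other.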

It becomes clear that the wide variation in methodolo-
gies makes it hard to compare studies or generalize con-
clusions beyond the studies themselves. Little is known 
about the psychometric properties of these ad hoc scales 
and whether or not they measure a latent trait. It is common
practice in misinformation research to assume, rather than to
demonstrate, that different scales measure the same construct.
If this bold assumption turned out to be untrue, we would be
at risk of obscuring underlying phenomena by incorrectly
labeling them as the same mechanism, thereby engaging in an
illusory essence bias (Brick et al., 2022) and/or falling prey
to jingle fallacies (Block, 1995; Condon et al., 2020). As
misinformation is a
complex issue, the responses on one item set may be a result 
of motivational factors, while responses on another scale 
may be more reflective of critical thinking skills, instead 
of both measuring the same “discernment skill.” We cur-
rently do not know how different misinformation suscep-
tibility scales are related, or how the true-to-false ratios 
influence their outcome (Aird et al., 2018) and how much 
of the effects found are due to response biases rather than 
changes in skill (Batailler et al., 2022). The limited stud-
ies that do look at the issue of scale-specific effects show 
significant item effects, indicating a risk of skewed con-
clusions about intervention effect sizes (e.g., Roozenbeek, 
Maertens et al., 2021b).2 Relatedly, whether the sampling 
of test items, their presentation, and response modes have a 
high ecological validity is often not discussed (Dhami et al., 
2004; Roozenbeek et al., 2022), and little is known about 
the nomological net and reliability of the indices used. In 
other words, it is difficult to disentangle whether differences 
between studies are due to differences in the interpretation 
schema, the measurement instrument, or actual differences 
in misinformation susceptibility. This indicates a clear need 
for a unified theoretical framework in conjunction with a 
standardized instrument with strong internal and external 
validity.

2 While there are models that take into account the baseline plausi-
bility of each item, they still do not reveal what construct each item 
is measuring. In other words, there may still be unexplained variabil-
ity even when controlling for baseline plausibility, such as issues with 
item stability, and different effect sizes between item sets in interven-
tion studies.



The present research

Towards a universal conceptualization 
and measurement: The Verification done framework

Here, we set out to create a theoretical interpretation schema 
as well as a first psychometrically validated measurement 
instrument that, in conjunction, resolve the issues mentioned 
above and offer utility for a wide range of scholars. We 
extend the current literature by providing the first psycho-
metrically integrated conceptualization of misinformation 
susceptibility that allows for a reliable holistic measurement 
through the Verification done framework: we can only fully 
interpret misinformation susceptibility—or the impact of an 
intervention—by capturing news veracity discernment (V, 
ability to accurately distinguish real news from fake news) as 
a general factor, the specific facets real news detection abil-
ity (r, ability to correctly identify real news) and fake news 
detection ability (f, ability to correctly identify fake news), 
distrust (d; negative judgment bias, being overly skeptical), 
and naïvité (n; positive judgment bias, being overly gulli-
ble), and comparing V, r, f, d, and n alongside each other. A 
visualization of the Verification done model can be found in 
Fig. 1. For example, two different interventions may increase 
discernment ability V to a similar extent, but intervention 
A might do so by increasing detection ability r, while inter-
vention B may accomplish the same by increasing detection 
ability f. Similarly, two people with the same discernment 
ability V may have opposite r and f abilities. Changes in 
detection abilities r or f after an intervention have to be
interpreted together with changes in judgment biases d
and n to determine whether the intervention has done more
than just increase a judgment bias. Existing interventions
often look at a limited subset of these five dimensions; for
example, the creators of the Bad News Game intervention
(Roozenbeek & van der Linden, 2019) originally focused on
fake news detection, including only a few real news items.
Meanwhile, the accuracy nudge intervention seems to work
mainly by addressing real news detection (Pennycook,
McPhetres, et al., 2020), although its effects on the
judgment biases remain unclear. Another media literacy intervention was
found to increase general distrust, but showed improvement 
on veracity discernment nevertheless (Guess et al., 2020).

In order to be able to compare these scores and gain 
insights into the complete picture, we need to employ the 
Verification done framework, but also make sure that each 
scale has high validity and comparability. To accomplish 
this, through a series of three studies and using a novel neu-
ral-network-based item generation approach, we develop 
the Misinformation Susceptibility Test (MIST): a psycho-
metrically validated (based on classical test theory and 
item response theory, as well as exploratory graph analysis) 
measurement instrument. The MIST was developed to be the 
first truly balanced misinformation susceptibility measure 
with an equal emphasis on discernment, real news detec-
tion, fake news detection, and judgment bias. In addition, to 
put the results into perspective, all scores should be interpreted
along with national norm tables. In the present study,
we describe how we developed and validated the MIST to
accomplish these goals, evaluate each of these dimensions,
and investigate the practical utility of the MIST for researchers
and practitioners in the field.

Fig. 1  Visualization of the Verification done model

The Misinformation Susceptibility Test

We conduct three studies to develop, validate, and apply the 
MIST. In Study 1 (N = 409), we employ a multitude of explor-
atory factor analysis (EFA)- and item response theory (IRT)-
based selection criteria to create a 20-item MIST full-scale 
and an 8-item MIST short-scale from a larger item pool that 
was built using a combination of advanced language-based 
neural network algorithms and real news headline extraction 
from reliable and unbiased media outlets, and then pre-filtered 
through multiple iterations of expert review. The resultant 
MIST scales are balanced (50% real, 50% fake), binary (real/
fake), cumulatively scored instruments that ask participants 
to rate presented news headlines as either true or false, with 
higher MIST scores indicating greater discernment ability.3 
We also present a new, alternative method to EFA and IRT, 
namely exploratory graph analysis (EGA; Golino & Epskamp, 
2017; Golino et al., 2021), to show how modern psychomet-
rics may lead to other robust item selections.

We acknowledge that the typical news consumption diet 
in real life includes more real news than fake news (e.g., 
Guess et al., 2020). However, as misinformation has the 
potential to spread faster (Brady et al., 2017; Vosoughi et al., 
2018), and we aim to accurately measure a general discern-
ment ability as well as both real news detection and fake 
news detection, in creating the MIST we have given equal
representation to both facets. This allows us to generalize
across the board—independent of an individual’s news con-
sumption ratio. Meanwhile, to capture any biases related to 
overly positive or negative responses (to news in general), 
we have later added a method to calculate response biases d 
and n (these were not part of the original scale development 
protocol). As such, the MIST exhibits a psychometrically 
validated higher-order structure, with two validated first-
order factors r and f (i.e., real news detection, fake news 
detection) and one general ability second-order factor V 
(i.e., veracity discernment), as well as a method to calculate 
response biases d (i.e., distrust) and n (i.e., naïvité).4
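To illustrate how the five Verification done scores relate to one another, the following Python sketch scores a set of binary responses. It rests on assumptions that this section does not spell out: V is taken as the total number of correct answers (r + f), and d and n are taken as the number of "fake" or "real" judgments in excess of half the items, floored at zero. The function name `score_mist` and the example answer key are hypothetical.

```python
def score_mist(responses, answer_key):
    """Score binary MIST-style responses.

    responses, answer_key: equal-length lists of "real" or "fake".
    r and f count correctly identified real and fake items.
    ASSUMPTION: V = r + f; d and n are the excess of "fake" / "real"
    judgments over half the number of items, floored at zero.
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")

    r = sum(1 for resp, key in zip(responses, answer_key)
            if key == "real" and resp == "real")
    f = sum(1 for resp, key in zip(responses, answer_key)
            if key == "fake" and resp == "fake")

    half = len(answer_key) // 2  # balanced scale: half real, half fake
    d = max(0, responses.count("fake") - half)  # distrust
    n = max(0, responses.count("real") - half)  # naivite
    return {"V": r + f, "r": r, "f": f, "d": d, "n": n}


# Hypothetical 8-item key (4 real, 4 fake) and two response patterns.
key = ["real"] * 4 + ["fake"] * 4
always_fake = score_mist(["fake"] * 8, key)
mostly_real = score_mist(["real"] * 6 + ["fake"] * 2, key)
print(always_fake)
print(mostly_real)
```

Under these assumptions, a respondent who labels every headline as fake obtains a perfect f but also the maximum d, which is why detection scores need to be read alongside the bias scores.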

In Study 2 (N = 7674), we employ confirmatory factor 
analyses (CFA), as well as EGA, to replicate the MIST’s 
structure across five national quota samples from the UK 
and the US, establish construct validity via a large, prereg-
istered nomological network, and derive norm tables for the 
general populations of the UK and US and demographic and 
geographical subgroups.

In Study 3 (N = 421), we provide an example of how to 
implement Verification done and the MIST in the field by 
applying it in the naturalistic setting of a well-replicated media 
literacy intervention, the Bad News Game (https://www.getbadnews.com/).
Whereas ample prior studies have attested to the
theoretical mechanisms and effects that contribute to the Bad
News Game’s effectiveness in reducing misinformation susceptibility
(see, e.g., Maertens et al., 2021; Roozenbeek & van
der Linden, 2019), within-subject repeated-measures analyses
of the MIST-8 for pre- and post-game tests in conjunction
with the Verification done framework reveal important new 
insights about how the intervention affects people across dif-
ferent evaluative dimensions. This paper demonstrates the ben-
efits of integrated theory and assessment development, result-
ing in a framework providing nuanced, multifaceted insights 
that can be gained from a short, versatile, psychometrically 
sound, and easy-to-administer new measure. Table 1 offers a 
comprehensive summary of all samples used, detailing their 
size, demographic breakdowns, included measures, country of 
origin, recruitment platform, and whether or not they (a) used 
nationally representative quota and (b) were preregistered.

Study 1: Development—Scale construction, 
exploratory analyses, and psychometric 
properties

Following classic (Clark & Watson, 1995; Loevinger, 1957) 
and recent (Boateng et al., 2018; Rosellini & Brown, 2021; 
Zickar, 2020) psychometrics guidelines, and taking into 
account insights from misinformation scholars (Pennycook 
et al., 2021a; Roozenbeek et al., 2021b), we devised a four-
stage, preregistered scale development protocol (i.e., 1—
item generation, 2—expert filtering, 3—quality control, and 
4—data-driven selection), shown in Fig. 2.

Method

Preparatory steps

Phase 1: Item generation

Fake news There is a debate in the literature on whether the 
misinformation items administered in misinformation stud-
ies should be actual news items circulating in society, or news 

3 We chose the binary coding approach (i.e., true versus false head-
line) because it allows us to create a straightforward and easy-to-inter-
pret structure with either a correct or an incorrect response for each 
item, which is also easy to implement and analyze in a performance-
based IRT model, without compromising on quality (e.g., in Studies 
1–2 we validated the MIST with items that are administered with Lik-
ert scales, providing evidence for its broader predictive validity).
4 Note that distrust and naïvité were not included in the psychomet-
ric scale development protocol, but only added later on as a post hoc 
calculation. The factor structure used for the scale development using 
EFA/IRT analyses can be found in Fig. 9, and the structure used for 
the EGA-based scale can be found in Fig. 7.


Table 1  Summary of samples

Sample  Study                  N     Country of origin  Nationally representative quota  Recruitment platform  Preregistration
1A      Study 1: Development   409   USA                No                               Prolific              Yes
2A      Study 2: Validation    3479  USA                Yes                              Respondi              No
2B      Study 2: Validation    510   USA                Yes                              CloudResearch         Yes
2C      Study 2: Validation    1227  UK                 Yes                              Respondi              No
2D      Study 2: Validation    1245  UK                 Yes                              Prolific              No
2E      Study 2: Validation    1213  USA                Yes                              Respondi              No
3       Study 3: Application   421   USA                No                               Bad News Game         No

Demographic composition

Sample 1A
Age: M = 33.20, SD = 11.85
Gender: 55.50% female; 42.30% male; 2.20% other/non-binary
Ethnicity: –
Education: 1.47% Less than high school degree; 9.29% High school graduate; 31.30% Some college but no degree; 38.88% Bachelor’s degree in college; 1.96% Professional degree; 13.45% Master’s degree; 3.67% Doctoral degree

Sample 2A
Age: M = 45.10, SD = 16.16
Gender: 51.11% female; 48.84% male; 0.06% other/non-binary
Ethnicity: 76.89% White, Caucasian, Anglo, or European American; 8.39% Asian or Asian American; 6.00% Hispanic or Latino; 5.98% Black or African American; 1.12% Native American or Alaskan Native; 0.54% Middle Eastern; 0.30% Hawaiian or Pacific Islander; 0.77% Other/Prefer not to answer
Education: 1.74% Did not complete high school; 34.98% High school degree or equivalent; 15.08% Associate degree; 31.84% Degree (bachelor’s) or equivalent; 15.11% Degree (master’s) or other postgraduate qualification; 1.25% Doctorate; 0.97% Other/Prefer not to say

Sample 2B
Age: M = 49.25, SD = 16.96
Gender: 55.88% female; 43.53% male; 0.59% other/non-binary
Ethnicity: 68.81% White, Caucasian, Anglo, or European American; 4.28% Asian or Asian American; 11.05% Hispanic or Latino; 12.12% Black or African American; 2.50% Native American or Alaskan Native; 0.18% Middle Eastern; 1.07% Other/Prefer not to answer
Education: 2.55% Less than high school degree; 25.10% High school graduate; 27.45% Some college but no degree; 26.08% Bachelor’s degree in college; 1.57% Professional degree; 13.92% Master’s degree; 3.33% Doctoral degree

Sample 2C
Age: M = 45.34, SD = 16.52
Gender: 51.67% female; 48.33% male; 0.00% other/non-binary
Ethnicity: 87.33% White; 6.95% Asian; 2.45% Black; 0.08% Arab; 2.13% Mixed; 1.06% Other
Education: 11.03% No formal education above age 16; 16.18% Professional or technical qualifications above age 16; 27.12% School education up to age 18; 31.94% Degree (bachelor’s) or equivalent; 12.09% Degree (master’s) or other postgraduate qualification; 1.63% Doctorate

Sample 2D
Age: M = 44.66, SD = 15.65
Gender: 52.53% female; 47.07% male; 0.40% other/non-binary
Ethnicity: 86.10% White; 7.47% Asian; 3.53% Black; 0.16% Arab; 1.61% Mixed; 1.12% Other
Education: 6.27% No formal education above age 16; 10.68% Professional or technical qualifications above age 16; 25.22% School education up to age 18; 38.63% Degree (bachelor’s) or equivalent; 16.87% Degree (master’s) or other postgraduate qualification; 2.33% Doctorate

Sample 2E
Age: M = 45.21, SD = 17.35
Gender: 54.00% female; 44.19% male; 1.81% non-binary
Ethnicity: –
Education: 2.72% Less than high school degree; 24.73% High school graduate or equivalent; 19.79% Some college, but no degree; 11.54% Associate degree in college, 2-year; 25.72% Bachelor’s degree in college, 4-year; 12.45% Master’s degree; 2.14% Professional degree, JD, MD; 0.91% Doctoral degree

Sample 3
Age: 55.58% [18, 29]; 32.30% [30, 49]; 12.11% [50, 99]
Gender: 52.02% female; 41.09% male; 6.89% other/non-binary
Ethnicity: –
Education: 14.49% High school or less; 36.10% Some college; 49.41% Higher degree

Measured constructs

Sample 1A: MIST-100; BSR; CMQ; COVID-19 compliance; CRT; DEPICT; CV19 fact-check
Sample 2A: MIST-20 (incl. MIST-8); AOT; Anti-vaccination attitudes; COVID-19 misinformation beliefs; CRT; Numeracy; Political ideology; Trust (in scientists, journalists, politicians, the government)
Sample 2B: MIST-20 (incl. MIST-8); BSR; BFI-2-S; CMQ; EDO; DEPICT SF; Go Viral!; MFQ20; SD4; SDO; SINS; SISES; SIRIS; SSPC; Trust (in medical personnel, scientists, politicians, journalists, the government, scientific knowledge, civil servants, mainstream media)
Sample 2C: MIST-20 (incl. MIST-8); Numeracy; Political ideology; Trust (in medical personnel, scientists, politicians, journalists, the government, scientific knowledge, civil servants, mainstream media)
Sample 2D: MIST-20 (incl. MIST-8); Numeracy; Political ideology; Trust (in medical personnel, scientists, politicians, journalists, the government, scientific knowledge, civil servants, mainstream media)
Sample 2E: MIST-16
Sample 3: MIST-8; BN

Note. AOT = Actively Open-minded Thinking (Baron, 2019); BFI-2-S = Big-Five Inventory 2 Short-Form (Soto & John, 2017); BN = Bad News Game (Roozenbeek & van der Linden, 2019); BSR = Bullshit Receptivity scale (Pennycook et al., 2015); CMQ = Conspiracy Mentality Questionnaire (Bruder et al., 2013); CRT = Cognitive Reflection Test (Frederick, 2005); DEPICT = Discrediting-Emotion-Polarization-Impersonation-Conspiracy-Trolling deceptive headlines inventory (Maertens et al., 2021); DEPICT SF = DEPICT Balanced Short Form (Maertens et al., 2021); EDO = Ecological Dominance Orientation (Uenal et al., 2022); CV19 fact-check = COVID-19 fact-check task (Pennycook, McPhetres, et al., 2020); Go Viral! = Go Viral! Balanced Item Set (Basol et al., 2021); MFQ20 = Moral Foundations Questionnaire 20-Item Short Form (Graham et al., 2011); Numeracy = combination of Schwartz Numeracy Test (Schwartz et al., 1997) and Berlin Numeracy Test (Cokely et al., 2012); SD4 = Short Dark Tetrad (Paulhus et al., 2020); SDO = Social Dominance Orientation (Ho et al., 2015); SINS = Single-Item Narcissism Scale (Konrath et al., 2014); SISES = Single-Item Self-Esteem Scale (Robins et al., 2001); SIRIS = Single-Item Religious Identification Scale (Norenzayan & Hansen, 2006); SSPC = Short Scale of Political Cynicism (Aichholzer & Kritzinger, 2016)



Fig. 2  Development protocol of the 
Misinformation Susceptibility Test



items created by experts that are fictional but feature common 
misinformation techniques. The former approach arguably 
provides better ecological validity (Pennycook, Binnendyk, 
et al., 2021), while the latter provides a cleaner and less con-
founded measure since it is less influenced by memory and 
identity effects (van der Linden & Roozenbeek, 2020). Con-
sidering these two approaches and reflecting on representative 
stimulus sampling (Dhami et al., 2004), we opted for a novel 
approach that combines the best of both worlds. We employed 
the generative pretrained transformer 2 (GPT-2), a neural-network-based
artificial intelligence developed by OpenAI (Radford et al., 2019),
to generate fake news items (cf. Götz
et al., 2022; Hommel et al., 2022). The GPT-2 is one of the 
most powerful open-source text generation tools currently avail-
able for free use by researchers. It was trained on eight mil-
lion text pages, combines 1.5 billion parameters, and is able to 
write coherent and credible articles based on just one or a few 
words of input.5 We did this by asking the GPT-2 to generate a 
list of fake news items inspired by a smaller set of items. This 
smaller set contained items from any of five different scales 
that encompass a wide range of misinformation properties: the 
Belief in Conspiracy Theories Inventory (BCTI; Swami et al., 
2010), the Generic Conspiracist Beliefs scale (GCB; Brotherton 
et al., 2013), specific Conspiracy Beliefs scales (van Prooijen 
et al., 2015), the Bullshit Receptivity scale (BSR; Pennycook 
et al., 2015), and the Discrediting-Emotion-Polarization-Imper-
sonation-Conspiracy-Trolling deceptive headlines inventory 
(DEPICT; Maertens et al., 2021; Roozenbeek & van der Lin-
den, 2019). We set out to generate 100 items of good quality, 
but as this is a new approach, we opted for the generation of at 
least 300 items. More specifically, we let GPT-2 generate thou-
sands of fake news headlines, and tossed out any duplicates and 
clearly irrelevant items (see Supplement S1 for a full overview 
of all items generated and those that have been removed).
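
The duplicate-removal step described above can be illustrated with a short sketch. It is written in Python for illustration only (the authors' scripts are in R), the example headlines are hypothetical, and the normalization rule (case, punctuation, whitespace) is an assumption. The relevance screening is left to human judgment here, and Supplement S1 remains the record of what was actually removed.

```python
import re


def normalize(headline: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so that
    trivially different strings compare as equal (assumed rule)."""
    text = re.sub(r"[^\w\s]", "", headline.lower())
    return " ".join(text.split())


def drop_duplicates(headlines):
    """Keep the first occurrence of each headline, comparing normalized forms."""
    seen = set()
    unique = []
    for headline in headlines:
        key = normalize(headline)
        if key and key not in seen:
            seen.add(key)
            unique.append(headline)
    return unique


# Hypothetical generated headlines, not items from the MIST pool.
generated = [
    "The Government Is Hiding Evidence",
    "the government is hiding evidence!",
    "New Study Links Coffee to Mind Control",
]
unique_headlines = drop_duplicates(generated)
```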

Real news For the real news items, we decided to include 
items that met each of the following three selection cri-
teria: (1) the news items are actual news items (i.e., they 
circulated as real news), (2) the news source is the most 
factually correct (i.e., accurate), and (3) is the least biased 
(i.e., nonpartisan or politically centrist). To do this, we used 
the Media Bias/Fact Check database (MBFC; https://mediabiasfactcheck.com/)
to select news sources marked as least biased and scoring very high on
factual reporting.6 The news sources we chose were Pew Research
(https://www.pewresearch.org/), Gallup (https://www.gallup.com/), MapLight
(https://maplight.org/), Associated Press (https://www.ap.org/), and World
Press Review (http://worldpress.org/). We also diversified the selection by
including the non-US outlets Reuters (https://www.reuters.com/), Africa
Check (https://africacheck.org/), and JStor Daily (https://daily.jstor.org/). 
All outlets received the maximum MBFC score at the time 
of item selection.7 A full list of the real news items selected 
can be found in Supplement S1.

Overall, this item-generation process resulted in an initial 
pool of 413 items. The full list of items we produced and 
methods through which each of them was obtained can be 
found in Supplement S1.

Phase 2: Item condensation To reduce the number of head-
lines generated in Phase 1, we followed previous scale devel-
opment research and practices (Carpenter, 2018; Haynes 
et al., 1995; Simms, 2008) and established an expert com-
mittee with misinformation researchers from four different 
cultural backgrounds: Canada, Germany, the Netherlands, 
and the United States. Each expert conducted an independ-
ent review and classified each of the 413 items generated 
in Phase 1 as either fake news or real news. All items with 
a three-fourths expert consensus and matching the correct 
answer key (i.e., the source veracity category)—a total of 
289 items—were selected for the next phase.8 A full list of 
the expert judgments and inter-rater agreement can be found 
in Supplement S1.
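
The Phase 2 retention rule can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the function name and the vote labels are hypothetical, and the example assumes four raters, as the three-fourths threshold and footnote 8 suggest.

```python
def passes_consensus(votes, answer_key, threshold=0.75):
    """Return True when the share of expert votes that match the answer key
    (the source veracity category) reaches the threshold."""
    matching = sum(1 for vote in votes if vote == answer_key)
    return matching / len(votes) >= threshold


# Hypothetical expert ratings for three items.
three_of_four = passes_consensus(["fake", "fake", "fake", "real"], "fake")
split_vote = passes_consensus(["fake", "fake", "real", "real"], "fake")
# Unanimous experts who contradict the answer key do not qualify either.
wrong_consensus = passes_consensus(["real", "real", "real", "real"], "fake")
```

Under this rule an item is dropped both when the experts disagree among themselves and when they agree with each other but not with the answer key.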

Phase 3: Quality control As a final quality control before 
continuing to the psychometrics study, the two-person item 
generation committee in combination with an extra third 
expert—who had not been previously exposed to any of the 
items—made a final selection of items from Phase 2. Apply-
ing a two-thirds expert consensus as cutoff, we selected 100 
items (44 fake news, 56 real news) out of the 289 from the 
previous stage (i.e., we cut 189 items), thus creating a fairly 
balanced item pool for empirical probing that hosted five 
times as many items as the final scale that we aimed to con-
struct—in keeping with conservative guidelines (Boateng 
et al., 2018; Weiner et al., 2012). A full list of the item sets 

6 MBFC is an independent fact-checking platform that rates media 
sources on factual reliability as well as ideological bias. At the time 
of writing, the MBFC database lists over 3700 media outlets and its 
classifications are frequently used in scientific research (e.g., Bovet & 
Makse, 2019; Chołoniewski et al., 2020; Cinelli et al., 2021).

7 Three out of six no longer receive the maximum score, and are now 
considered to have a center-left bias, and score between mostly fac-
tual and highly factual reporting: World Press Review (mostly factual, 
center-left), MapLight (highly factual, center-left), and JStor Daily 
(highly factual, center-left). This reflects both the dynamic nature of 
news media and the limits of the classification methodology used.
8 We used three-fourths as a criterion instead of 100% consensus 
because, as experts, we may be biased ourselves, and therefore we 
also accepted items where only one expert did not agree. If less than 
120 items would remain, then the Phase 1 item generation process 
would be restarted.

5 For a step-by-step guide on how to set up the GPT-2 to use as a 
psychometric item generator, see the tutorial paper by Götz et  al. 
(2023), as well as the useful blog posts by Woolf (2019), Nasser 
(2020), and Curley (2020).


selected per expert and expert agreement can be found in 
Supplement S1.

Implementation

Participants In line with widespread recommendations to 
assess at least 300 respondents during initial scale implementa-
tion (Boateng et al., 2018; Clark & Watson, 1995, 2019; Com-
rey & Lee, 1992; Guadagnoli & Velicer, 1988), we recruited a 
community sample of 452 US residents (for a comprehensive 
sample description see Table 1). The study was carried out on 
Prolific Academic (https://www.prolific.co/), an established 
crowd-working platform which provides competitive data 
quality (Palan & Schitter, 2018; Peer et al., 2017). Based on the 
exclusion criteria laid out in the preregistration, we removed 
incomplete cases, participants who took either an unreasonably 
short or long time to complete the study (less than 8 minutes or 
more than 2 hours), participants who failed an attention check, 
underage participants, and participants who did not live in the 
United States, retaining 409 cases for data analysis.9 Of these, 
225 participants (i.e., 55.01%) participated in the follow-up 
data collection eight months later (T2).10
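
The exclusion rules listed above amount to a row-wise filter. The following Python sketch is illustrative only: the field names are hypothetical, and the minimum age of 18 is an assumption, since the text says only "underage".

```python
from dataclasses import dataclass


@dataclass
class Response:
    complete: bool
    minutes: float
    passed_attention_check: bool
    age: int
    country: str


def is_eligible(r, min_minutes=8, max_minutes=120, min_age=18):
    """Apply the exclusion criteria: incomplete cases, completion time under
    8 minutes or over 2 hours, failed attention check, underage, non-US."""
    return (
        r.complete
        and min_minutes <= r.minutes <= max_minutes
        and r.passed_attention_check
        and r.age >= min_age
        and r.country == "US"
    )


# Hypothetical responses: only the first satisfies every criterion.
sample = [
    Response(True, 25, True, 30, "US"),
    Response(True, 5, True, 30, "US"),    # too fast
    Response(True, 25, False, 30, "US"),  # failed attention check
    Response(True, 25, True, 17, "US"),   # underage (assumed cutoff)
    Response(True, 25, True, 30, "UK"),   # not a US resident
    Response(False, 25, True, 30, "US"),  # incomplete
]
retained = [r for r in sample if is_eligible(r)]
```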

Participants received a set remuneration of 1.67 GBP 
(equivalent to US$ 2.32) for participating in the T1 ques-
tionnaire and 1.10 GBP (equivalent to US$ 1.53) for T2.

Procedure, measures, transparency, and openness

The preregistrations for T1 and T2 are available on AsPredicted
(https://aspredicted.org/m7vb3.pdf; https://aspredicted.org/js2jz.pdf;
any deviations can be found in Supplement S2). The supplement, raw and
clean datasets, and all analysis scripts in R can be found in the OSF
repository (https://osf.io/r7phc/).

Participants took part in a preregistered online survey. 
After providing informed consent, participants had to cat-
egorize the 100 news headlines from Phase 3 (i.e., the items 
that were retained after the previous three phases) into two 
categories: Fake/Deceptive and Real/Factual.11 Participants 
were told that each headline had only one correct answer. See 
the preregistration or the Qualtrics files on the OSF repository 
for the exact survey framing (https://osf.io/r7phc/).

After completing the 100-item categorization task, par-
ticipants completed the 21 items from the DEPICT inven-
tory (a misleading social media post reliability judgment 
task; Maertens et al., 2021), a 30-item COVID-19 fact-check 
task (a classical true/false headline evaluation task; Penny-
cook, McPhetres, et al., 2020), the Bullshit Receptivity scale 
(BSR; Pennycook et al., 2015), the Conspiracy Mentality 
Questionnaire (CMQ; Bruder et al., 2013), the Cognitive 
Reflection Test (CRT; Frederick, 2005), a COVID-19 com-
pliance index (sample item: “I kept a distance of at least two 
meters to other people”: 1 – does not apply at all, 4 – applies 
very much), and a demographics questionnaire (see Table 1 
for an overview). Finally, participants were debriefed. Eight 
months later, the participants were recruited again for a 
test-retest follow-up survey.12 In the follow-up survey, after 
participants provided informed consent to participate, the 
final 20-item MIST was administered, the same COVID-19 
fact-check task (Pennycook, McPhetres, et al., 2020) and 
CMQ (Bruder et al., 2013) were repeated, a new COVID-19 
compliance index was administered, and finally a full debrief 
was presented. The complete surveys are available in the 
OSF repository: https://osf.io/r7phc/.

The full study received institutional review board (IRB) 
approval from the Psychology Research Ethics Committee 
of the University of Cambridge (PRE.2019.108).

Analytical strategy 1: Exploratory factor analysis (EFA) 
and item response theory (IRT)

To extract the final MIST-20 and MIST-8 scales from the 
pre-filtered MIST-100 item pool, we followed an item selec-
tion decision tree, which can be found in Supplement S3. 
Specifically—after ascertaining the general suitability of 
the data for such procedures—the following EFA- and IRT-
based exclusion criteria were employed: (1) factor loadings 
below .40 (Clark & Watson, 2019; Ford et al., 1986; Hair 
et al., 2010; Rosellini & Brown, 2021); (2) cross-loadings 
above .30 (Boateng et al., 2018; Costello & Osborne, 2005); 
(3) communalities below .40 (Carpenter, 2018; Fabrigar et al., 
1999; Worthington & Whittaker, 2006); (4) Cronbach’s α 
reliability analysis; (5) differential item functioning (DIF) 
analysis (Holland & Wainer, 1993; Nguyen et al., 2014; 
Reise et  al., 1993); (6) item information function (IIF) 
analysis. Finally, we sought to establish initial evidence for 
construct validity (Cronbach & Meehl, 1955). To do this, we 
investigated the associations between the MIST scales and 

11 All headlines can be found in Supplement S1.

12 We chose to have a follow-up to be able to measure changes in the 
MIST score over the medium to long term. We found a period of eight 
months fitting for this purpose.

9 We preregistered that we would split the sample in half for explora-
tory analyses and confirmatory analyses. However, we used the full 
Study 1 sample for exploratory analyses instead and conducted a new 
study with a fresh sample (Study 2) for the confirmatory analyses. 
This more rigorous and more conservative approach was chosen to 
boost power and increase the quality of the initial item selection.
10 We looked at the difference in demographics between T1 and T2 
Prolific users. While we found no noteworthy differences in age (MT1 
= 33.20, MT2 = 35.76) or educational attainment rates, (T1: 38.88% 
with bachelor’s degree, T2: 41.52% with bachelor’s degree), the per-
centage of female participants rose somewhat during the follow-up 
(T1: 55.50% male, T2: 39.72% male).


the DEPICT deceptive headline recognition task (Maertens 
et al., 2021) and the COVID-19 fact-check task (Pennycook, McPhetres, 
et al., 2020; concurrent validity). We further examined additional 
predictive accuracy of the MIST in accounting for variance 
in DEPICT and fact-check scores above and beyond the 
CMQ (Bruder et al., 2013), BSR (Pennycook et al., 2015), 
and CRT (Frederick, 2005; incremental validity).
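
The first three exclusion criteria of this strategy (primary loading, cross-loadings, communality) can be sketched as a single-pass filter. This Python sketch is illustrative and uses hypothetical item names and values. In the procedure reported here the model is re-estimated after each round of removals, and the communalities are obtained separately, so a single pass is a simplification.

```python
def keep_item(loadings, communality,
              min_loading=0.40, max_cross=0.30, min_communality=0.40):
    """Retain an item only if its largest absolute loading is at least .40,
    no other loading exceeds .30, and its communality is at least .40."""
    ordered = sorted((abs(x) for x in loadings), reverse=True)
    primary, cross = ordered[0], ordered[1:]
    return (
        primary >= min_loading
        and all(c <= max_cross for c in cross)
        and communality >= min_communality
    )


# Hypothetical two-factor loadings and communalities.
items = {
    "item_a": ([0.62, 0.10], 0.45),  # clean loading, kept
    "item_b": ([0.35, 0.05], 0.30),  # primary loading too low
    "item_c": ([0.55, 0.41], 0.50),  # cross-loading too high
}
retained_items = [name for name, (l, h2) in items.items() if keep_item(l, h2)]
```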

Analytical strategy 2: Exploratory graph analysis (EGA)

In this section we explore an alternative method of scale 
development, based on the new field of exploratory graph 
analysis (Golino & Epskamp, 2017), rooted in network 
methods. Network methods in psychology gained momen-
tum with the publication of the mutualism model of intel-
ligence (Van Der Maas et al., 2006) and network perspec-
tive on psychopathology (Borsboom, 2008; Borsboom et al., 
2011; Cramer et al., 2010), giving rise to a new subfield 
of quantitative psychology called network psychometrics 
(Epskamp et al., 2017; Epskamp et al., 2018). Network 
models are used to estimate the relationship between mul-
tiple variables—typically using the Gaussian graphical 
model (GGM; Lauritzen, 1996), where nodes (e.g., test 
items) are connected by edges (or links) that indicate the 
strength of the association between the variables (Epskamp 
& Fried, 2018), forming a system of mutually reinforcing 
elements (Christensen et al., 2020b; Cramer, 2012). Network 
and latent variable models have been shown to be closely 
related, and can produce model parameters that are consist-
ent with one another (Boker, 2018; Christensen & Golino, 
2021c; Epskamp et al., 2017; Golino et al., 2021; Golino 
& Epskamp, 2017; Marsman et al., 2018). These statistical 
similarities can be used as a way to explore the dimension-
ality structure of measurement instruments in a new frame-
work termed exploratory graph analysis (Christensen et al., 
2019; Golino & Demetriou, 2017; Golino & Epskamp, 2017; 
Golino et al., 2020a, 2020b).

In network psychometrics (Christensen et  al., 2019; 
Epskamp et  al., 2018; Epskamp et  al., 2017; Golino & 
Demetriou, 2017; Golino & Epskamp, 2017; Golino et al., 
2020a, 2020b), networks are typically estimated using the 
Gaussian graphical model (Lauritzen, 1996) using the EBIC-
glasso approach (Epskamp & Fried, 2018). The EBICglasso 
approach operates by minimizing a penalized log-likelihood 
function and selecting the best model fit (i.e., the optimum 
level of sparsity in a network) using the extended Bayes-
ian information criterion (EBIC; Chen & Chen, 2008). As 
Golino et al. (2022) argue, the use of weighted network 
models in psychology opened the doors for network science 
methods developed in other areas of science to psychologi-
cal problems such as dimensionality (e.g., factor analysis).
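
For reference, the two ingredients of the EBICglasso approach can be written out. The notation below is an assumption introduced for illustration and follows one common presentation of the graphical lasso and the EBIC: $S$ is the sample covariance (or correlation) matrix, $K$ the precision matrix with elements $\kappa_{ij}$, $\lambda$ the penalty, $L$ the log-likelihood, $E$ the number of nonzero edges, $N$ the sample size, $P$ the number of nodes, and $\gamma$ a hyperparameter. The value of $\gamma$ used in the present analyses is not stated in this section.

```latex
% Graphical lasso: penalized log-likelihood (maximizing this expression is
% equivalent to minimizing its negative)
\hat{K}_{\lambda} = \arg\max_{K \succ 0}
  \left[ \log\det K - \operatorname{tr}(SK)
         - \lambda \sum_{i \neq j} \lvert \kappa_{ij} \rvert \right]

% Extended Bayesian information criterion, evaluated for each candidate
% \lambda; the network with the lowest EBIC is selected
\mathrm{EBIC} = -2L + E \log N + 4 \gamma E \log P
```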

Exploratory graph analysis was originally proposed 
by Golino and Epskamp (2017), who showed that 

the GGM model combined with a clustering algorithm for 
weighted networks (Walktrap; Pons & Latapy, 2005) could 
accurately recover the number of simulated factors, present-
ing higher accuracy than traditional factor analytic-based 
methods. Later, Golino, Shi, et al. (2020b) compared EGA 
with different types of factor analytic methods (including 
two types of parallel analysis), finding that EGA achieves 
the highest overall accuracy (87.91%) in estimating the num-
ber of simulated factors, followed by the traditional parallel 
analysis with principal components of Horn (1965; 83.01%), 
and parallel analysis using principal axis factoring proposed 
by Humphreys and Ilgen (1969; 81.88%).

Golino et al. (2022) summarized the advantages of the 
EGA framework over more traditional methods (Golino, 
Shi, et al., 2020b): (1) unlike exploratory factor analysis 
(EFA) methods, EGA does not require a rotation method to 
interpret the estimated first-order factors (although rotations 
are rarely discussed in the validation literature, they have 
significant consequences for validation, e.g., estimation of 
factor loadings; Sass & Schmitt, 2010); (2) EGA automati-
cally places items into factors without the researcher’s direc-
tion, which contrasts with exploratory factor analysis, where 
researchers must decipher a factor loading matrix (such a 
placement opens the door for dimension and item stability
methods, which are presented next); and (3) the network 
representation depicts how items relate within and between 
dimensions.

Over the past couple of years, the EGA framework has 
expanded into several important areas of psychometrics. 
Christensen and Golino (2021c) developed a new metric termed 
network loadings, computed by standardizing node strength (the 
sum of the edges a node is connected to) split between 
dimensions identified by EGA. Christensen and Golino (2021c) 
showed in their simulation study that network loadings are akin to 
factor loadings, but with different reference values. Network 
loadings of .15, .25, and .35 are equivalent to low (.40), 
moderate (.55), and high (.70) factor loadings, respectively 
(Christensen & Golino, 2021c). The development of 
network loadings opened new lines of research, such as the 
development of metric invariance using EGA and permuta-
tion tests in a network perspective (Jamison et al., 2022), 
and determining whether data are generated from a factor or 
network model (Christensen & Golino, 2021b).
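
The idea of splitting node strength between dimensions can be sketched as follows. This Python sketch is illustrative and is not the EGAnet implementation: the function name and the toy network are hypothetical, and the standardization step (dividing by the square root of the summed strengths within each dimension) is an assumed reading of Christensen and Golino (2021c) that should be checked against the package before use.

```python
import math


def network_loadings(weights, membership):
    """weights: symmetric matrix (list of lists) of edge weights, zero diagonal.
    membership: dimension label for each node.
    Returns, for each node, a dict mapping dimension -> loading."""
    dims = sorted(set(membership))
    n = len(weights)
    # Node strength split by dimension: sum of absolute edge weights that
    # connect node i to the nodes assigned to dimension d.
    raw = [
        {d: sum(abs(weights[i][j]) for j in range(n)
                if j != i and membership[j] == d)
         for d in dims}
        for i in range(n)
    ]
    # Assumed standardization: divide by the square root of the total
    # strength directed at each dimension.
    totals = {d: sum(row[d] for row in raw) for d in dims}
    return [
        {d: (row[d] / math.sqrt(totals[d]) if totals[d] > 0 else 0.0)
         for d in dims}
        for row in raw
    ]


# Hypothetical four-node network with two dimensions, A and B.
toy_weights = [
    [0.00, 0.40, 0.10, 0.00],
    [0.40, 0.00, 0.00, 0.05],
    [0.10, 0.00, 0.00, 0.50],
    [0.00, 0.05, 0.50, 0.00],
]
toy_membership = ["A", "A", "B", "B"]
toy_loadings = network_loadings(toy_weights, toy_membership)
```

In the toy network each node loads most strongly on its own dimension, and the smaller values play the role of cross-loadings.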

Based on the automated item placement of EGA, Chris-
tensen and Golino (2021a) developed a bootstrap approach 
to investigate the stability of items and dimensions estimated 
by EGA, termed bootstrap exploratory graph analysis, and 
proposed two new metrics of psychometric quality: item sta-
bility and structural consistency. Item stability indicates how 
often an item replicates in its designated EGA dimension, 
with values lower than .75 (i.e., items that are estimated in their 
original dimension in fewer than 75% of the bootstrapped samples) 
indicating problematic (or unstable) items. Structural 
consistency, in turn, indicates how often an EGA dimension 
exactly replicates and can be used to verify configural (or 
structural) invariance and determine poor-functioning items 
(Golino et al., 2022). A complementary approach, called 
unique variable analysis, was developed to identify redun-
dant items and can be used to identify the reason why some 
items function poorly (Christensen, Garrido, & Golino, 
2020a).
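
Item stability as described above reduces to a proportion over bootstrap replicates. The Python sketch below is illustrative: the function names and memberships are hypothetical, and it assumes that dimension labels have already been aligned across replicates, which in practice is a nontrivial step handled by the software.

```python
def item_stability(original, bootstrap_memberships):
    """Proportion of bootstrap replicates in which each item is placed in
    the same dimension as in the original (empirical) solution."""
    n_boot = len(bootstrap_memberships)
    return [
        sum(1 for m in bootstrap_memberships if m[i] == original[i]) / n_boot
        for i in range(len(original))
    ]


def unstable_items(stability, cutoff=0.75):
    """Indices of items whose stability falls below the cutoff."""
    return [i for i, s in enumerate(stability) if s < cutoff]


# Hypothetical solution for four items and four bootstrap replicates.
original_membership = [1, 1, 2, 2]
replicates = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 2, 2, 2],
    [1, 2, 2, 1],
]
stability = item_stability(original_membership, replicates)
flagged = unstable_items(stability)
```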

The fit of a dimensionality structure estimated using 
EGA to the data can be verified using an innovative fit index 
termed total entropy fit index (TEFI; Golino, Moulder, et al., 
2020a), developed as an alternative to traditional fit meas-
ures used in factor analysis and structural equation modeling 
(SEM). In a comprehensive simulation study, the TEFI dem-
onstrated higher accuracy in correctly identifying the num-
ber of simulated factors than the comparative fit index (CFI), 
the root mean square error of approximation (RMSEA), 
and other indices used in SEM (Golino, Moulder, et al., 
2020a). The TEFI is based on the Von Neumann entropy 
(Von Neumann, 1927)—a measure developed to quantify 
both the amount of disorder in a system and the entangle-
ment between two subsystems (Preskill, 2018). The TEFI 
is a relative measure of fit that can be used to compare 
two or more dimensionality structures. The dimensionality 
structure with the lowest TEFI value indicates the best fit 
for the data.

Another recent development within the EGA framework 
is the hierarchical EGA (hierEGA) technique by Jimenez 
et al. (2022). In their work, Jimenez et al. (2022) proposed 
a variation of a popular clustering algorithm 
called Louvain (Blondel et al., 2008) to detect lower- and 
higher-order factors in data, and showed that this new tech-
nique is more effective than traditional factor analytic tech-
niques to estimate the structure of first- and second-order 
factors in generalized bifactor structures.

All the EGA-based techniques/metrics mentioned above 
use the free and open-source R package EGAnet (Golino 
& Christensen, 2019), which has become one of the main 
software programs in network psychometrics. In the cur-
rent paper, version 1.2.4 of the EGAnet package (Golino 
& Christensen, 2019) was used, and several strategies 
were implemented. The first strategy aimed at estimating 
the dimensionality structure of the 100 MIST items. Then, 
redundant items were identified using unique variable analy-
sis (Christensen et al., 2020a), and for every group or pair of 
redundant items the one with the higher ratio of main net-
work loadings to cross-loadings was kept in the analysis. The 
stability of the items and the structural consistency of the 
dimensions were obtained via bootstrap exploratory graph 
analysis (Christensen & Golino, 2021a) with 500 iterations 
(using parametric bootstrapping), and items with stability 
lower than 75% and network loadings lower than .15 were 

removed from subsequent steps. Once a subset of stable 
items with at least low to moderate network loadings were 
found, a subset of the best items per dimension (i.e., with 
moderate to high network loadings—with a network load-
ing of at least .23) were identified, and further item stability 
and structural consistency metrics were computed until all 
items were highly stable (with item stability greater than 
90%). The metric invariance of the final pool of best items 
per dimension (moderate to high network loadings and high 
item stability) was investigated using the EGA permutation 
test developed by Jamison et al. (2022), having as reference 
groups sex, age (above or below the median birth year), and 
education (above or below the median level of formal edu-
cation received). The fit of the EGA-estimated dimensions 
to the data was computed using the total entropy fit index 
(Golino, Moulder, et al., 2020a) and compared to the two-
factor structure of real and fake news items identified using 
EFA. CFI and RMSEA computed after fitting a confirmatory 
factor model to the EGA-estimated dimensions were also 
obtained, and compared to the CFI and RMSEA of the two-
factor structure. Additionally, the Satorra (2000) 
scaled difference test was implemented to verify the struc-
ture with the best fit to the data.

Results

EFA/IRT results

Item selection Using parallel analysis with the psych pack-
age (Revelle, 2021), we aimed to select a parsimonious fac-
tor structure, with each factor reflecting eigenvalues above 
the 95th percentile of corresponding eigenvalues from 500 
simulated random datasets.13 Parallel analysis (with 500 iter-
ations) suggested a total of six factors, but only five factors 
(eigenvalues: F1 = 10.89, F2 = 7.82, F3 = 1.89, F4 = 1.42, 
F5 = 1.23, F6 = 0.98) matched our criteria and were above 
the 95th percentile of corresponding eigenvalues from the 
500 simulated random datasets (eigenvalue 95th percentile = 
0.99).14 Two factors explained most of the variance, which is 
in line with our theoretical model of two main factors (fake 
news detection and real news detection). An EFA using the 
tetrachoric correlation matrix with unweighted least squares 

13 The factorability of the data was tested via the Kaiser–Meyer–
Olkin (KMO) measure of sampling adequacy and Bartlett’s test of 
sphericity using R and the EFAtools package (Steiner & Grieder, 
2020). Both tests indicated excellent data suitability (Bartlett’s χ2 
= 12,896.84, df = 4950, p < .001; KMO = .831) according to estab-
lished guidelines (Carpenter, 2018; Tabachnick & Fidell, 2007).
14 These five factors are in line with the criteria set out in the pre-
registration, as they have both (i) an eigenvalue > 1 and (ii) an eigen-
value larger than the simulated value (above the line of randomly 
generated data).



(ULS) estimation without rotation using the EFAtools pack-
age (Steiner & Grieder, 2020) indicated that for both the 
two-factor structure and the five-factor structure, the first 
two factors were specifically linked to the real news items 
and the fake news items, respectively, while the other three 
factors did not show a pattern easy to interpret and in general 
showed low factor loadings (< .30).15 See Supplement S4 
for a pattern matrix.

As we set out to create a measurement instrument for 
two distinct abilities, real news detection and fake news 
detection, we continued with a two-factor EFA, employ-
ing principal axis factoring and varimax rotation using 
the psych package (Revelle, 2021).16 Theoretically we 
would expect a balancing out of positive and negative 
correlations between the two factors: positive because of 
the underlying veracity discernment ability, and negative 
because of the response biases. In line with this, we chose 
an orthogonal rotation instead of an oblique rotation to 
separate out fake news detection and real news detection 
as cleanly as possible.

Three iterations were needed to remove all items with 
a factor loading under .40 (43 items were removed). After 
this pruning, no items showed cross-loadings larger than 
.30. Communality analysis using the three-parameter logistic 
model function in the mirt package (Chalmers, 2012) with 
50% guessing chance (c = .50) indicated two items with 
communality lower than .40 after one iteration. These items 
were removed. No further iterations yielded any additional 
removals. A final list of the communalities can be found in 
Supplement S5. Cronbach’s α reliability analysis with the 
psych package was used to remove all items that had nega-
tive effects (∆α > .001) on the overall reliability of the test 
(Revelle, 2021). No items had to be removed based on this 
analysis.17 Differential item functioning using the mirt pack-
age was used to explore whether differences in gender or 
ideology would alter the functioning of the items (Chalmers, 
2012). None of the items showed differential functioning for 
gender or ideology.
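
The reliability-based pruning step can be illustrated with a plain implementation of Cronbach's α and an "α if item dropped" check. This Python sketch is illustrative: the data are hypothetical, and reading ∆α > .001 as "α rises by more than .001 when the item is removed" is an assumption about the sign convention.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: one list of item scores per respondent."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def items_hurting_alpha(scores, delta=0.001):
    """Indices of items whose removal raises alpha by more than delta."""
    baseline = cronbach_alpha(scores)
    k = len(scores[0])
    flagged = []
    for i in range(k):
        reduced = [[x for j, x in enumerate(row) if j != i] for row in scores]
        if cronbach_alpha(reduced) - baseline > delta:
            flagged.append(i)
    return flagged


# Hypothetical binary responses: items 0 and 1 agree perfectly,
# item 2 is unrelated to them.
responses = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
]
alpha_all = cronbach_alpha(responses)
flagged_items = items_hurting_alpha(responses)
```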

Finally, using the three-parameter logistic model 
IRT functions in the mirt package (Chalmers, 2012), we 
selected the 20 best items (10 fake, 10 real) and the 8 best 
items (4 fake, 4 real), resulting in the MIST-20 and the 
MIST-8, respectively. These items were selected based on 
their discrimination and difficulty values, where we aimed 
to select a diverse set of items that have high discrimina-
tion (a ≥ 2.00 for the MIST-20, a ≥ 3.00 for the MIST-8) 
yet have a wide range of difficulties (b = [−0.50, 0.50], for 
each ability), while keeping the guessing parameter at 50% 
chance (c = .50). We also took into account the topics to 
ensure both that we covered a wide range of news areas 
and that there was no repetition of content (Flake et al., 
2017). A list of the IRT coefficients and plots can be found 
in Supplement S1 and Supplement S6, respectively. See 
Fig. 3 for a MIST-20 item trace line plot, and Fig. 4 for 
a multidimensional plot of the MIST-20 IRT model pre-
dictions. The final items that make up the MIST-20 and 
MIST-8 are shown in Table 2.18 An overview of different 
candidate sets and how they performed, as well as the full 
analysis scripts and the supplement, can be found in the 
OSF repository: https://osf.io/r7phc/.
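
To make the roles of the discrimination (a), difficulty (b), and guessing (c) parameters concrete, the three-parameter logistic item response function can be sketched in its traditional parameterization. This Python sketch is illustrative. The mirt package may use a different internal parameterization, so the function below is not a reproduction of the fitted models.

```python
import math


def p_correct(theta, a, b, c=0.5):
    """3PL probability of a correct response for ability theta, with the
    guessing parameter fixed at chance level (c = .50) for a binary item."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# At theta == b the probability lies halfway between chance and certainty.
p_at_difficulty = p_correct(theta=0.0, a=2.0, b=0.0)

# A higher discrimination separates respondents just below and just above
# the item's difficulty more sharply.
gap_a2 = p_correct(0.5, a=2.0, b=0.0) - p_correct(-0.5, a=2.0, b=0.0)
gap_a3 = p_correct(0.5, a=3.0, b=0.0) - p_correct(-0.5, a=3.0, b=0.0)
```

With c fixed at .50, the curve rises from .50 for very low ability toward 1 for very high ability, which is why high-discrimination items spread across a range of difficulties are informative over a wider ability range.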

Reliability Inter-item correlations show good internal con-
sistency for both the MIST-8 (IICmin = .20, IICmax = .27) 
and the MIST-20 (IICmin = .22, IICmax = .29). Item-total 
correlations also show good reliability for both the MIST-8 
(ITCmin = .44, ITCmax = .53) and the MIST-20 (ITCmin = .31, 
ITCmax = .54).

Looking further into the MIST-20, we analyze the reli-
ability of veracity discernment (V; M = 15.71, SD = 3.35), 
real news detection (r; M = 7.62, SD = 2.43), and fake news 
detection (f; M = 8.09, SD = 2.10). In line with the guidelines 
by Revelle and Condon (2019), we calculate a two-factor 
McDonald’s ω (McDonald, 1999) as a measure of internal 
consistency using the psych package (Revelle, 2021), and 
find good reliability for the general scale and the two facet 
scales (ωg = 0.79, ωF1 = 0.78, ωF2 = 0.75). Also using the 
psych package (Revelle, 2021), we calculate the variance 
decomposition metrics as a measure of stability, finding that 
F1 explains 14% of the total variance and F2 explains 12% 
of the total variance. Of all variance explained, 53% comes 
from F1 (r) and 47% comes from F2 (f), demonstrating a 
good balance between the two factors.

15 When using EFA with a promax rotation, there is some evidence 
for two factors for the fake news items and two factors for the real 
news items, bringing up a total of four factors, but its pattern and 
meaning is unclear. This alternative structure will be further explored 
in the EGA section.
16 While we chose to adhere to the more traditional methods for 
estimating and rotating factors in EFA, we acknowledge that recent 
research provides arguments for the use of ML estimation and oblique 
rotations (Goretzko et  al., 2021), and specifically ULS estimation 
(using the tetrachoric correlation matrix) for dichotomous variables 
(see Shi et al., 2018). We provide an alternative, modern approach to 
item selection based on EGA in the section below.
17 We note that some researchers argue that the focus on reliability 
can reduce the content validity of the scale, as there may be relevant 
items with weaker loadings (e.g., Flake et al., 2017). However, as no 
items were removed, this is not a concern for this study.

18 As can be glimpsed from the final set, the misinformation items 
contain certain words and topics that are more often linked to manip-
ulative content, such as “control/manipulate/cause,” “vaccine/virus,” 
and “government.” These topics were already present in the sample 
items given to the GPT-2—which led to more of these topics being 
present in the original fake news item pool than in the real news 
item pool. This thus represents a feature that was present since the 
first phase of the development and is not just a consequence of a later 
selection by the experts or elimination based on factor loadings.


Finally, test–retest reliability analysis indicates that MIST 
scores are moderately positively correlated over a period of 
eight to nine months (rT1,T2 = 0.58).19

Validity To assess initial validity, we examined the asso-
ciations between the MIST scales and two scales that have 
been used regularly in previous misinformation research—
the COVID-19 fact-check by Pennycook, McPhetres, et al. 
(2020) and the DEPICT task by Maertens et al. (2021)—
expecting high correlations (r > .50; concurrent valid-
ity) and additional variance explained as compared to the 
existing CMQ, BSR, and CRT scales (incremental valid-
ity; Clark & Watson, 2019; Meehl, 1978). As can be seen 
in Table 3, we found that the MIST-8 displays a medium 
to high correlation with the fact-check (rfact-check,MIST-8 = .49) 
and DEPICT task (rDEPICT,MIST-8 = .45), while the MIST-
20 shows a large positive correlation with both the fact-
check (rfact-check,MIST-20 = .58) and the DEPICT task 
(rDEPICT,MIST-20 = .50). Using a linear model, we found that the 
explained variance in the fact-check indicates that the MIST-
20 can explain 33% (adjusted R2) of variance by itself. The 
CMQ, BSR, and CRT combined account for 19%. Adding the 
MIST-20 on top provides an incremental 18% of explained 
variance (adjusted R2 = 0.37). The MIST-20 is the strongest 
predictor in the combined model (t(404) = 10.82, p < .001, 
β = 0.49, 95% CI [0.40, 0.57]). For the DEPICT task we 
found that the CMQ, BSR, and CRT combined explain 12% 

of variance in deceptive headline recognition and 26% when 
the MIST-20 is added (∆R2 = 0.14), while the MIST-20 alone 
explains 25%. For the DEPICT task we found the MIST-20 
to be the only significant predictor in the combined model 
(t(404) = 8.94, p < .001, β = 0.43, 95% CI [0.34, 0.53]).20
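
The incremental validity figures reported above follow from a difference in adjusted R² between the combined model and the baseline model without the MIST-20. The notation is introduced here for illustration, and the values are those given in the text.

```latex
\Delta R^2_{\mathrm{adj}}
  = R^2_{\mathrm{adj}}(\text{CMQ} + \text{BSR} + \text{CRT} + \text{MIST-20})
  - R^2_{\mathrm{adj}}(\text{CMQ} + \text{BSR} + \text{CRT})

% COVID-19 fact-check task
\Delta R^2_{\mathrm{adj}} = 0.37 - 0.19 = 0.18

% DEPICT task
\Delta R^2_{\mathrm{adj}} = 0.26 - 0.12 = 0.14
```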

EGA results

In this section we re-analyze the pool of 100 MIST items using 
EGA. EGA estimated four dimensions (see Fig. 5), which can 
be identified as two dimensions of real news headlines and two 
of fake news headlines. Dimension 1 (red nodes on Fig. 5) is 
a combination of US and international real news headlines, 
with items such as MIST 96 (US Hispanic Population Reached 
New High in 2018, But Growth Has Slowed), MIST 92 (Taiwan 
Seeks to Join Fight Against Global Warming), and MIST 60 
(Hyatt Will Remove Small Bottles from Hotel Bathrooms by 
2021). Dimension 2 (blue nodes on Fig. 5) has fake news items 
about science, such as item MIST 8 (Climate Scientists’ Work 
Is “Unreliable”, a “Deceptive Method of Communication”), 
and false statements against people with a liberal world view, 
such as items MIST 16 (Left-Wingers Are More Likely to Lie 
to Get a Good Grade) and MIST 20 (New Study: Left-Wingers 
Are More Likely to Lie to Get a Higher Salary). The third 
dimension (green nodes on Fig. 5) has real news items related 
to politically charged topics in the US, such as items MIST 70 
(Majority in US Still Want Abortion Legal, with Limits), MIST 
74 (Most Americans Say It’s OK for Professional Athletes 

19 It must be noted that at T2, participants only completed the
20-item MIST, while at T1 participants had to categorize 100 items,
with slightly different question and response framings (see full
Qualtrics layouts and question framings in the OSF repository:
https://osf.io/r7phc/). We expect the actual test–retest correlation to be higher.

20 Full model output for the MIST-8 and MIST-20 linear models can
be found in Supplement S8. Full analysis scripts can be found in the
OSF repository: https://osf.io/r7phc/.

Fig. 3  Item trace lines for MIST-20 items, for the fake news items in Panel A and real news items in Panel B. The items in the legend are ordered 
according to their difficulty level


to Speak out Publicly about Politics), and MIST 94 (United 
Nations Gets Mostly Positive Marks from People Around the 
World). Dimension 4 (orange nodes on Fig. 5) has fake news 
items related to general conspiracy beliefs, such as item MIST 
1 (A Small Group of People Control the World Economy by 
Manipulating the Price of Gold and Oil), and conspiracies 
related to the government, such as items MIST 31 (The Gov-
ernment Is Actively Destroying Evidence Related to the JFK 
Assassination) and MIST 32 (The Government Is Conducting 
a Massive Cover-Up of Their Involvement in 9/11).

The unique variable analysis technique (Christensen 
et al., 2020a) identified two redundant items: MIST 43 (UN: 
New Report Shows Shark Fin Soup as ‘the Most Important 
Source of Protein’ for World’s Poor) and MIST 17 (New 
Data Show Shark Fins Are the ‘Most Important Source of 
Protein’ for the World’s Poor). The ratio of network loadings 
(main/cross-loadings) for these items (8.47 and 6.9, respec-
tively) suggested that item MIST 43 should be kept in the 
subsequent analyses. A bootstrap exploratory graph analysis 
with 500 iterations (parametric bootstrapping) identified a
median of four dimensions (95% CI: 2.11, 5.89) but with very low

structural consistency for each dimension (0.09, 0.14, 0.07, 
and 0.43 for dimensions 1, 2, 3, and 4, respectively). The 
item stability metric (Christensen & Golino, 2021a) varied 
from 23% to 98%, with 40% of items presenting inadequate 
or moderate stability (i.e., lower than 75%, see Fig. 6).

Removing the items with item stability lower than 75% 
and repeating the parametric bootstrap EGA technique with 
500 iterations showed that the stability improved consider-
ably, leading to structural consistency between 0.61 (dimen-
sion 2) and 0.96 (dimension 4), and mean item stability of 
93%. From the 59 items selected in the steps above, a subset 
with network loadings equal to or higher than .155 was
selected from each dimension estimated via EGA, resulting 
in 34 items. A parametric bootstrap EGA with 500 itera-
tions followed by item stability analysis was implemented 
once again, and items with stability lower than 75% were 
removed, resulting in 32 items.
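The stability-based filtering described above can be sketched in a few lines. This is a simplified illustration, not the bootstrap EGA implementation the authors used: it assumes the bootstrap replicates have already been estimated and that their dimension labels are aligned with the original solution. Item stability is taken as the share of replicates in which an item lands in its original dimension, and structural consistency as the share of replicates in which a dimension is recovered with exactly its original items. The item names and assignments are hypothetical toy data.

```python
def item_stability(original, replicates):
    """Share of replicates in which each item is assigned to its original dimension."""
    return {
        item: sum(rep[item] == dim for rep in replicates) / len(replicates)
        for item, dim in original.items()
    }


def structural_consistency(original, replicates):
    """Share of replicates in which each original dimension is recovered exactly."""
    def members(assignment, dim):
        return frozenset(i for i, d in assignment.items() if d == dim)

    return {
        dim: sum(members(rep, dim) == members(original, dim) for rep in replicates)
        / len(replicates)
        for dim in sorted(set(original.values()))
    }


# Hypothetical toy data: four items, two dimensions, four bootstrap replicates.
original = {"item_A": 1, "item_B": 1, "item_C": 2, "item_D": 2}
replicates = [
    {"item_A": 1, "item_B": 1, "item_C": 2, "item_D": 2},
    {"item_A": 1, "item_B": 1, "item_C": 2, "item_D": 1},
    {"item_A": 1, "item_B": 2, "item_C": 2, "item_D": 2},
    {"item_A": 1, "item_B": 1, "item_C": 2, "item_D": 1},
]

stability = item_stability(original, replicates)
consistency = structural_consistency(original, replicates)

# Drop items whose stability is lower than 75%, mirroring the cutoff in the text.
retained = [item for item, s in stability.items() if s >= 0.75]
print(stability, consistency, retained)
```

In this toy example a single unstable item is enough to pull down the structural consistency of both dimensions, which is why removing unstable items and re-running the bootstrap can improve consistency considerably.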

The final selection of items was implemented using the 
following strategy. Out of the 32 items selected in the previ-
ous steps, only those with relatively high network loadings 
(≥ .23 or ≥ .235) were used in the subsequent bootEGA and

Fig. 4  Multidimensional IRT plot representing the final MIST-20 test



item stability analysis, which identified 16 highly stable 
items (see Fig. 7). Exploratory graph analysis identified the 
same four dimensions described in the first paragraph of this 
section, but now they presented very high structural consist-
ency ranging from .982 to 1, and very high item stability 
(ranging from 98 to 100%). The network loadings of the final 
MIST-16 EGA items are presented in Table 4.

A metric invariance analysis for EGA using permutation 
tests (Jamison et al., 2022) was conducted using sex, mean age, 
and mean education as grouping variables. None of the items 
exhibited a significant (p < .05) difference in network loadings 
across the tested groups, suggesting that the 16 items selected 
using the EGA framework work similarly irrespective of sex, 
age, and education (see Supplement S19 for an overview).

The fit of the four-dimensional structure estimated via 
EGA was compared to the fit of the two-factor structure of real 
and fake news items using the total entropy fit index (Golino, 
Moulder, et al., 2020a), and two traditional factor-analytic 
fit measures (CFI and RMSEA). To compute the traditional 
factor-analytic fit indices, a confirmatory factor analysis was 
implemented using the WLSMV estimator for each structure 
(see Fig. 8). Table 5 shows that the EGA four-factor struc-
ture presented the lowest TEFI and RMSEA, and the highest 

CFI, suggesting that the four-factor first-order dimensions 
estimated via EGA fit the data better than the theoretical two-
factor structure, although the two-factor structure also has an 
acceptable fit. The Satorra scaled difference test (Satorra, 2000;
Table 6) also showed that the EGA four-factor structure
is preferable to the theoretical two first-order factor structure.

Two different traditions were used to select a subset of 
items, one relying on traditional techniques (EFA and IRT) 
and another relying on modern network psychometric methods 
(EGA). Looking at the item stability and structural consistency 
of the dimensions between the two, we found that the MIST-
16 EGA items are stable and consistent, indicating that the 
four dimensions estimated using exploratory graph analysis are 
robust and likely to be identified in independent samples. The 
20 items selected using EFA/IRT were less robust in terms of 
stability (see Supplement S19: EGA Metric Invariance Tests). 
The low stability of some of the MIST-20 items might
indicate that a higher or lower number of dimensions
underlies the data. The parametric bootstrap EGA analysis
(with 500 iterations) of the MIST-20 items indicates that 
two dimensions are estimated in 21.0% of the bootstrapped 
samples, three dimensions in 68.2%, and four dimensions in 
10.0%. The item stability of the most common structure (three 

Table 2  Final items selected for MIST-20 and MIST-8

Items in bold are items included in the short version of the test (MIST-8). a = discrimination parameter. b = difficulty parameter

Item no. a b Content

Fake news
MIST_14 3.50 0.53 Government Officials Have Manipulated Stock Prices to Hide Scandals
MIST_28 2.69 0.06 The Corporate Media Is Controlled by the Military-industrial Complex: The Major Oil Companies Own the Media and Control Their Agenda
MIST_20 3.26 −0.20 New Study: Left-Wingers Are More Likely to Lie to Get a Higher Salary
MIST_34 3.42 −0.25 The Government Is Manipulating the Public's Perception of Genetic Engineering in Order to Make People More Accepting of Such Techniques
MIST_15 2.34 −0.40 Left-Wing Extremism Causes 'More Damage' to World Than Terrorism, Says UN Report
MIST_7 2.57 −0.45 Certain Vaccines Are Loaded with Dangerous Chemicals and Toxins
MIST_19 2.00 −0.55 New Study: Clear Relationship Between Eye Color and Intelligence
MIST_33 5.60 −0.76 The Government Is Knowingly Spreading Disease Through the Airwaves and Food Supply
MIST_10 2.64 −1.02 Ebola Virus 'Caused by US Nuclear Weapons Testing', New Study Says
MIST_13 2.86 −1.30 Government Officials Have Illegally Manipulated the Weather to Cause Devastating Storms

Real news
MIST_50 3.12 0.38 Attitudes Toward EU Are Largely Positive, Both Within Europe and Outside It
MIST_82 2.22 0.31 One-in-Three Worldwide Lack Confidence in NGOs
MIST_87 2.25 0.14 Reflecting a Demographic Shift, 109 US Counties Have Become Majority Nonwhite Since 2000
MIST_65 2.36 −0.03 International Relations Experts and US Public Agree: America Is Less Respected Globally
MIST_60 3.39 −0.09 Hyatt Will Remove Small Bottles from Hotel Bathrooms by 2021
MIST_73 2.43 −0.14 Morocco’s King Appoints Committee Chief to Fight Poverty and Inequality
MIST_88 2.79 −0.31 Republicans Divided in Views of Trump’s Conduct, Democrats Are Broadly Critical
MIST_53 2.12 −0.37 Democrats More Supportive than Republicans of Federal Spending for Scientific Research
MIST_58 8.59 −0.60 Global Warming Age Gap: Younger Americans Most Worried
MIST_99 2.26 −0.83 US Support for Legal Marijuana Steady in Past Year
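To make the a and b columns of Table 2 concrete, the sketch below evaluates a logistic item trace line for two of the listed items. It assumes a unidimensional two-parameter logistic (2PL) form, P(θ) = 1 / (1 + exp(−a(θ − b))), which is a common reading of discrimination and difficulty parameters. The paper fits a multidimensional IRT model, so the exact parameterization may differ, and the function name is hypothetical.

```python
import math


def trace_line(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# (a, b) values copied from Table 2 for the hardest and easiest fake news items.
ITEMS = {
    "MIST_14": (3.50, 0.53),   # highest difficulty among the fake news items
    "MIST_13": (2.86, -1.30),  # lowest difficulty among the fake news items
}

for name, (a, b) in ITEMS.items():
    # At theta == b the probability is 0.5 by construction.
    p_at_b = trace_line(b, a, b)
    # Probability for a respondent of average ability (theta = 0).
    p_at_zero = trace_line(0.0, a, b)
    print(name, round(p_at_b, 2), round(p_at_zero, 2))
```

Under this reading, b is the ability level at which a respondent has a 50% chance of classifying the item correctly, and a larger a means the probability rises more steeply around that point.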



dimensions, see Supplement S20) reveals that the items are 
relatively stable, but still not as stable as the MIST-16 EGA 
items. A comparison of the three-dimensional structure esti-
mated using EGA in the MIST-20 items with the theoretical 
two-factor structure (see Table 7) shows that the three-factor 
solution performs slightly better, since it presents lower TEFI 
and RMSEA, and higher CFI.

Discussion

In Study 1, we generated 413 news items using GPT-2 auto-
mated item generation for fake news, and trusted sources for real 
news. Through two independent expert committees, we reduced 
the item pool to 100 items (44 fake and 56 real). We then com-
bined item response theory with factor analysis to reduce the 
item set to the 20 best items for the MIST-20 and the 8 best 
items for the MIST-8. We found that the final items demonstrate 
good reliability. In an initial test of validity, we found strong 
concurrent validity for both the MIST-8 and the MIST-20 as 
evidenced by their strong associations with the COVID-19 fact-
check (a headline evaluation task) and the DEPICT deceptive 
headline recognition task (a social media post reliability judg-
ment task). Moreover, we found that both the MIST-20 and the 
MIST-8 outperformed the combined model of the CMQ, BSR, 
and CRT, when explaining variance in fact-check and DEPICT 
scores, evidencing incremental validity. This study provides the 
first indication that both the MIST-20 and MIST-8 are psycho-
metrically sound, and can explain and test misinformation sus-
ceptibility above and beyond the existing scales. Finally, we also 

presented an alternative approach to item selection, namely one 
based on EGA that uses network psychometrics to identify the 
best partition of the multidimensional space, combined with a 
bootstrap analysis of item and dimensional stability (structural 
consistency), to identify a set of highly stable items with moder-
ate or high network loadings, leading to the selection of 16 items 
measuring four dimensions of misinformation susceptibility.

Study 2: Validation—Confirmatory analyses, 
nomological net, and national norms

Study 2 sought to consolidate and extensively test the psycho-
metric soundness of the newly developed MIST-20, MIST-16, 
and MIST-8 scales. Across five large samples with nationally 
representative quotas from two countries (US, UK) and three 
different recruitment platforms (CloudResearch, Prolific, and 
Respondi) we pursued three goals. First, we used structural 
equation modeling and reliability analyses to probe the struc-
tural stability, model fit, and internal consistency of the MIST 
across different empirical settings. Second, we built an exten-
sive nomological network and examined both the correlation 
patterns and the predictive power of the MIST to demonstrate 
convergent, discriminant, and incremental validity. Third, we 
capitalized on the representativeness of our samples to derive 
national norms for the general population (UK, US) and spe-
cific demographic (UK, US) and geographical subgroups (US).

Method: MIST‑20/MIST‑8

Participants

As part of our EFA/IRT validation study, we collected data from 
four samples with nationally representative quota (Ntotal = 8310, 
Nclean = 6461).21 Sample 2A was a US sample (N = 3692) with 
interlocking age and gender quota (i.e., each category contains a 
representative relative proportion of the other category) accessed 
through Respondi, an International Organization for Standardi-
zation (ISO)-certified international organization for market and 
social science research (for previous applications see, e.g., Dür 
& Schlipphak, 2021; Heinsohn et al., 2019; Roozenbeek, Free-
man, et al., 2021a). After excluding incomplete cases and par-
ticipants outside of the quota, 3479 participants were considered 
for analysis. Sample 2B was a US sample with nationally rep-
resentative age, ethnicity, and gender quota (N = 856) recruited 
through CloudResearch (formerly TurkPrime), an online 
research platform similar to MTurk but with additional validity 
checks and more intense participant pool controls (Buhrmester 
et al., 2018; Litman et al., 2017). After excluding all participants 

21 Surveys 2A, 2C, and 2D were designed as part of a separate 
research project which featured the MIST-20 as an add-on. Survey 2B 
was designed specifically for this project.

Table 3  Incremental validity of MIST-8 and MIST-20 with existing measures

Model                              r     Adjusted R2   ∆R2

CV19 fact-check ~
  MIST-8                           .49   .24
  MIST-20                          .58   .33
  CMQ + BSR + CRT                        .19
  CMQ + BSR + CRT + MIST-8               .30           .11***
  CMQ + BSR + CRT                        .19
  CMQ + BSR + CRT + MIST-20              .37           .18***

DEPICT ~
  MIST-8                           .45   .20
  MIST-20                          .50   .25
  CMQ + BSR + CRT                        .12
  CMQ + BSR + CRT + MIST-8               .22           .11***
  CMQ + BSR + CRT                        .12
  CMQ + BSR + CRT + MIST-20              .26           .14***

* p < .05, ** p < .01, *** p < .001



who failed an attention check, were underage, did not reside in 
the United States, did not complete the entire study, completed 
the study in ≤ 10 minutes, or were a second-time participant, 
510 participants remained.22 Sample 2C was a UK sample 
(N = 2517) based on nationally representative interlocking age 
and gender quota recruited through Respondi. After excluding 
incomplete cases and participants outside of our quota criteria, 
1227 participants were retained. Lastly, sample 2D was a UK 
sample (N = 1396) with nationally representative age and gender 
quota recruited through Prolific. Excluding all entries that fell 
outside of our quota criteria and all incomplete entries resulted 
in an analysis sample of 1245 participants.

Samples 2A, 2C, and 2D exceed both the best-practice minimum
of 300 participants per sample for scale development (Boateng et al.,
2018; Clark & Watson, 1995, 2019; Comrey & Lee, 1992;
Guadagnoli & Velicer, 1988) and the sample size required for high
power (power = .90, α = .05) to detect the smallest effect size of
interest (r = .10, needed N = 1046; Anvari & Lakens, 2021;
Funder & Ozer, 2019; Götz, Gosling, et al., 2022). Sample 2B
was highly powered (power = .90, α = .05) to detect effect
sizes of r = .15 (needed N = 463). Power analyses were
completed using the pwr package in R (Champely et al., 2021).
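The required sample sizes can be approximated with the Fisher z transformation. The sketch below is a textbook approximation for a two-sided test of a Pearson correlation, not a reimplementation of the pwr package, which applies additional small-sample corrections. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, power: float = 0.90, alpha: float = 0.05) -> int:
    """Approximate N needed to detect correlation r (two-sided) via Fisher's z."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    # Fisher's z has standard error 1 / sqrt(N - 3), hence the "+ 3".
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


n_small = n_for_correlation(0.10)   # smallest effect size of interest
n_medium = n_for_correlation(0.15)  # effect size detectable in Sample 2B
print(n_small, n_medium)
```

Because of the approximation, the results may differ from the reported values (N = 1046 and N = 463) by a participant or two, but they confirm the order of magnitude of both requirements.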

Detailed demographic breakdowns of all samples are 
shown in Table 1.

Procedure and measures

All participants were invited to take part in an online sur-
vey through the respective research platforms. After pro-
viding informed consent, all participants provided basic 

22 This is a slight deviation from the preregistration, as we added 
incomplete entries, second entries, participants that completed the 
survey in ≤ 10 minutes, and participants who failed any attention 
check (instead of both) to the exclusion criteria, thus adopting a more 
rigorous and conservative exclusion approach than we had preregis-
tered. These additional exclusions were to ensure high-quality data.

Fig. 5  Structure of the 100 MIST items estimated using exploratory graph analysis



Fig. 6  Item stability metric of the MIST-100 items in Study 1



demographic information and completed the MIST-20 
and—depending on their sample group—a select set of 
additional psychological measures (for a detailed descrip-
tion of all constructs assessed in each sample group, see 
Table 1). All participants received financial compensation 
in accordance with platform-specific remuneration stand-
ards and guidelines on ethical payment at the University of 

Cambridge. Participants in Samples 2A, 2B, and 2C were 
paid by the sampling platform directly, while participants in 
Sample 2D received 2.79 GBP for a 25-minute survey (6.70 
GBP per hour). All data collections were approved by the 
Psychology Research Ethics Committee of the University of 
Cambridge (PRE.2019.108, PRE.2020.034, PRE.2020.086, 
PRE.2020.120).

Table 4  Network loadings per item and dimension estimated via EGA. Network loadings of .15, .25, and .35 are equivalent to low (.40), moderate (.55), and high (.70) factor loadings, respectively (Christensen & Golino, 2021c)

Item Dim 1 Dim 2 Dim 3 Dim 4 Dim Headline

MIST_73 0.35 0.04 −0.01 0.11 1 Morocco’s King Appoints Committee Chief to Fight Poverty and Inequality
MIST_96 0.33 −0.12 −0.06 0.10 1 US Hispanic Population Reached New High in 2018, But Growth Has Slowed
MIST_60 0.28 0.03 0.07 0.10 1 Hyatt Will Remove Small Bottles from Hotel Bathrooms by 2021
MIST_92 0.24 0.11 0.08 0.09 1 Taiwan Seeks to Join Fight Against Global Warming
MIST_47 0.24 0.06 −0.03 0.00 1 About a Quarter of Large US Newspapers Laid off Staff in 2018
MIST_33 0.16 0.40 0.06 0.00 2 The Government Is Knowingly Spreading Disease Through the Airwaves and Food Supply
MIST_31 0.00 0.40 0.00 0.01 2 The Government Is Actively Destroying Evidence Related to the JFK Assassination
MIST_14 −0.05 0.26 0.06 −0.04 2 Government Officials Have Manipulated Stock Prices to Hide Scandals
MIST_1 −0.06 0.22 0.05 −0.02 2 A Small Group of People Control the World Economy by Manipulating the Price of Gold and Oil
MIST_32 −0.10 0.31 0.13 0.00 2 The Government Is Conducting a Massive Cover-Up of Their Involvement in 9/11
MIST_20 0.09 0.05 0.44 0.01 3 New Study: Left-Wingers Are More Likely to Lie to Get a Higher Salary
MIST_8 0.08 0.09 0.26 0.00 3 Climate Scientists' Work Is 'Unreliable', a 'Deceptive Method of Communication'
MIST_16 0.01 0.10 0.39 0.05 3 Left-Wingers Are More Likely to Lie to Get a Good Grade
MIST_70 0.14 −0.04 0.00 0.38 4 Majority in US Still Want Abortion Legal, with Limits
MIST_74 0.08 0.00 0.04 0.32 4 Most Americans Say It’s OK for Professional Athletes to Speak out Publicly about Politics
MIST_94 0.06 0.02 0.02 0.30 4 United Nations Gets Mostly Positive Marks from People Around the World

Fig. 7  Final structure of the MIST-16 EGA items (left) and their stability indices (right) estimated using parametric bootstrap EGA with 500 
iterations



Analytical strategy

We adopted a three-pronged analytical strategy. First, we com-
puted reliability estimates and conducted confirmatory factor 
analyses for each subsample, seeking to reproduce, consoli-
date, and evaluate the higher-order model derived in Study 1. 
Second, in an effort to establish construct validity (Cronbach 
& Meehl, 1955; Strauss & Smith, 2009), we pooled the con-
structs assessed across our four validation samples to build 
a comprehensive, theory-driven, and preregistered (Sample 
2B) nomological network. To this end, we cast a wide net 
and included (1) concepts that should be meaningfully posi-
tively correlated with MIST scores (convergent validity; i.e., 
DEPICT Balanced Short Form; Maertens et al., 2021; Go 
Viral! Balanced Item Set; Basol et al., 2021), expecting a high 
positive Pearson r correlation ([0.50, 0.80]), (2) concepts that 
should be clearly distinct from the MIST (discriminant valid-
ity; i.e., Bullshit Receptivity Scale; BSR; Pennycook et al., 
2015; Conspiracy Mentality Questionnaire; CMQ; Bruder 
et al., 2013), expecting a low to medium negative correlation 
with the MIST (Pearson r = [−0.50, −0.20]), and (3) an array 
of prominent psychological constructs of general interest (i.e., 
personality traits, attitudes, and cognitions including the Big 

Five, Dark Tetrad, Moral Foundations, Social Dominance 
Orientation, Ecological Dominance Orientation, religiosity, 
self-esteem, political cynicism, numeracy, and trust in vari-
ous public institutions and social agents) for which no a priori 
expectations were formulated. Third, we leveraged the size 
and representativeness of our samples to establish norm tables 
for the US and UK general populations as well as specific 
demographic and geographical subgroups.

Method: MIST‑16

Participants

We also collected a new dataset (Sample 2E; November 2022) 
with the best items per dimension that were identified using 
the EGA approach (the MIST-16). The dataset was collected 
using Respondi/Bilendi, in a nationally representative quota 
sample (N = 1213) of adults from the US. The sample compo-
sition was as follows: 54% identifying as female (44% male, 
2% nonbinary), 33% between 18 and 34 years, 31% between 
35 and 54 years, and 36% between 55 and 75 years; 24% of 
the participants reported coming from the Midwest (Illinois, 

Fig. 8  Plot of the confirmatory factor model estimated using the EGA four-factor structure (left) and the theoretical two-factor structure (right)

Table 5  Comparison of fit indices of the EGA four-factor model and 
the theoretical two-factor model

Structure TEFI CFI RMSEA

EGA four-factor −14.27 0.97 0.03
Theoretical two-factor −11.77 0.91 0.05

Table 6  The Satorra scaled difference test comparing the EGA four-
factor structure to the theoretical two first-order factor structure

Structure Df Chisq ChisqDiff DfDiff p

EGA four-factor 98 112.32
Theoretical two-factor 103 203.49 29.73 5 < .001



Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, 
Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin),
17% from the Northeast (Connecticut, Maine, Massachusetts,
New Hampshire, Rhode Island, Vermont, New Jersey,
New York, and Pennsylvania), 40% from the South (Florida, 
Georgia, Maryland, North Carolina, South Carolina, Virginia, 
West Virginia, Delaware, Alabama, Kentucky, Mississippi, 
Tennessee, Arkansas, Louisiana, Oklahoma, and Texas), and 
20% from the West (Montana, Wyoming, Colorado, New 
Mexico, Idaho, Utah, Arizona, Nevada, Washington, Oregon, 
California, Alaska, and Hawaii) of the country.

Analytical strategy

Exploratory graph analysis and hierarchical EGA
(Jimenez et al., 2022) were applied to the MIST items.
Hierarchical EGA is advantageous for the US representative
quota sample (collected using the best MIST items identified
in the first stage of EGA analysis) because, as the sample size
increases, there is a realistic chance of EGA estimating a
structure reflecting general factors instead of first-order factors,
if the dimensions are hierarchical or form a generalized bifactor
structure. Therefore, the item stability and structural consistency
of the first-order factors were computed using a hierarchical EGA
version of bootstrap exploratory graph analysis
(Christensen & Golino, 2021a).

We would like to note that the MIST-16 was developed 
and validated after the samples from the other validation 
(Studies 2A–2D) and application (Study 3) studies were col-
lected, due to the emergence of new psychometric methods. 
As the MIST-16 is not a subset of the MIST-20, we do not 
have the same nomological net and intervention evaluation 
data available for the MIST-16. However, as the correlation 
(in Study 1) between the MIST-20 and MIST-16 item sets is 
large, r = .81, 95% CI [.77, .84], p < .001, we can expect the 
MIST-20 results to be a close approximation.

Results: MIST‑20/MIST‑8

Internal consistency

For each sample, we employed SEM to assess model fit—
examining both a basic first-order model with two distinct 
factors (i.e., real news detection, fake news detection; 

without allowing the factors to correlate) and a theoreti-
cally derived higher-order model (Markon, 2019; Thurs-
tone, 1944; which establishes a relationship between the 
two factors) in which both first-order factors load onto a 
general second-order veracity discernment factor. We then 
calculated reliability estimates using internal consistency 
measures (inter-item correlations, item-total correlations, 
and McDonald’s ω). We used the lavaan package for SEM 
in R (Rosseel, 2012).

In keeping with our theoretical conceptualization of 
the MIST—with a general ability factor of veracity dis-
cernment, and two subordinate factors capturing real 
news and fake news detection, respectively—we fitted 
a higher-order model (Markon, 2019; Thurstone, 1944) 
in which both first-order factors load onto a general 
second-order veracity discernment factor (see Fig. 9). 
We first did this with Sample 2A (US quota sample 
from Respondi). Consistent with conventional guide-
lines (RMSEA/SRMR < .10 = acceptable; < .06 = excel-
lent; CFI/TLI > .90 = acceptable; > .95 = excellent; Clark 
& Watson, 2019; Finch & West, 1997; Hu & Bentler, 
1999; Pituch & Stevens, 2015; Schumacker et al., 2015), 
the model fits the data adequately (MIST-20: CFI = .90, 
TLI = .89, RMSEA = .041, SRMR = .040; MIST-8:
CFI = .97, TLI = .95, RMSEA = .030, SRMR = .025).23 
We note that the χ2 goodness-of-fit test was significant—
signaling lack of fit (MIST-20: χ2 = 1021.86, p < .001; 
MIST-8: χ2 = 72.74, p < .001). However, this should be
interpreted with caution, as the χ2 is a test of perfect fit 
and very sensitive to sample size. As such, as sample sizes 
approach 500, χ2 is usually significant even if the differ-
ences between the observed and model-implied covariance 
matrices are trivial (Bentler & Bonett, 1980; Curran et al., 
2003; Rosellini & Brown, 2021). Taken together, the find-
ings thus suggest an adequate model fit for the theoreti-
cally derived higher-order model.

Importantly, this model also yielded better fit than a tra-
ditional basic first-order model (with two distinct fake news 
and real news factors; MIST-20: χ2 = 1027.17, p < .001, 
CFI = 0.90, TLI = 0.89, RMSEA = 0.041, SRMR = 0.041; 
MIST-8: χ2 = 99.46, p < .001, CFI = 0.95, TLI = 0.93, 
RMSEA = 0.035, SRMR = 0.035). A likelihood-ratio test of 
the higher-order model versus the first-order model (which 

23 We acknowledge that there is a discussion in the literature on 
defining new (dynamic) fit values depending on the specific model 
tested (see McNeish & Wolf, 2021). For example, simulations using 
the ezCutoffs (Schmalbach et al., 2019) package indicate we would
need a CFI and a TLI of larger than 0.99 for excellent fit, in conjunc-
tion with an RMSEA of smaller than 0.04/0.03 (MIST-8/MIST-20) 
and an SRMR smaller than 0.03. However, as the new cutoff values 
are still under consideration and not well established, we focused on 
the conventional and—in this case also—preregistered cutoff values 
for our evaluation.

Table 7  Fit of the three- and two-dimensional structures of the MIST-
20 items

Structure TEFI CFI RMSEA

MIST-20 EGA three-factor −20.70 0.963 0.029
MIST-20 Theoretical two-factor −16.93 0.955 0.032



did not include a correlation between the two factors) was 
significant for both the MIST-20 and the MIST-8 (MIST-
20: ∆χ2 = 5.35, p = .021, MIST-8: ∆χ2 = 26.29, p < .001), 
indicating a better fit for the higher-order model.
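The p values of the likelihood-ratio tests can be reproduced from the reported ∆χ2 values under one assumption that the text does not state explicitly: that the two models differ by a single degree of freedom (the relationship between the two factors). For one degree of freedom the upper-tail probability of the chi-square distribution has a closed form, so the sketch below needs only the standard library. The function name is hypothetical.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Reported chi-square differences (higher-order vs. first-order model), df = 1 assumed.
p_mist20 = chi2_sf_df1(5.35)
p_mist8 = chi2_sf_df1(26.29)
print(round(p_mist20, 3), p_mist8 < 0.001)
```

Under the one-degree-of-freedom assumption, both results agree with the reported p = .021 for the MIST-20 and p < .001 for the MIST-8.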

Sample comparison. Across all four samples, we successfully
reproduced the original higher-order model, with
parameters indicating good fit, as well as good internal 
consistency in all four samples (see Table 8 for a com-
plete overview).24 A similar fit is found between the US 
Respondi and UK Respondi samples, indicating that the 
MIST works similarly in the UK as it does in the US.25 
Meanwhile, larger differences are found between the US 
Respondi and the US CloudResearch samples, and between 
the UK Respondi and the UK Prolific samples, indicating 
that sampling platform plays a larger role than nationality 
when administering the MIST even when using representa-
tive quota sampling.

Nomological network26

Convergent validity. As preregistered, in Sample 2B27 (which
was the sample we primarily relied on in constructing
the nomological network, as it offered the widest coverage
of psychological constructs among our validation samples),
the correlation between the general MIST-20 score and the
DEPICT Balanced Short Form measure (Maertens et al.,
2021) was found to be positive and medium to large, with
a significant Pearson correlation of .54 (95% CI [.48, .60],
p < .001).28 The correlation between the MIST-20 and the
Go Viral! inventory (Basol et al., 2021) was lower than
predicted but still significant, with a Pearson correlation
of .26 (95% CI [.18, .34], p < .001). Similarly, regarding
incremental validity, the additional explained variance
in the DEPICT Balanced Short Form measure above and 
beyond the CMQ and the BSR is at the upper side of our 
prediction, with an additional 20% of variance explained, 

24 Supplement S9 includes model plots for both the MIST-20 and
MIST-8 for all samples.

25 We would like to stress that this does not imply measurement
invariance and would like to caution researchers against comparing
results directly between countries. The current data indicate that the
MIST works in the US and the UK and likely measures the same latent
construct, but it does not mean that the results are directly comparable.
We recommend researchers and practitioners keep the focus on comparisons
within instead of between countries. For a detailed discussion about
cross-cultural generalizability please see Deffner et al. (2022).

26 This section focuses on the nomological network of the general
ability factor (veracity discernment) of the MIST-20. However, we
have also constructed nomological networks for the subcomponents
of the MIST as well as the MIST-8. For parsimony’s sake, these are
reported in Supplements S10-S12.

27 Some variables were only analyzed in specific samples, as not all
variables were present in all datasets.

28 See https://aspredicted.org/nx7xu.pdf for the preregistration
(Sample 2B).

Fig. 9  Plot of higher order MIST-8 SEM model in Sample 2A (N = 3479)


whereas with 3% it is under the predicted value for the Go
Viral! inventory.29 For a more detailed account, see Supplement
S13. In addition, in Sample 2A, we measured belief in
COVID-19 myths, which was significantly negatively correlated
with the MIST-20 score, with a magnitude within the preregistered
range for convergent validity measures
(r = −.51, 95% CI [−.55, −.47], p < .001).
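The confidence intervals reported for the Sample 2B correlations can be approximated with the standard Fisher z method. The sketch below is illustrative. It assumes the analytic sample size equals the cleaned Sample 2B size (N = 510), which may not hold for every measure if some responses were missing. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for a Pearson correlation via the Fisher z transformation."""
    z = math.atanh(r)                             # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)                   # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Assumed n = 510 (cleaned Sample 2B); r values are taken from the text.
depict_ci = pearson_ci(0.54, 510)    # DEPICT Balanced Short Form
go_viral_ci = pearson_ci(0.26, 510)  # Go Viral! inventory
print([round(x, 2) for x in depict_ci], [round(x, 2) for x in go_viral_ci])
```

Note that the interval is not symmetric around r on the correlation scale, because the symmetry holds on the transformed z scale only.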

Discriminant validity. As preregistered for Sample 2B, the
MIST-20 was moderately negatively correlated with the BSR
(r = −.21, 95% CI [−.29, −.13], p < .001) and the CMQ (r = −.38,
95% CI [−.45, −.30], p < .001). Overall, the correlational pattern of
our nomological network supports the construct validity of 
the MIST, with the MIST being more strongly correlated with 
the convergent measures than with the discriminant measures 
(Campbell & Fiske, 1959; Rosellini & Brown, 2021).
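As a compact summary of the preregistered expectations, the sketch below checks each observed Sample 2B correlation against the ranges stated in the analytical strategy: [0.50, 0.80] for convergent measures and [−0.50, −0.20] for discriminant measures. The data structure and helper name are hypothetical; the numbers are those reported in the text.

```python
CONVERGENT_RANGE = (0.50, 0.80)     # expected range for convergent measures
DISCRIMINANT_RANGE = (-0.50, -0.20)  # expected range for discriminant measures

# Observed Pearson correlations with the MIST-20 in Sample 2B.
observed = {
    "DEPICT Balanced Short Form": (0.54, CONVERGENT_RANGE),
    "Go Viral! inventory": (0.26, CONVERGENT_RANGE),
    "BSR": (-0.21, DISCRIMINANT_RANGE),
    "CMQ": (-0.38, DISCRIMINANT_RANGE),
}


def within(r: float, bounds: tuple[float, float]) -> bool:
    """True if r falls inside the closed interval given by bounds."""
    low, high = bounds
    return low <= r <= high


results = {name: within(r, bounds) for name, (r, bounds) in observed.items()}
for name, ok in results.items():
    print(f"{name}: {'within' if ok else 'outside'} the preregistered range")
```

Only the Go Viral! inventory falls outside its expected range, consistent with the text's statement that this correlation was lower than predicted.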

CRT (Sample 2A). In line with other studies finding a role for
the CRT in misinformation detection (e.g., Pennycook & 
Rand, 2019), we found a significant correlation between the 
MIST score and the cognitive reflection test, or CRT (r = .29, 
95% CI [.26, .32], p < .001).

AOT (Sample 2A). We found an even larger significant correlation
between the MIST score and actively open-minded
thinking or AOT (r = .49, 95% CI [.46, .51], p < .001).

BFI (Sample 2B). Contrary to our preregistered exploratory
hypotheses, in Sample 2B the MIST-20 score was not sig-
nificantly correlated with openness, r = .02, 95% CI [−.06, 

.11], p = .594, and agreeableness was not negatively corre-
lated with distrust d, r = .05, 95% CI [−.04, .14], p = .255.30 
The MIST-20 score was also not significantly correlated 
with agreeableness (r = .05, 95% CI [−.04, .14], p = .271) 
or extraversion (r = −.07, 95% CI [−.15, .02], p = .141), but 
did significantly correlate with conscientiousness (r = .10, 
95% CI [.02, .19], p = .020) and neuroticism (r = −.14, 95% 
CI [−.23, −.06], p = .001).

DT (Sample 2B) The MIST-20 score was negatively correlated 
with each of the four Dark Tetrad traits: Machiavellianism 
(r = −.09, 95% CI [−.17, −.00], p = .047), narcissism (r = −.26, 
95% CI [−.34, −.18], p < .001), psychopathy (r = −.30, 95% 
CI [−.37, −.22], p < .001), and sadism (r = −.22, 95% CI [−.30, 
−.12], p < .001). However, contrary to our preregistered 
exploratory hypothesis, Machiavellianism was not negatively 
correlated with naïvité n, r = .16, 95% CI [.07, .24], p < .001.

Trust measures (Sample 2B) In line with our preregistered 
exploratory hypotheses, we found that the MIST score was 
correlated with trust in science, r = .33, 95% CI [.25, .41], 
p < .001, scientists, r = .36, 95% CI [.28, .43], p < .001, and 
mainstream media, r = .18, 95% CI [.09, .26], p < .001. In addi-
tion, we found that trust in doctors, r = .36, 95% CI [.28, .43], 
p < .001, journalists, r = .19, 95% CI [.11, .27], p < .001, and 
officials, r = .09, 95% CI [.00, .17], p = .049, was significantly 

29 It must be noted that the Go Viral! inventory is not a validated
measurement instrument. Results should be interpreted in light of this.

30 The lack of a significant correlation between the MIST score and 
openness is somewhat surprising given the strong correlation between 
the MIST and the AOT score, indicating that openness as measured 
in the Big Five is not the same as open-minded thinking as measured 
by the AOT.

Table 8  Model fit overview

Total N = 6461. Samp = sample. Plat = sampling platform. Pop = sample population. CI = confidence interval; LL = lower limit; UL = upper 
limit. R = Respondi. C = CloudResearch. P = Prolific. ωtot = McDonald’s Omega. 3F reflects whether the three-factor (higher-order) model
provided better fit than the two-factor (two-order) model. ⚬ = descriptively better fit but not significant; * p < .05, ** p < .01, *** p < .001

MIST-20
Samp.  Plat.  Pop.  χ²       p       CFI   TLI   RMSEA  95% CI LL  95% CI UL  SRMR   ωtot  3F
2A     R      US    1021.86  < .001  0.90  0.89  0.041  0.039      0.044      0.040  0.76  *
2B     C      US    264.66   < .001  0.92  0.91  0.035  0.027      0.043      0.051  0.75  ⚬
2C     R      UK    473.56   < .001  0.91  0.90  0.041  0.037      0.046      0.049  0.81  ***
2D     P      UK    432.12   < .001  0.86  0.85  0.038  0.034      0.042      0.045  0.70  ***

MIST-8
Samp.  Plat.  Pop.  χ²       p       CFI   TLI   RMSEA  95% CI LL  95% CI UL  SRMR   ωtot  3F
2A     R      US    72.74    < .001  0.97  0.95  0.030  0.023      0.037      0.025  0.57  ***
2B     C      US    30.32    .048    0.96  0.94  0.036  0.003      0.058      0.040  0.58  *
2C     R      UK    64.13    < .001  0.94  0.91  0.045  0.033      0.058      0.040  0.62  ***
2D     P      UK    46.91    < .001  0.93  0.90  0.037  0.023      0.050      0.035  0.55  ***



positively correlated, while trust in the government, r = −.11, 
95% CI [−.20, −.02], p = .012, was significantly negatively cor-
related with the MIST-20. We found no significant correlation 
for either of the two trust-in-politicians scales, ra = −.06, 95% 
CI [−.14, .03], p = .210, rb = .07, 95% CI [−.02, .15], p = .131.

Additional associations For a summary and discussion of 
the exploratory analyses of MFQ, SDO, EDO, numeracy, 
anti-vaccination attitudes, self-esteem, religiosity, trust, ide-
ology, and demographics, please see Supplement S14.

Detailed summary figures separated by outcome category 
are available in Supplements S10-S12.

National norms

We used the Respondi samples for each country (i.e., Sam-
ple 2A for the US and Sample 2C for the UK) to generate 
norm tables for general veracity discernment as well as fake 
news and real news detection.31 As can be gleaned from 
Table 9, the norms for the two countries were very similar, 
with minor deviations of single score points, further corrob-
orating evidence for the cross-cultural validity of the MIST. 
Table 10 exhibits norms for the general US population.

Full norm tables for the US and the UK, including spe-
cific norms based on age (US, UK) and geography (US; i.e., 
9 census divisions, 4 census regions), as well as means and 
standard deviations per item, including a per-item compari-
son between Democrats (US)/liberals (UK) and Republicans 
(US)/conservatives (UK), are available in Supplement S15.
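
To illustrate how a norm table can be used to contextualize an individual score, the following minimal sketch looks up a MIST-20 veracity discernment score in the V column of Table 10. The dictionary simply transcribes that column; the function name `percentile_band` and the lookup rule (highest tabulated percentile whose norm score does not exceed the observed score) are assumptions made for illustration, not part of the published scoring guide.

```python
# V (veracity discernment) column of Table 10: percentile -> MIST-20 score
US_V_NORMS = {
    0: 4, 5: 8, 10: 9, 15: 10, 20: 10, 25: 11, 30: 12,
    35: 12, 40: 13, 45: 14, 50: 14, 55: 15, 60: 15, 65: 16,
    70: 16, 75: 17, 80: 17, 85: 18, 90: 19, 95: 19, 100: 20,
}


def percentile_band(score, norms=US_V_NORMS):
    """Highest tabulated percentile whose norm score is <= score.

    Returns None when the score lies below the tabulated minimum.
    """
    reached = [pct for pct, norm_score in norms.items() if norm_score <= score]
    return max(reached) if reached else None


print(percentile_band(14))  # a score of 14 sits at the US median
print(percentile_band(18))
```

Because several adjacent percentiles share the same score, the lookup returns a band rather than an exact rank; a different tie rule (for example, the lowest matching percentile) would be equally defensible.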

Results: MIST‑16

Exploratory graph analysis was applied to the MIST-16 
items, as well as hierarchical EGA (Jimenez et al., 2022).32 

The item stability and structural consistency of the first-
order factors were computed using a hierarchical EGA 
(Jimenez et al., 2022) version of bootstrap exploratory graph 
analysis (Christensen & Golino, 2021a).33 The traditional 
EGA technique indeed identified only two dimensions (real 
and fake news items, see Fig. 10). The hierarchical EGA 
technique, on the other hand, identified the original four-
dimensional (first-order) structure and two general factors 
(real and fake news items, see Fig. 11).

A parametric bootstrap EGA using the hierarchical EGA 
method (Jimenez et al., 2022) showed that the four dimensions
are very stable, being estimated in 90.8% of the 500 
bootstrapped samples. In terms of item stability, the MIST-16
EGA items presented very high stability, except for item 
MIST 73, which was estimated on its empirical hierarchical
EGA first-order dimension in 73% of the bootstrapped 
samples (see Fig. 12).

Discussion

In Study 2, we consolidated and expanded the psychomet-
ric properties of the MIST. First, we conducted confirma-
tory factor analyses across four samples with representative 
quota from the US and the UK, consistently replicating the 
higher-order structure yielding good model fit and internal 
consistency for both the MIST-8 and the MIST-20. Next, we 
constructed an extensive nomological network of the MIST 
to assess construct validity (Cronbach & Meehl, 1955). As 
preregistered, and similar to Study 1, in Sample 2B we found 
a high correlation between the MIST score and the DEPICT 
misinformation inventory, supporting convergent validity. 
Similarly, in Sample 2A we found a medium to high negative 
correlation between the MIST-20 and a COVID-19 misinfor-
mation beliefs inventory, further attesting to the measure’s 
convergent validity. In addition, we demonstrated that both 

31 We chose to create the norm tables based on the Respondi samples 
instead of pooling all samples, as through recent projects we found 
some evidence indicating that Respondi samples provide more rep-
resentative levels of numeracy, education, and ideology than Prolific, 
and our experience with CloudResearch is limited.
32 Due to an error in the Qualtrics system, only 15 items were pre-
sented to the participants. Item MIST 16 (Left-Wingers Are More Likely 
to Lie to Get a Good Grade) was left out of the data collection system.

33 As pointed out earlier, the advantage of using hierarchical EGA 
(Jimenez et  al., 2022) on the US representative quota sample (col-
lected using the MIST-16 EGA items) is that as the sample size 
increases, there is a meaningful chance that the EGA estimates a 
structure reflecting general factors instead of first-order factors, if the 
dimensions are hierarchical or form a generalized bifactor structure.

Table 9  MIST norm score comparison between US and UK samples

Scale    Sample  Minimum  1st Quartile  Median  Mean  3rd Quartile  Maximum
MIST-8   US      0        4             6       6     7             8
MIST-8   UK      0        4             5       5     7             8
MIST-20  US      4        11            14      14    17            20
MIST-20  UK      4        11            13      13    16            20



the MIST-8 and the MIST-20 explain considerable extra 
variance above the existing CMQ and BSR scales (MIST-20:
ΔR² = 20%, MIST-8: ΔR² = 14%), indicating substantial 
incremental validity (Clark & Watson, 2019). Surprisingly, 
however, the correlations of each of the MIST, CMQ, and 
the BSR with the Go Viral! items were all low (r < .30).
Nevertheless, the MIST-20 remained the single best predictor 
for the Go Viral! items, significantly improving the variance 
explained in a combined model on top of the CMQ and BSR 
measures (ΔR² = .03). In terms of discriminant validity, as 
preregistered, in Sample 2B we observed moderate negative 
associations between the MIST-20 and the BSR as well as the 
CMQ. In Sample 2A, we also found preliminary evidence for 
the role of actively open-minded thinking (AOT) as a potential 
vehicle for better distinction between fake and real news. This 
aligns with previous research showing that AOT is related to 
more critical information source evaluation (Baron, 2019) and 
decreased susceptibility to fake news (Pennycook & Rand, 
2020, Pennycook & Rand, 2021).
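
The incremental validity statistic used above, ΔR², is the gain in explained variance when a new predictor is added to a baseline model. As a simplified sketch, the code below computes ΔR² from zero-order correlations for the case of a single baseline predictor; the models reported here used two baseline scales (CMQ and BSR), and all correlation values in the example are hypothetical placeholders rather than values from the studies.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R² of a regression of y on two predictors, from zero-order correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def delta_r2(r_y_base, r_y_new, r_base_new):
    """Variance explained by the new predictor over and above the baseline one."""
    return r2_two_predictors(r_y_base, r_y_new, r_base_new) - r_y_base ** 2


# Hypothetical correlations: outcome-baseline, outcome-new scale, baseline-new scale
gain = delta_r2(r_y_base=0.30, r_y_new=0.50, r_base_new=0.40)
print(f"Delta R^2 = {gain:.3f}")
```

When the new predictor is uncorrelated with the baseline predictor, ΔR² reduces to the squared correlation between the new predictor and the outcome, which is a convenient sanity check.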

Within the realm of trait measures we found relatively small 
correlations with the core personality traits. Contrary to our 
expectations, openness, extraversion, and agreeableness were 
not significantly related to the MIST-20. Meanwhile, conscientiousness
exhibited a small positive association. This dovetails 
well with previous research finding that individuals high in 
conscientiousness are more likely to read news offline (rather 
than relying solely on social media; Sindermann et al., 2020) 
and less likely to share fake news (Lawson & Kakkar, 2021) 
and engage in conspiracist ideation (Brotherton et al., 2013). 
We also found a small negative association with neuroticism. 
As neuroticism is widely understood as a stable predisposition 
to experience anxiety and fear (Eysenck, 1967; Hofstee et al., 
1992; Soto & John, 2017), this is consistent with previous 
work identifying fear and trait anxiety as positive predictors 
of conspiracy beliefs (Grzesiak-Feldman, 2013; Swami et al., 
2016) as well as other studies finding that those high in neu-
roticism tend to rely on social media news feeds and are thus 
more likely to get caught in filter bubbles and echo chambers 
(Sindermann et al., 2020). Larger correlations were found with 
the Dark Tetrad personality traits, which were all negatively 
related to the MIST-20 score. While the links with Machiavellianism,
psychopathy, and sadism are novel, the negative 
association with narcissism dovetails well with previous work 
demonstrating narcissists’ greater susceptibility to conspiracies 
(Cichocka et al., 2016; Kumareswaran, 2014).

Meanwhile, in Sample 2E, we successfully validated the 
psychometric strength of the EGA-based MIST-16, which also 
showed evidence for two general factors, fake news detection 
and real news detection, as well as two facets for each. While 
EGA uses an entirely different approach for item analysis and 
selection, the convergent outcome of two general factors and 
the overlap in the item sets between the two methods show that 
it is possible—using a variety of methodologies—to develop a 
psychometrically validated misinformation susceptibility test 

Table 10  MIST-20 general population norms for the United States 
(N = 3479)

Percentile  V (Veracity discernment)  f (Fake news detection)  r (Real news detection)
0%          4                         0                        0
5%          8                         3                        2
10%         9                         4                        3
15%         10                        5                        4
20%         10                        5                        4
25%         11                        6                        5
30%         12                        7                        5
35%         12                        7                        6
40%         13                        7                        6
45%         14                        8                        7
50%         14                        8                        7
55%         15                        8                        7
60%         15                        9                        7
65%         16                        9                        8
70%         16                        9                        8
75%         17                        9                        8
80%         17                        10                       9
85%         18                        10                       9
90%         19                        10                       10
95%         19                        10                       10
100%        20                        10                       10

Fig. 10  Structure estimated via EGA using the validation sample



with congruent results. Meanwhile, the EGA data show that 
EGA is a useful new method psychologists can use to design 
misinformation detection scales (or indeed, any scale), enlarg-
ing the toolkit available for scale development.

All in all, the nomological network largely confirmed the 
preregistered relationship patterns—thus corroborating the 
MIST’s construct validity—while at the same time demon-
strating new insights that can be gained by using the MIST-
20 measure, which may stimulate further research. Finally, 
we leveraged the large size and national representativeness 
of our validation samples to produce norm tables for the 
UK and US general populations as well as distinct demo-
graphic subgroups in the UK and the US and geographical 
subgroups in the US.

Study 3: Application—A nuanced 
effectiveness evaluation of a popular media 
literacy intervention

In Study 3, we demonstrate how the MIST can be used in 
conjunction with the Verification done framework and norm 
tables.34 We employ the MIST-8 in a simple within-groups 
pretest/post-test design with the Bad News Game, a major 
media literacy intervention played by over a million people 
(Roozenbeek & van der Linden, 2019). The Bad News Game 
is based on inoculation theory (van der Linden & Roozenbeek, 
2020), and both its theoretical mechanisms and its effects 
have been replicated multiple times (see, e.g., Maertens et al., 
2021; Roozenbeek, Maertens et al., 2021), making it a well-
established intervention in the literature as a tool to reduce 
misinformation susceptibility. We therefore hypothesized that 
the intervention would improve veracity discernment (ability 
to accurately distinguish real news from fake news), real news 
detection (ability to correctly flag real news), and fake news 
detection (ability to correctly tag fake news). In addition, we 
hypothesized that the Bad News Game would decrease both 
distrust (negative judgment bias or being hyper-skeptical) and 
naïvité (positive judgment bias or believing everything). We 
used norm tables to establish where the baseline MIST scores 
of our convenience sample lay.
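
The five scores named in these hypotheses can be sketched as follows. The authoritative scoring rules are in the implementation guide (Supplement S17) and the R script on the OSF repository; the version below is only an assumption-laden illustration. In particular, the two bias formulas (the number of "fake" or "real" judgments in excess of the number of fake or real items, floored at zero) are assumed here, and the function name `score_mist` is hypothetical. The relation V = f + r is consistent with the means reported for this study (6.23 = 3.19 + 3.04).

```python
def score_mist(responses, is_fake):
    """Score one respondent on V, r, f, d, and n.

    responses: list of "fake"/"real" judgments, one per headline
    is_fake:   list of booleans giving the true status of each headline
    """
    n_fake_items = sum(is_fake)
    n_real_items = len(is_fake) - n_fake_items

    # Abilities: correct classifications of fake and of real headlines
    f = sum(1 for resp, fake in zip(responses, is_fake) if fake and resp == "fake")
    r = sum(1 for resp, fake in zip(responses, is_fake) if not fake and resp == "real")

    # Biases (assumed formulas): judgments beyond the true number of such items
    d = max(0, responses.count("fake") - n_fake_items)  # distrust
    n = max(0, responses.count("real") - n_real_items)  # naivete

    return {"V": f + r, "r": r, "f": f, "d": d, "n": n}


# Hypothetical MIST-8 respondent who rates six of the eight headlines as fake
key = [True, True, True, True, False, False, False, False]
answers = ["fake", "fake", "fake", "fake", "fake", "fake", "real", "real"]
print(score_mist(answers, key))  # {'V': 6, 'r': 2, 'f': 4, 'd': 2, 'n': 0}
```

Under these assumed formulas a respondent can show distrust or naïvité but not both at once, which is why the two biases are reported as separate scores.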

Method

Participants

We collected data from an online community sample of 4024 
participants who played the Bad News Game (www.getbadnews.com)
between 7 May 2020 and 29 July 2020 and who 
agreed to participate in the in-game survey. After filtering 
out participants who did not complete the full study, did 
not have prior experience with the game, were underage, 
entered the study multiple times, or lived outside of the 
United States, 421 participants remained.35 Based on earlier 

34 A MIST implementation guide explaining how researchers and 
practitioners can set up the MIST in their studies as well as how to 
calculate the Verification done (Vrf dn) scores can be found in Sup-
plement S17. An example Qualtrics survey and a score calculation R 
script are available in the OSF repository: https://osf.io/r7phc/.

35 We restricted our sample to US residents, as we did not have a UK 
filter and have not yet validated the MIST in any other country.

Fig. 11  Structure estimated via hierarchical EGA using the validation sample


studies evaluating the Bad News Game (Maertens et al., 
2021; Roozenbeek, Maertens et al., 2021), we aimed to be 
highly powered (power = .90, α = .05) to detect a Cohen’s 
d effect size of 0.250, which required a sample size of 338, 
which we exceed in this sample. The power was calculated 
using the R pwr package (Champely et al., 2021).
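
A rough cross-check of this power analysis can be done with the normal approximation to the t-test. This is not the pwr package's computation, which solves for n with the noncentral t distribution, and the design that was entered into pwr is not stated here; the function name `n_for_power` and its `design` argument are hypothetical.

```python
import math
from statistics import NormalDist


def n_for_power(d, power=0.90, alpha=0.05, design="two_sample"):
    """Approximate sample size for a two-sided t-test (normal approximation).

    design="two_sample": participants per group, two independent groups
    design="paired":     number of pairs, with d expressed as d_z
    """
    z = NormalDist().inv_cdf
    base = ((z(1 - alpha / 2) + z(power)) / d) ** 2
    multiplier = 2 if design == "two_sample" else 1
    return math.ceil(multiplier * base)


print(n_for_power(0.25))                   # 337 per group
print(n_for_power(0.25, design="paired"))  # 169 pairs
```

The two-sample approximation lands within one participant of the reported 338, the small gap being the kind of difference a t-based correction would produce. For a paired design with the same d_z, roughly half as many participants would be needed, so the recruited sample of 421 exceeds the target under either reading.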

On average, participants were young (55.58% 18–29 
years, 32.30% 30–49, 12.11% over 50), 52.02% identified 
as female (41.09% male, 6.89% other), and 86% had either 
a higher education degree or some college experience (see 
Table 1 for a complete demographics overview). The median 
ideology on a scale from 1 (liberal) to 7 (conservative) was 
3 (M = 2.88, SD = 1.39), indicating a slightly left-leaning 
audience.

Procedure and measures

Individuals who played the Bad News Game (Roozenbeek 
& van der Linden, 2019) were invited to participate in the 
study. The Bad News Game (www.getbadnews.com) is a free 
online browser game in which players learn about six com-
mon misinformation techniques over the course of 15 min-
utes in a simulated social media environment (see Roozen-
beek & van der Linden, 2019, for a detailed discussion). In 
the current study, after providing informed consent, indi-
viduals completed the MIST-8 both before and after playing 
the Bad News Game. Participation was completely volun-
tary, and no rewards, monetary or otherwise, were offered. 

This study was approved by the Psychology Research Ethics 
Committee of the University of Cambridge (PRE.2020.120, 
PRE.2020.136).

Analytical strategy

After contextualizing our findings by juxtaposing the sam-
ple’s baseline findings to the US general national norms 
derived in Study 2, we conducted repeated-measures t-tests 
for veracity discernment (M = 6.23, SD = 1.53) and for 
the four subcomponents of the MIST—fake news detec-
tion (M = 3.19, SD = 0.92), real news detection (M = 3.04, 
SD = 0.95), distrust (M = 0.31, SD = 0.63), and naïvité 
(M = 0.46, SD = 0.69).
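
A repeated-measures t-test of this kind operates on within-person difference scores. The sketch below shows the computation on a small set of made-up pretest and post-test scores; the data are hypothetical, the function name `paired_t` is not from the source, and the effect size convention (d_z, the mean difference divided by the standard deviation of the differences) is an assumption.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Repeated-measures t-test on post-minus-pre difference scores."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    m_diff = mean(diffs)
    sd_diff = stdev(diffs)                  # sample SD of the differences
    t = m_diff / (sd_diff / math.sqrt(n))   # t statistic with n - 1 df
    return {"mean_diff": m_diff, "t": t, "df": n - 1, "d_z": m_diff / sd_diff}


# Hypothetical MIST-8 veracity discernment scores for five participants
pre = [5, 6, 7, 4, 6]
post = [6, 6, 8, 5, 5]
print(paired_t(pre, post))
```

A p value would additionally require the t distribution with n − 1 degrees of freedom, which is left out here to keep the sketch dependency-free.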

Results

Baseline

We found that our US convenience sample scored higher 
on the MIST than the US population average for veracity
discernment (see Study 2; 1st quartile: population = 4,
sample = 6).36

36 We found similar results when looking at fake news detection
(1st quartile: population = 2, sample = 3) and real news detection
(1st quartile: population = 2, sample = 3).

Fig. 12  Item stability of the hierarchical EGA first-order structure in the validation sample



Hypothesis tests

V—Veracity discernment Contrary to our expectations, we 
did not find a significant change in veracity discernment 
post-intervention relative to pre-intervention (Mdiff = 0.11, 
95% CI [−0.01, 0.23], t(420) = 1.80, p = .072, d = 0.088, 
95% CI [−0.103, 0.279]). See Fig. 13, Panel A for a bar 
plot.

r—Real news detection While we found an effect of the 
intervention on real news detection, the effect was in the 
opposite direction of our prediction (Mdiff = −0.17, 95% CI 
[−0.26, −0.08], t(420) = −3.72, p < .001, d = −0.181, 95% 
CI [−0.373, 0.011]). See Fig. 13, Panel B, for a bar plot.

f—Fake news detection In line with our expectations, we did 
find a positive effect of the intervention on fake news detec-
tion (Mdiff = 0.28, 95% CI [0.20, 0.36], t(420) = 6.81, p < .001, 
d = 0.332, 95% CI [0.138, 0.525]). See Fig. 13, Panel C for 
a bar plot.

d—Distrust Contrary to our hypothesis, we observed 
an increase in distrust (Mdiff = 0.31, 95% CI [0.22, 0.40], 
t(420) = 6.94, p < .001, d = 0.338, 95% CI [0.144, 0.532]). 
See Fig. 13, Panel D for a bar plot.

n—Naïvité As hypothesized, we did find a significant reduc-
tion in naïvité after intervention (Mdiff = −0.14, 95% CI 
[−0.20, −0.07], t(420) = −4.12, p < .001, d = −0.201, 95% 
CI [−0.392, −0.008]). See Fig. 13, Panel E for a bar plot.

See Supplement S16 for a detailed summary table with 
variable descriptive statistics and difference scores.
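
As a consistency check on the effect sizes above, the reported d values agree with the within-subject convention d = t/√N. The sketch below recomputes d from the reported t statistics with N = 421; that this is the convention the authors used is an inference from the numbers rather than something stated in the text.

```python
import math

N = 421  # participants in Study 3

# Reported (t, d) pairs for V, r, f, d (distrust), and n (naivete)
reported = {
    "V": (1.80, 0.088),
    "r": (-3.72, -0.181),
    "f": (6.81, 0.332),
    "d": (6.94, 0.338),
    "n": (-4.12, -0.201),
}


def d_from_t(t, n):
    """Within-subject effect size d_z recovered from a paired t statistic."""
    return t / math.sqrt(n)


for name, (t_value, d_reported) in reported.items():
    print(f"{name}: recomputed d = {d_from_t(t_value, N):.3f}, reported d = {d_reported}")
```

All five recomputed values match the reported ones to three decimals.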

Discussion

Traditionally, evaluators of the Bad News Game (e.g., Roozenbeek
& van der Linden, 2019) only looked at a small 
number of (ad hoc-created) real news items and focused 
on participants’ reliability ratings of a large set of fake 
news items. Study 3 showed that using the MIST in conjunction
with the Verification done framework provided 
novel insights contrary to our expectations. Although 
trending towards an effect in the expected direction, par-
ticipants did not become significantly better at general 
news veracity discernment after playing the Bad News 
Game (p = .072). Looking at the MIST facet scales, we did 
find significant differences in both fake news detection and 
real news detection. More specifically, we observed that 
while people improved in the detection of fake news, they 
also became worse at the detection of real news. Looking 
further at response biases, we can also see that the Bad 
News Game might increase general distrust in news head-
lines while also diminishing naïvité. At first sight, these 
results seem to indicate that the intervention does decrease 
people’s susceptibility to fake news and reduces general 
naïvité, but at a potential cost of increased general distrust 
(hyper-skepticism). Whether this means the intervention 
works depends on the aim: to decrease susceptibility to 
misinformation, or to increase the ability to accurately 

Fig. 13  Plot of Verification done variables applied to the Bad News Game (N = 421). T1 = pretest. T2 = post-test



discern real news from fake news. The Verification done 
framework allows interventionists to start differentiating 
these important questions both theoretically and empiri-
cally, and we encourage researchers and practitioners to 
use the framework independently of the misinformation 
susceptibility measure used.

One likely reason for this pattern in the subordinate factors 
is that the Bad News Game focuses mainly 
on detecting misinformation and warning people about the 
threats of misinformation, and is less focused on recognizing
real news (Roozenbeek & van der Linden, 2019). In 
addition, the lack of a significant effect for the general factor
(the discernment variable) may be due to counteracting
effects (increased distrust but also improved fake news 
detection), which leave a net effect that is too 
small to detect with our sample (N = 421), especially in 
the context of a short 15-minute intervention in combination
with an 8-item scale. Finally, it is also possible that the 
intervention is simply not sufficient to make a large 
enough impact on a general susceptibility factor.

In addition, as recommended by our framework, these 
results need to be interpreted in conjunction with the norm 
tables. The sample that was recruited was already 
highly media-literate: the first quartile of the pretest MIST 
scores was higher than the corresponding population quartile
(veracity discernment: 1st quartile, population = 50% accuracy;
sample = 75% accuracy). Effects of the intervention 
might therefore be different with a more representative sample,
or for people performing worse during the pretest phase.

The results of this study come with two caveats. First, the 
MIST-8 was used instead of the MIST-20. As is common 
for short scales (Rammstedt et al., 2021; Thalmayer et al., 
2011)—while maintaining high psychometric quality—the 
parsimonious MIST-8 is less precise and less reliable than 
the MIST-20. Since the MIST-20 only takes about 2 minutes 
to complete, we recommend researchers use the MIST-20 
whenever possible. Second, while we were sufficiently pow-
ered to detect effect sizes similar to the original evaluation 
of the intervention (Roozenbeek & van der Linden, 2019), 
with a sample of 421 participants—as is also reflected in the 
rather large confidence intervals—we did not have sufficient 
statistical power to detect smaller nuances (Anvari & Lakens, 
2021; Funder & Ozer, 2019; Götz, Gosling, et al., 2022).

The results of this study indicate the importance of look-
ing at misinformation susceptibility in a more holistic way. 
Applying the Verification done framework, we discovered 
key new theoretical dimensions that previous research had 
overlooked. Evaluators of this intervention, and other inter-
ventions, can now disentangle and accurately measure the 
five dimensions of misinformation susceptibility, thereby 
expanding our understanding of both the underlying mecha-
nisms and the intervention’s practical impact.

General discussion

We explained the necessity of having a multifaceted measurement
of misinformation susceptibility and, based on theoretical
insights from previous research, developed the Verification
done framework. Then, in three studies and seven samples 
from two countries, we developed, validated, and applied the 
Misinformation Susceptibility Test (MIST): a holistic test 
which allows the assessment of veracity discernment ability, 
its facets fake news detection ability and real news detection 
ability, and judgment biases distrust and naïvité.

In Study 1, we derived a development protocol, gener-
ated a set of fake news headlines using the GPT-2 neural 
network—an advanced language-based machine learning 
algorithm—and extracted a list of real news headlines from 
neutral and well-trusted sources. Through psychometric 
analysis using factor analysis and item response theory, we 
developed the MIST-8, MIST-16, and the MIST-20 tests.

In Study 2, we recruited five samples with nationally rep-
resentative quota, two each for the US and the UK, from three 
different recruitment platforms, and followed a multifaceted 
validation strategy with the aim of gaining insights into the 
measure’s validity and replicability. First, confirmatory factor 
analyses consistently favored the higher-order structure and 
yielded satisfactory properties that suggest high validity and 
good reliability of both the MIST-8 and the MIST-20. Second, 
adopting a wide-net approach, we constructed an extensive 
nomological network. We found the MIST-8 and MIST-20 to 
be consistently highly correlated with various fact-check tests—
the “COVID-19 fact-check” headline evaluation task (Pennycook,
McPhetres, et al., 2020) and the “DEPICT” social 
media post reliability judgment task (Maertens et al., 2021)—
thus signaling convergent validity—while being clearly distinct 
from the existing Conspiracy Mentality Questionnaire (CMQ) 
and the Bullshit Receptivity Scale (BSR), hence providing evi-
dence for discriminant validity. The correlation with ad hoc 
headline evaluation tasks is strong enough to show that they are 
measures of a similar construct, but it is also weak enough to 
demonstrate that they are sufficiently distinct. The MIST offers 
a reliable, standardized, and validated alternative to these ad 
hoc tests, with high predictive validity for a wide set of scales, 
as well as norm tables. However, due to the high stability of the 
MIST, it is possible that the MIST may turn out to be particu-
larly useful for subgroup analyses, and may be less sensitive for 
the measurement of (small) intervention effects. In addition, 
the MIST aims to measure generalized susceptibility to misin-
formation, which is not tailored to the skills trained in specific 
interventions. Therefore, the MIST is not meant to replace ad 
hoc measures, but can exist in conjunction with them, depend-
ing on the outcome variable of interest. Moreover, we presented 
MIST-20 and MIST-8 norm tables for both the UK and the US 
based on our large samples with nationally representative quota, 
which can be used to contextualize effects.



Using a new, modern, psychometric method, namely 
exploratory graph analysis (EGA; Golino & Epskamp, 2017), 
we showed a proof of concept of how EGA can be used to 
help with establishing the factor structure, the item selection, 
and the validation of scales such as the MIST. In both Study 
1 and Study 2 we show how EGA can lead to potentially 
more stable item selection than when using the traditional 
EFA and IRT methods, and present an alternative version of 
the MIST: the MIST-16. Meanwhile, further analyses reveal 
that EGA can help to detect extra dimensions as facets of 
the general factors. Interestingly, the validation sample (Sam-
ple 2E) showed that a structure with two generalized factors 
and four facets had the best fit, potentially informing misin-
formation theorists on further dimensions to explore when 
researching the nature of misinformation. Meanwhile, it also 
provided further evidence that misinformation susceptibility
can be viewed through the lens of two general factors (real 
news detection, fake news detection), and robustly measured 
as such. The congruence between these two very different 
psychometric methods shows the robustness of our psychometric
toolkit and its ability to produce reliable scales 
to measure psychological constructs.

In the third and last study, we demonstrated how Veri-
fication done and the MIST can be employed in naturalis-
tic settings, in this case to evaluate the general effects of a 
highly popular inoculation intervention. Employing a vali-
dated measure to evaluate interventions in combination with 
the norm tables—which have not been used in this field 
before—we were able to uncover new mechanisms behind 
a well-known media literacy intervention, the Bad News 
Game (Maertens et al., 2021; Roozenbeek & van der Lin-
den, 2019), and highlighted both weaknesses and strengths 
of this intervention that had not been detected before using 
the classical methods. For example, while the intervention 
is typically evaluated by looking at fake news reliability rat-
ings (e.g., Roozenbeek & van der Linden, 2019) without an 
evaluation framework or norm tables, we were now able to 
unveil important dynamics between fake news, distrust, and 
real news detection. Moreover, our approach allowed us to 
establish that the average participant who chose to partici-
pate in the intervention already scored above the norm when 
completing the pretest. Moreover, for the first time, we were 
able to disentangle the five dimensions of misinformation 
susceptibility using a validated and standardized item set, 
finding unexpected changes in judgment biases as well as in 
real news detection (which other research does not necessar-
ily find; see Roozenbeek & van der Linden, 2019), which can 
inspire further research and theoretical development. Never-
theless, we must emphasize that the MIST is a generalized 
measure of susceptibility, relevant for measuring an overarch-
ing skill, which is not the sole focus of the Bad News Game 
intervention. For example, there is a wide range of evidence 
that shows that the Bad News Game is effective at improving 
the detection of specific manipulation techniques that typically
underlie misinformation that the participant was trained 
on (e.g., appeal to emotion, polarizing language; Roozenbeek 
& van der Linden, 2019; Lewandowsky & van der Linden, 
2021). Improvements in those specific skills can be best iden-
tified with a tailored measurement instrument rather than a 
“general” measure such as the MIST.

Overall, these studies show that it is feasible to develop a 
psychometrically validated measurement instrument for misinformation
susceptibility. Moreover, the evidence discussed 
in the studies, and in particular the analyses of Table 3,
Supplement S13, and Supplement S18, clearly supports 
the utility, or indeed superiority, of the new measure compared
to other measures in terms of predicting outcomes.

Implementation

An overview of the MIST-20, MIST-16, and MIST-8 item 
sets can be found in Supplement S21. For an implementation 
and scoring guide, please see Supplement S17. The supplements
can be found on the OSF repository at https://osf.io/r7phc/.

Open‑Source web application

To facilitate the implementation of the MIST, we pro-
grammed an open-source, user-friendly, online version of 
the MIST-20, called YourMIST: an interactive self-assess-
ment tool designed for easy accessibility and repurposing by 
individuals, researchers, and practitioners. Our implementa-
tion of the MIST-20 utilizes the Python programming lan-
guage and the Streamlit web development module to enable 
a web-based quiz that provides personalized feedback to 
users. The tool reports scores for each of the components of 
the Verification done framework, accompanied by detailed 
explanations and a comparison with the US and UK popula-
tion scores. Our web app and the source code are publicly 
accessible for individual use and adaptation on the OSF 
repository at https://osf.io/r7phc/.

Limitations and future research

While we firmly believe that the MIST and Verification done 
mark a substantial methodological advance in the field of 
misinformation research (Bago et al., 2020; Batailler et al., 
2022; Roozenbeek, Maertens et al., 2021; Rosellini & Brown, 
2021; Zickar, 2020), they are of course not without limitations. 
An inevitable challenge of doing any type of systematic and 
methodologically rigorous news headline research lies in the 
fact that what might be real news at one point in time might 
be outdated at a later point in time, while—albeit admittedly 
much less likely—what is fake news at one point in time 
might become true or more credible at a later point in time. 
Therefore, as with an IQ test, it may be necessary to update 
the MIST over time. Nevertheless, in recent studies the MIST 
shows validity similar to what it showed 2 years earlier. To illustrate, 
in a recent research project by Said et al. (2023, in prep), a 
new US quota sample was collected through Respondi with 
547 respondents, and both the MIST-8 and MIST-20 showed 
good internal and predictive validity similar to the original 
sample (see Supplement S7). For example, the fit indices of the 
MIST sample collected in August 2022 (MIST-20: CFI = 0.92, 
TLI = 0.91, RMSEA = 0.039, SRMR = 0.052) showed similar—
and for some indices better—fit relative to the sample col-
lected in September 2020 (MIST-20: CFI = 0.90, TLI = 0.89, 
RMSEA = 0.041, SRMR = 0.040). Similarly, the MIST-20 was 
an even better predictor of performance on the DEPICT decep-
tive headlines recognition task (Maertens, Roozenbeek, et al., 
2021) in the August 2022 (r = .64, p < .001) sample than it was 
in the April 2020 sample (r = .50, p < .001).

Another related limitation concerns the inherent difficulty 
in the MIST’s cross-cultural application. While we are greatly 
encouraged by our finding that the MIST appears to be an 
equally effective measure in the UK as in the US-American 
cultural context in which it was originally developed, cross-
cultural translation poses a challenge. For obvious reasons, 
a simple and direct translation may not be sufficient. At the 
same time, while trustworthy news sources from which real 
news items could be extracted can doubtlessly be identified 
in any language, at the time the MIST-20 was developed, 
the GPT-2 (Radford et al., 2019)—the advanced language-
based neural network algorithm that we employed to generate 
fake news items—was mainly trained on English language 
corpora. Meanwhile, however, an increasing amount of new 
research and applications has managed to make the GPT-2 
work in the context of other languages (see, e.g., de Vries & 
Nissim, 2020; Guillou, 2020; for promising initial applica-
tions in Dutch, Italian, and Portuguese). Moreover, the recent 
arrival of GPT-3 and GPT-4, which have support for an 
increasingly wide range of languages, now enables the field 
to develop non-English adaptations of the MIST that will 
empower researchers around the globe to capture the com-
plex and multifaceted reality of misinformation spread—and 
resistance. Even without the GPT-2, researchers can create a 
database of their own misinformation items and use the 
psychometric techniques outlined in this paper to arrive 
at a valid misinformation susceptibility test for their own 
cultural context. We therefore see this paper as a proof of 
concept for the feasibility of using psychometrics to develop a 
comprehensive misinformation susceptibility test in any culture.

One other concern that may be raised is that the MIST 
may be confounded with general news consumption: those 
who are more aware of the news may be more likely to score 
high on the MIST, so that controlling for news consumption 
may reduce the MIST’s predictive validity. A related concern 
is that engagement with misinformation is often driven by 
partisan polarization and outgroup derogation (Osmundsen et al., 
2021). To investigate these concerns, we looked at data 
from a separate study that is currently being prepared, which 
contains the MIST, the CMQ, and a social media misinfor-
mation and manipulative posts discernment test (Maertens 
et al., 2022, in prep). Looking at these data (N = 2220, US 
quota sample, Respondi), we found that the MIST was the 
single best predictor for manipulative headline discernment 
above the CMQ and news consumption (not controlling 
for news consumption: β = 0.366, p < .001, controlling for 
news consumption: β = 0.362, p < .001), that general news 
consumption was only weakly correlated with MIST perfor-
mance (r = 0.218, p < .001), and that news consumption did 
not have an impact on the MIST’s predictive validity (see 
Supplement S18). In other words, the MIST discernment 
score does reflect ecologically valid discernment, and is not 
confounded by news consumption.
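
The intuition behind this result can be illustrated with the textbook formula for a standardized regression coefficient in a two-predictor model, β1 = (r_y1 − r_y2·r_12) / (1 − r_12²). The sketch below is a simplification, since the reported models also include the CMQ. Only the correlation of 0.218 between news consumption and MIST performance is taken from the text; the two outcome correlations and the function name are hypothetical placeholders.

```python
def standardized_beta(r_y1: float, r_y2: float, r_12: float) -> float:
    """Standardized coefficient of predictor 1 when predictor 2 is also in the model."""
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)


r_mist_news = 0.218     # reported correlation between news consumption and the MIST
r_outcome_mist = 0.37   # hypothetical correlation between outcome and MIST
r_outcome_news = 0.10   # hypothetical correlation between outcome and news consumption

# With a single standardized predictor, the coefficient equals the correlation.
beta_alone = r_outcome_mist
beta_controlled = standardized_beta(r_outcome_mist, r_outcome_news, r_mist_news)

print(f"without control: {beta_alone:.3f}")
print(f"with control:    {beta_controlled:.3f}")
```

When the two predictors are only weakly correlated, the numerator and denominator both change very little, which is why adding the control variable barely moves the coefficient.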

Finally—although based on the consistent results across 
samples and time points it is unlikely that this has confounded 
the results—it should be noted that in all studies and with all 
samples, we have excluded participants who did not com-
plete the entire study up to the analysis of interest. This means 
that in Study 1, the test–retest reliability may be influenced 
by the type of participants who participated in the follow-up 
(i.e., long-term Prolific users), in Study 2 it is possible that 
the construct validity findings were influenced by excluding 
participants who dropped out during the study, and in Study 
3 it is possible that the evaluation was influenced by some 
participants dropping out between the pretest and post-test.

We can see many more avenues for future studies using 
Verification done and the MIST. One example is the imple-
mentation of the MIST in geo-psychological studies (Ebert 
et al., 2021; Rentfrow et al., 2013, 2015) to identify misinfor-
mation hotspots and covariates with national, regional, and 
local levels of misinformation susceptibility. Another strand 
of research may further deepen our conceptual understand-
ing of media literacy. For example, in light of the current 
findings, it appears that veracity discernment may encom-
pass both a comparatively stable, trait-like component, and 
a more malleable skill component. Future studies may more 
clearly identify this distinction and find ways to best use these 
insights to devise effective interventions that foster better 
detection of both fake news and real news, and in turn ulti-
mately lead to greater genuine veracity discernment.

Finally, we identify six immediate use cases for the MIST: 
(1) to prescreen participants for studies, (2) as a covariate to 
investigate subgroups (e.g., those that are highly susceptible to 
misinformation), (3) as a control variable in a model, (4) to map 
geographical regions to identify misinformation susceptibility 
hotspots, (5) to identify brain regions linked to misinformation 
susceptibility, and (6) to evaluate interventions. In addition, 
we would like to encourage the use of the Verification done 
framework as a general method to look at misinformation 
susceptibility and intervention effects more holistically, 
independent of the measure used: indeed, we would encourage 
practitioners to use the framework with any test.

Conclusion

Researchers lack a unifying conceptualization of misinfor-
mation susceptibility and too often use unvalidated measures 
of misinformation susceptibility. We therefore developed a 
new overarching, unifying and multifaceted interpretation 
framework (i.e., Verification done) and a new, thoroughly 
validated measurement instrument based on this framework 
(i.e., the Misinformation Susceptibility Test; MIST). The 
current paper acts as a blueprint of integrated theory and 
assessment development, and opens the door to standard-
ized and comparative misinformation susceptibility research. 
Both researchers and practitioners can now make a thor-
ough evaluation of media literacy interventions by compar-
ing MIST scores using the norm tables and the Verification 
done framework. The use of our standardized and psycho-
metrically validated instrument allows for a comprehensive 
evaluation, and also permits holistic comparison studies and 
tables to be compiled reporting all five Verification done 
scores. Practitioners in turn can use these scores and com-
parisons to choose interventions that best fit their needs. 
Verification done and the MIST can be employed across 
psychological disciplines, from cognitive neuroscience to 
social and personality psychology, to reveal the psychological 
mechanisms behind susceptibility to misinformation or to 
test the outcome of interventions.

Supplementary Information The online version contains supplementary 
material available at https://doi.org/10.3758/s13428-023-02124-2.

Author note Parts of the current article were presented in a conference 
talk given by the first author at the 2021 Annual Convention of the Society 
for Personality and Social Psychology (SPSP). A preprint of the article 
was published on PsyArXiv at https://doi.org/10.31234/osf.io/gk68h.

The supplements, data, and analysis scripts that support this paper’s 
findings, including Qualtrics files, analysis code, raw and clean datasets, 
and all research materials, are openly available on the Open Science 
Framework (OSF) at https://osf.io/r7phc/. Preregistrations are 
available on AsPredicted at https://aspredicted.org/m7vb3.pdf (Study 
1, T1), https://aspredicted.org/js2jz.pdf (Study 1, T2), and 
https://aspredicted.org/nx7xu.pdf (Study 2B).

Funding This work was financially supported by the United King-
dom Economic and Social Research Council (ESRC), the Cambridge 
Trust (CT), the Winton Centre for Risk and Evidence Communication 
(University of Cambridge), the German Academic Exchange Service 
(DAAD), and the University of Virginia’s 3 Cavaliers Fund and the 
Center for Global Inquiry and Innovation.

Declarations 

Conflicts of interest/Competing interests The authors have no con-
flicts of interest to declare.

Ethics approval All procedures performed in studies involving human 
participants were in accordance with the ethical standards of the 
institutional and/or national research committee and with the 1964 
Helsinki Declaration and its later amendments or comparable ethical 
standards. The study was reviewed and approved by the Psychology 
Research Ethics Committee of the University of Cambridge (Study 1: 
PRE.2019.108; Study 2: PRE.2019.108, PRE.2020.034, PRE.2020.086, 
PRE.2020.120; Study 3: PRE.2020.120, PRE.2020.136).

Consent to participate Informed consent was obtained from all indi-
vidual participants included in Study 1, Study 2, and Study 3.

Consent for publication The authors affirm that all research participants 
provided informed consent for the publication of the anonymized data-
sets in Study 1 and Study 2. In Study 3, no personal data was collected.

Open Access This article is licensed under a Creative Commons Attri-
bution 4.0 International License, which permits use, sharing, adapta-
tion, distribution and reproduction in any medium or format, as long 
as you give appropriate credit to the original author(s) and the source, 
provide a link to the Creative Commons licence, and indicate if changes 
were made. The images or other third party material in this article are 
included in the article's Creative Commons licence, unless indicated 
otherwise in a credit line to the material. If material is not included in 
the article's Creative Commons licence and your intended use is not 
permitted by statutory regulation or exceeds the permitted use, you will 
need to obtain permission directly from the copyright holder. To view a 
copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Aichholzer, J., & Kritzinger, S. (2016). Kurzskala politischer Zynismus 
(KPZ). [Short scale of political cynicism]. Zusammenstellung 
Sozialwissenschaftlicher Items und Skalen. 
https://doi.org/10.6102/zis245

Aird, M. J., Ecker, U. K. H., Swire, B., Berinsky, A. J., & Lewandowsky, 
S. (2018). Does truth matter to voters? The effects of 
correcting political misinformation in an Australian sample. 
Royal Society Open Science, 5(12), Article 180593. 
https://doi.org/10.1098/rsos.180593

Anvari, F., & Lakens, D. (2021). Using anchor-based methods to determine 
the smallest effect size of interest. Journal of Experimental 
Social Psychology. Advance online publication. 
https://doi.org/10.1016/j.jesp.2021.104159

Bago, B., Rand, D. G., & Pennycook, G. (2020). Fake news, fast and 
slow: Deliberation reduces belief in false (but not true) news 
headlines. Journal of Experimental Psychology: General, 149(8), 
1608–1613. https://doi.org/10.1037/xge0000729

Baron, J. (2019). Actively open-minded thinking in politics. Cognition, 
188, 8–18. https://doi.org/10.1016/j.cognition.2018.10.004

Basol, M., Roozenbeek, J., McClanahan, P., Berriche, M., Uenal, F., 
& van der Linden, S. (2021). Towards psychological herd immunity: 
Cross-cultural evidence for two prebunking interventions 
against COVID-19 misinformation. Big Data & Society, 8(1), 
1–18. https://doi.org/10.1177/20539517211013868

Batailler, C., Brannon, S. M., Teas, P. E., & Gawronski, B. (2022). A 
signal detection approach to understanding the identification of 
fake news. Perspectives on Psychological Science, 17(1), 78–98. 
https://doi.org/10.1177/1745691620986135

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness 
of fit in the analysis of covariance structures. Psychological 
Bulletin, 88(3), 588–606. 
https://doi.org/10.1037/0033-2909.88.3.588

Block, J. (1995). A contrarian view of the five-factor approach to 
personality description. Psychological Bulletin, 117(2), 187–215. 
https://doi.org/10.1037/0033-2909.117.2.187

Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. 
(2008). Fast unfolding of communities in large networks. Journal 
of Statistical Mechanics: Theory and Experiment, 2008(10), 
P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008

Boateng, G. O., Neilands, T. B., Frongillo, E. A., Melgar-Quiñonez, 
H. R., & Young, S. L. (2018). Best practices for developing and 
validating scales for health, social, and behavioral research: A 
primer. Frontiers in Public Health, 6, 149. 
https://doi.org/10.3389/fpubh.2018.00149

Boker, S. M. (2018). Longitudinal multivariate psychology (E. Ferrer, 
S. M. Boker, & K. J. Grimm, Eds.). Routledge. 
https://doi.org/10.4324/9781315160542

Borsboom, D. (2008). Psychometric perspectives on diagnostic systems. 
Journal of Clinical Psychology, 64(9), 1089–1108. 
https://doi.org/10.1002/jclp.20503

Borsboom, D., Cramer, A. O., Schmittmann, V. D., Epskamp, S., & 
Waldorp, L. J. (2011). The small world of psychopathology. PloS 
One, 6(11), e27407. https://doi.org/10.1371/journal.pone.0027407

Bovet, A., & Makse, H. A. (2019). Influence of fake news in Twitter 
during the 2016 US presidential election. Nature Communications, 
10(1), 7. https://doi.org/10.1038/s41467-018-07761-2

Brady, W. J., Wills, J. A., Jost, J. T., Tucker, J. A., & Van Bavel, J. J. 
(2017). Emotion shapes the diffusion of moralized content in 
social networks. Proceedings of the National Academy of Sciences 
of the United States of America, 114(28), 7313–7318. 
https://doi.org/10.1073/pnas.1618923114

Brick, C., Hood, B., Ekroll, V., & de-Wit, L. (2022). Illusory essences: 
A bias holding back theorizing in psychological science. 
Perspectives on Psychological Science, 17(2), 491–506. 
https://doi.org/10.1177/1745691621991838

Brotherton, R., French, C. C., & Pickering, A. D. (2013). Measuring 
belief in conspiracy theories: The generic conspiracist beliefs 
scale. Frontiers in Psychology, 4, 1–15. 
https://doi.org/10.3389/fpsyg.2013.00279

Bruder, M., Haffke, P., Neave, N., Nouripanah, N., & Imhoff, R. (2013). 
Measuring individual differences in generic beliefs in conspiracy 
theories across cultures: Conspiracy mentality questionnaire. 
Frontiers in Psychology, 4(279), 1–15. 
https://doi.org/10.3389/fpsyg.2013.00225

Buhrmester, M. D., Talaifar, S., & Gosling, S. D. (2018). An evaluation 
of Amazon’s Mechanical Turk, its rapid rise, and its effective use. 
Perspectives on Psychological Science, 13(2), 149–154. 
https://doi.org/10.1177/1745691617706516

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant 
validation by the multitrait-multimethod matrix. Psychological 
Bulletin, 56(2), 81–105. 
https://www.ncbi.nlm.nih.gov/pubmed/13634291

Carpenter, S. (2018). Ten steps in scale development and reporting: A 
guide for researchers. Communication Methods and Measures, 
12(1), 25–44. https://doi.org/10.1080/19312458.2017.1396583

Chalmers, R. P. (2012). mirt: A multidimensional item response theory 
package for the R environment. Journal of Statistical Software, 
48(6), 1–29. https://doi.org/10.18637/jss.v048.i06

Champely, S., Ekstrom, C., Dalgaard, P., Gill, J., Weibelzahl, S., 
Anandkumar, A., ... & De Rosario, M. H. (2021). pwr: Basic 
functions for power analysis. The Comprehensive R Archive Network. 
https://cran.r-project.org/package=pwr

Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria 
for model selection with large model spaces. Biometrika, 
95(3), 759–771. https://doi.org/10.1093/biomet/asn034

Chołoniewski, J., Sienkiewicz, J., Dretnik, N., Leban, G., Thelwall, 
M., & Hołyst, J. A. (2020). A calibrated measure to 
compare fluctuations of different entities across timescales. 
Scientific Reports, 10(1), Article 20673. 
https://doi.org/10.1038/s41598-020-77660-4

Christensen, A. P., Cotter, K. N., & Silvia, P. J. (2019). Reopening 
openness to experience: A network analysis of four openness to 
experience inventories. Journal of Personality Assessment, 101(6), 
574–588. https://doi.org/10.1080/00223891.2018.1467428

Christensen, A. P., Garrido, L. E., & Golino, H. (2020a). Unique 
variable analysis: A novel approach for detecting redundant 
variables in multivariate data. PsyArXiv. 
https://doi.org/10.31234/osf.io/4kra2

Christensen, A. P., Golino, H., & Silvia, P. J. (2020b). A psychometric 
network perspective on the validity and validation of personality 
trait questionnaires. European Journal of Personality, 
34(6), 1095–1108. https://doi.org/10.1002/per.2265

Christensen, A. P., & Golino, H. (2021a). Estimating the stability 
of psychological dimensions via bootstrap exploratory graph 
analysis: A Monte Carlo simulation and tutorial. Psych, 3(3), 
479–500. https://doi.org/10.3390/psych3030032

Christensen, A. P., & Golino, H. (2021b). Factor or network model? 
Predictions from neural networks. Journal of Behavioral Data 
Science, 1(1), 85–126. https://doi.org/10.35566/jbds/v1n1/p5

Christensen, A. P., & Golino, H. (2021c). On the equivalency of 
factor and network loadings. Behavior Research Methods, 53, 
1563–1580. https://doi.org/10.3758/s13428-020-01500-6

Cichocka, A., Marchlewska, M., & de Zavala, A. G. (2016). Does 
self-love or self-hate predict conspiracy beliefs? Narcissism, 
self-esteem, and the endorsement of conspiracy theories. 
Social Psychological and Personality Science, 7(2), 157–166. 
https://doi.org/10.1177/1948550615616170

Cinelli, M., Quattrociocchi, W., Galeazzi, A., Valensise, C. M., 
Brugnoli, E., Schmidt, A. L., et al. (2020). The COVID-19 
social media infodemic. Scientific Reports, 10(1), 1–10. 
https://doi.org/10.1038/s41598-020-73510-5

Cinelli, M., De Francisci Morales, G., Galeazzi, A., Quattrociocchi, 
W., & Starnini, M. (2021). The echo chamber effect on social 
media. Proceedings of the National Academy of Sciences of 
the United States of America, 118(9), e2023301118. 
https://doi.org/10.1073/pnas.2023301118

Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues 
in objective scale development. Psychological Assessment, 7(3), 
309–319. https://doi.org/10.1037//1040-3590.7.3.309

Clark, L. A., & Watson, D. (2019). Constructing validity: New 
developments in creating objective measuring instruments. Psychological 
Assessment, 31(12), 1412–1427. https://doi.org/10.1037/pas0000626

Cokely, E. T., Galesic, M., Schulz, E., Ghazal, S., & Garcia-Retamero, 
R. (2012). Measuring risk literacy: The Berlin numeracy test. 
Judgment and Decision Making, 7(1), 25–47. 
http://journal.sjdm.org/11/11808/jdm11808.pdf

Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis 
(2nd ed.). Erlbaum Associates.

Condon, D. M., Wood, D., Mõttus, R., Booth, T., Costantini, G., Greiff, 
S., ..., Zimmermann, J. (2020). Bottom up construction of a 
personality taxonomy. European Journal of Psychological Assessment, 
36(6), 923–934. https://doi.org/10.1027/1015-5759/a000626

Cook, J., Lewandowsky, S., & Ecker, U. K. H. (2017). Neutralizing 
misinformation through inoculation: Exposing misleading 
argumentation techniques reduces their influence. PloS One, 12(5), 
e0175799. https://doi.org/10.1371/journal.pone.0175799

Costello, A. B., & Osborne, J. (2005). Best practices in exploratory 
factor analysis: Four recommendations for getting the most 
from your analysis. Practical Assessment, Research, and Evaluation, 
10(1), 7. https://doi.org/10.7275/jyj1-4868

Cramer, A. O. (2012). Why the item “23 + 1” is not in a depression 
questionnaire: Validity from a network perspective. Measurement: 
Interdisciplinary Research & Perspective, 10(1-2), 
50–54. https://doi.org/10.1080/15366367.2012.681973

Cramer, A., Waldorp, L. J., Van Der Maas, H. L., & Borsboom, D. 
(2010). Comorbidity: A network perspective. Behavioral and 
Brain Sciences, 33(2-3), 137–150. 
https://doi.org/10.1017/S0140525X09991567

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in 
psychological tests. Psychological Bulletin, 52(4), 281–302. 
https://doi.org/10.1037/h0040957

Curley, A. (2020). How to use GPT-2 in Google Colab. The Startup. 
https://medium.com/swlh/how-to-use-gpt-2-in-google-colab-de44f59199c1

Curran, P. J., Bollen, K. A., Chen, F., Paxton, P., & Kirby, J. B. (2003). 
Finite sampling properties of the point estimates and confidence 
intervals of the RMSEA. Sociological Methods & Research, 
32(2), 208–252. https://doi.org/10.1177/0049124103256130

de Vries, W., & Nissim, M. (2020). As good as new: How to successfully 
recycle English GPT-2 to make models for other languages. 
ArXiv. https://arxiv.org/abs/2012.05628. Accessed 10 Dec 2020.

Deffner, D., Rohrer, J. M., & McElreath, R. (2022). A causal framework 
for cross-cultural generalizability. Advances in Methods 
and Practices in Psychological Science, 5(3), 1–18. 
https://doi.org/10.1177/25152459221106366

Dhami, M. K., Hertwig, R., & Hoffrage, U. (2004). The role of 
representative design in an ecological approach to cognition. 
Psychological Bulletin, 130(6), 959–988. 
https://doi.org/10.1037/0033-2909.130.6.959

Dür, A., & Schlipphak, B. (2021). Elite cueing and attitudes towards trade 
agreements: The case of TTIP. European Political Science Review, 
13(1), 41–57. https://doi.org/10.1017/S175577392000034X

Ebert, T., Götz, F. M., Gladstone, J. J., Müller, S. R., & Matz, S. C. 
(2021). Spending reflects not only who we are but also who we 
are around: The joint effects of individual and geographic 
personality on consumption. Journal of Personality and Social 
Psychology, 121(2), 378–393. https://doi.org/10.1037/pspp0000344

Epskamp, S., & Fried, E. (2018). A tutorial on regularized partial 
correlation networks. Psychological Methods, 23(4), 617–634. 
https://doi.org/10.1037/met0000167

Epskamp, S., Maris, G., Waldorp, L. J., & Borsboom, D. (2018). Network 
psychometrics. In B. Irwing Paul (Ed.), The Wiley handbook 
of psychometric testing: A multidisciplinary reference on 
survey, scale and test development (pp. 953–986). John Wiley & 
Sons Ltd. https://doi.org/10.1002/9781118489772.ch30

Epskamp, S., Rhemtulla, M., & Borsboom, D. (2017). Generalized 
network psychometrics: Combining network and latent variable 
models. Psychometrika, 82(4), 904–927. 
https://doi.org/10.1007/s11336-017-9557-x

Eysenck, H. J. (1967). The biological basis of personality. Thomas.

Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. 
J. (1999). Evaluating the use of exploratory factor analysis in 
psychological research. Psychological Methods, 4(3), 272–299. 
https://doi.org/10.1037/1082-989X.4.3.272

Fazio, L. K. (2020). Pausing to consider why a headline is true or false can 
help reduce the sharing of false news. Harvard Kennedy School 
Misinformation Review, 1(2), 1–8. https://doi.org/10.37016/mr-2020-009

Finch, J. F., & West, S. G. (1997). The investigation of personality 
structure: Statistical models. Journal of Research in Personality, 
31(4), 439–485. https://doi.org/10.1006/jrpe.1997.2194

Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in 
social and personality research: Current practice and 
recommendations. Social Psychological and Personality Science, 8(4), 
370–378. https://doi.org/10.1177/1948550617693063

Ford, J. K., MacCallum, R. C., & Tait, M. (1986). The application 
of exploratory factor analysis in applied psychology: A critical 
review and analysis. Personnel Psychology, 39(2), 291–314. 
https://doi.org/10.1111/j.1744-6570.1986.tb00583.x

Frederick, S. (2005). Cognitive reflection and decision making. The 
Journal of Economic Perspectives: A Journal of the American 
Economic Association, 19(4), 25–42. 
https://doi.org/10.1257/089533005775196732

Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in 
psychological research: Sense and nonsense. Advances in Methods and 
Practices in Psychological Science, 2(2), 156–168. 
https://doi.org/10.1177/2515245919847202

Golino, H. F., & Demetriou, A. (2017). Estimating the dimensionality 
of intelligence like data using exploratory graph analysis. 
Intelligence, 62, 54–70. https://doi.org/10.1016/j.intell.2017.02.007

Golino, H. F., & Epskamp, S. (2017). Exploratory graph analysis: A 
new approach for estimating the number of dimensions in 
psychological research. PloS One, 12(6), e0174035. 
https://doi.org/10.1371/journal.pone.0174035

Golino, H. F., & Christensen, A. P. (2019). EGAnet: Exploratory graph 
analysis: A framework for estimating the number of dimensions in 
multivariate data using network psychometrics. The Comprehensive 
R Archive Network. https://cran.r-project.org/package=EGAnet

Golino, H. F., Christensen, A. P., & Garrido, L. E. (2022). Exploratory 
graph analysis in context. Revista Psicologia: Teoria e Prática, 24(3), 
ePTPPA14197. https://doi.org/10.5935/1980-6906/ePTPIC15531.en

Golino, H. F., Lillard, A. S., Becker, I., & Christensen, A. P. (2021). 
Investigating the structure of the children’s concentration and 
empathy scale using exploratory graph analysis. Psychological 
Test Adaptation and Development, 2(1), 35–49. 
https://doi.org/10.1027/2698-1866/a000008

Golino, H. F., Moulder, R., Shi, D., Christensen, A., Garrido, L., Neto, 
M., et al. (2020a). Entropy fit indices: New fit measures for assessing 
the structure and dimensionality of multiple latent variables. 
Multivariate Behavioral Research, 56(6), 874–902. 
https://doi.org/10.1080/00273171.2020.1779642

Golino, H. F., Shi, D., Garrido, L. E., Christensen, A. P., Nieto, M. 
D., Sadana, R., et al. (2020b). Investigating the performance of 
exploratory graph analysis and traditional techniques to identify 
the number of latent factors: A simulation and tutorial. Psychological 
Methods, 25(3), 292–320. https://doi.org/10.1037/met0000255

Goretzko, D., Pham, T. T. H., & Bühner, M. (2021). Exploratory fac-
tor analysis: Current use, methodological developments and 
recommendations for good practice. Current Psychology, 40(7), 
3510–3521. https:// doi. org/ 10. 1007/ s12144- 019- 00300-2

Götz, F. M., Maertens, R., Loomba, S., & van der Linden, S. (2023). 
Let the algorithm speak: How to use neural networks for auto-
matic item generation in psychological scale development. Psy-
chological Methods. Advance online publication. https:// doi. org/ 
10. 1037/ met00 00540

Götz, F. M., Gosling, S. D., & Rentfrow, P. J. (2022). Small effects: 
The indispensable foundation for a cumulative psychological 
science. Perspectives on Psychological Science, 17(1), 205–
215. https:// doi. org/ 10. 1177/ 17456 91620 9844

Graham, J., Nosek, B. A., Haidt, J., Iyer, R., Koleva, S., & Ditto, P. 
H. (2011). Mapping the moral domain. Journal of Personality 
and Social Psychology, 101(2), 366–385. https:// doi. org/ 10. 
1037/ a0021 847

Guadagnoli, E., & Velicer, W. F. (1988). Relation of sample size to 
the stability of component patterns. Psychological Bulletin, 
103(2), 265–275. https:// doi. org/ 10. 1037/ 0033- 2909. 103.2. 265

Guess, A. M., Lerner, M., Lyons, B., Montgomery, J. M., Nyhan, B., 
Reifler, J., & Sircar, N. (2020). A digital media literacy inter-
vention increases discernment between mainstream and false 
news in the United States and India. Proceedings of the National 
Academy of Sciences of the United States of America, 117(27), 
15536–15545. https://doi.org/10.1073/pnas.1920498117

Grzesiak-Feldman, M. (2013). The effect of high-anxiety situations 
on conspiracy thinking. Current Psychology, 32(1), 100–118. 
https://doi.org/10.1007/s12144-013-9165-6

Guillou, P. (2020). Faster than training from scratch — Fine-tuning the 
English GPT-2 in any language with Hugging Face and fastai v2
(practical case with Portuguese). Medium.
https://medium.com/@pierre_guillou/faster-than-training-from-scratch-fine-tuning-the-english-gpt-2-in-any-language-with-hugging-f2ec05c98787

Hair, J. F., Anderson, R. E., Babin, B. J., & Black, W. C. (2010). Multivariate
data analysis: A global perspective (7th ed.). Pearson.

Haynes, S. N., Richard, D. C. S., & Kubany, E. S. (1995). Content 
validity in psychological assessment: A functional approach to 
concepts and methods. Psychological Assessment, 7(3), 238–247. 
https://doi.org/10.1037/1040-3590.7.3.238

Heinsohn, T., Fatke, M., Israel, J., Marschall, S., & Schultze, M. 
(2019). Effects of voting advice applications during election 
campaigns: Evidence from a panel study at the 2014 European 
elections. Journal of Information Technology & Politics, 16(3), 
250–264. https://doi.org/10.1080/19331681.2019.1644265

Ho, A. K., Sidanius, J., Kteily, N., Sheehy-Skeffington, J., Pratto, F., 
Henkel, K. E., Foels, R., & Stewart, A. L. (2015). The nature of 
social dominance orientation: Theorizing and measuring preferences
for intergroup inequality using the new SDO₇ scale. Journal
of Personality and Social Psychology, 109(6), 1003–1028.
https://doi.org/10.1037/pspi0000033

Hofstee, W. K., de Raad, B., & Goldberg, L. R. (1992). Integration of 
the big five and circumplex approaches to trait structure. Journal 
of Personality and Social Psychology, 63(1), 146–163.
https://doi.org/10.1037//0022-3514.63.1.146

Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning.
Lawrence Erlbaum. https://psycnet.apa.org/record/1993-97193-000

Hommel, B. E., Wollang, F. J. M., Kotova, V., Zacher, H., & Schmukle, 
S. C. (2022). Transformer-based deep neural language modeling 
for construct-specific automatic item generation. Psychometrika, 
87(2), 749–772. https://doi.org/10.1007/s11336-021-09823-9

Horn, J. L. (1965). A rationale and test for the number of factors in 
factor analysis. Psychometrika, 30(2), 179–185.
https://doi.org/10.1007/BF02289447

Hotez, P., Batista, C., Ergonul, O., Figueroa, J. P., Gilbert, S., Gursel, 
M., Hassanain, M., Kang, G., Kim, J. H., Lall, B., Larson, H., 
Naniche, D., Sheahan, T., Shoham, S., Wilder-Smith, A., Strub-
Wourgaft, N., Yadav, P., & Bottazzi, M. E. (2021). Correcting 
COVID-19 vaccine misinformation. EClinicalMedicine, 33, 
Article 100780. https://doi.org/10.1016/j.eclinm.2021.100780

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance
structure analysis: Conventional criteria versus new alternatives.
Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55.
https://doi.org/10.1080/10705519909540118

Humphreys, L. G., & Ilgen, D. R. (1969). Note on a criterion for the 
number of common factors. Educational and Psychological 
Measurement, 29(3), 571–578. https://doi.org/10.1177/001316446902900303

Jamison, L., Golino, H., & Christensen, A. P. (2022). Metric invariance 
in exploratory graph analysis via permutation testing. PsyArXiv.
https://doi.org/10.31234/osf.io/j4rx9

Jimenez, M., Abad, F. J., Garcia-Garzon, E., Golino, H., Christensen, 
A. P., & Garrido, L. E. (2022). Dimensionality assessment in 
generalized bi-factor structures: A network psychometrics 
approach. PsyArXiv. https://doi.org/10.31234/osf.io/2ujdk

Jolley, D., & Paterson, J. L. (2020). Pylons ablaze: Examining the role 
of 5G COVID-19 conspiracy beliefs and support for violence. 
British Journal of Social Psychology, 59(3), 628–640.
https://doi.org/10.1111/bjso.12394

Konrath, S., Meier, B. P., & Bushman, B. J. (2014). Development and val-
idation of the Single Item Narcissism Scale (SINS). PloS One, 9(8), 
Article e103469. https://doi.org/10.1371/journal.pone.0103469

Kumareswaran, D. J. (2014). The psychopathological foundations of 
conspiracy theorists. Victoria University of Wellington.
http://hdl.handle.net/10063/3603

Lauritzen, S. L. (1996). Graphical models (Vol. 17). Clarendon Press.

Lawson, A., & Kakkar, H. (2021). Of pandemics, politics, and personality:
The role of conscientiousness and political ideology in sharing
of fake news. PsyArXiv. https://doi.org/10.31234/osf.io/ves5m

Lewandowsky, S., Ecker, U. K. H., & Cook, J. (2017). Beyond misin-
formation: Understanding and coping with the “post-truth” era. 
Journal of Applied Research in Memory and Cognition, 6(4), 
353–369. https://doi.org/10.1016/j.jarmac.2017.07.008

Lewandowsky, S., Smillie, L., Garcia, D., Hertwig, R., Weatherall, J., 
Egidy, S., Robertson, R. E., O’Connor, C., Kozyreva, A., Lorenz-
Spreen, P., Blaschke, Y., & Leiser, M. R. (2020). Technology and 
democracy: Understanding the influence of online technologies 
on political behaviour and decision-making. Publications Office 
of the European Union. https://doi.org/10.2760/709177

Lewandowsky, S., & van der Linden, S. (2021). Countering misin-
formation and fake news through inoculation and prebunking. 
European Review of Social Psychology, 32(2), 348–384.
https://doi.org/10.1080/10463283.2021.1876983

Litman, L., Robinson, J., & Abberbock, T. (2017). TurkPrime.com: A 
versatile crowdsourcing data acquisition platform for the behav-
ioral sciences. Behavior Research Methods, 49(2), 433–442. 
https://doi.org/10.3758/s13428-016-0727-z

Loevinger, J. (1957). Objective tests as instruments of psychological 
theory. Psychological Reports, 3(3), 635–694.
https://doi.org/10.2466/pr0.1957.3.3.635

Loomba, S., de Figueiredo, A., Piatek, S. J., de Graaf, K., & Larson, 
H. J. (2021). Measuring the impact of COVID-19 vaccine mis-
information on vaccination intent in the UK and USA. Nature 
Human Behaviour, 5(3), 337–348. https://doi.org/10.1038/s41562-021-01056-1

Marsman, M., Borsboom, D., Kruis, J., Epskamp, S., van Bork, R., 
Waldorp, L., et al. (2018). An introduction to network psycho-
metrics: Relating Ising network models to item response theory 
models. Multivariate Behavioral Research, 53(1), 15–35.

McNeish, D., & Wolf, M. G. (2021). Dynamic fit index cutoffs for 
confirmatory factor analysis models. Psychological Methods. 
Advance online publication. https://doi.org/10.1037/met0000425

Maertens, R., Anseel, F., & van der Linden, S. (2020). Combatting 
climate change misinformation: Evidence for longevity of inoculation
and consensus messaging effects. Journal of Environmental
Psychology, 70, 101455. https://doi.org/10.1016/j.jenvp.2020.101455

Maertens, R., Roozenbeek, J., Basol, M., & van der Linden, S. (2021). 
Long-term effectiveness of inoculation against misinformation: 
Three longitudinal experiments. Journal of Experimental Psychology:
Applied, 27(1), 1–16. https://doi.org/10.1037/xap0000315

Maertens, R., Roozenbeek, J., Simons, J., Lewandowsky, S., Maturo, 
V., Goldberg, B., ..., van der Linden, S. (2022). Psychological 
booster shots targeting memory increase long-term resistance 
against misinformation. [Manuscript in preparation]

Markon, K. E. (2019). Bifactor and hierarchical models: Specifica-
tion, inference, and interpretation. Annual Review of Clinical 
Psychology, 15, 51–69. https://doi.org/10.1146/annurev-clinpsy-050718-095522

McDonald, R. P. (1999). Test theory: A unified treatment. Psychology 
Press. https://doi.org/10.4324/9781410601087

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, 
Sir Ronald, and the slow progress of soft psychology. Journal 
of Consulting and Clinical Psychology, 46(4), 806–834.
https://doi.org/10.1037/0022-006X.46.4.806

Nasser, M. A. (2020). Step-by-step guide on how to train GPT-2 on
books using Google Colab. Towards Data Science.
https://towardsdatascience.com/step-by-step-guide-on-how-to-train-gpt-2-on-books-using-google-colab-b3c6fa15fef0

Nguyen, T. H., Han, H.-R., Kim, M. T., & Chan, K. S. (2014). An introduction
to item response theory for patient-reported outcome
measurement. The Patient, 7(1), 23–35.
https://doi.org/10.1007/s40271-013-0041-0

Norenzayan, A., & Hansen, I. G. (2006). Belief in supernatural agents 
in the face of death. Personality & Social Psychology Bulletin, 
32(2), 174–187. https://doi.org/10.1177/0146167205280251

Osmundsen, M., Bor, A., Vahlstrup, P. B., Bechmann, A., & Petersen, 
M. B. (2021). Partisan polarization is the primary psychological 
motivation behind political fake news sharing on Twitter. American
Political Science Review, 115(3), 999–1015.
https://doi.org/10.1017/S0003055421000290

Palan, S., & Schitter, C. (2018). Prolific.ac—A subject pool for online 
experiments. Journal of Behavioral and Experimental Finance, 
17, 22–27. https://doi.org/10.1016/j.jbef.2017.12.004

Paulhus, D. L., Buckels, E. E., Trapnell, P. D., & Jones, D. N. (2020). 
Screening for dark personalities. European Journal of Psychological
Assessment, 37(3), 208–222. https://doi.org/10.1027/1015-5759/a000602

Peer, E., Brandimarte, L., Samat, S., & Acquisti, A. (2017). Beyond 
the Turk: Alternative platforms for crowdsourcing behavioral 
research. Journal of Experimental Social Psychology, 70, 153–163.
https://doi.org/10.1016/j.jesp.2017.01.006

Pennycook, G., Binnendyk, J., Newton, C., & Rand, D. G. (2021a). A 
practical guide to doing behavioral research on fake news and 
misinformation. Collabra: Psychology, 7(1), 25293.
https://doi.org/10.1525/collabra.25293

Pennycook, G., Cheyne, J. A., Barr, N., Koehler, D. J., & Fugelsang, 
J. A. (2015). On the reception and detection of pseudo-profound 
bullshit. Judgment and Decision Making, 10(6), 549–563.
http://journal.sjdm.org/15/15923a/jdm15923a.pdf

Pennycook, G., Epstein, Z., Mosleh, M., Arechar, A. A., Eckles, D., & 
Rand, D. G. (2021b). Shifting attention to accuracy can reduce 
misinformation online. Nature, 592, 590–595.
https://doi.org/10.1038/s41586-021-03344-2

Pennycook, G., McPhetres, J., Zhang, Y., Lu, J. G., & Rand, D. G. 
(2020). Fighting COVID-19 misinformation on social media: 
Experimental evidence for a scalable accuracy-nudge intervention.
Psychological Science, 31(7), 770–780.
https://doi.org/10.1177/0956797620939054

Pennycook, G., & Rand, D. G. (2019). Lazy, not biased: Susceptibility 
to partisan fake news is better explained by lack of reasoning than 
by motivated reasoning. Cognition, 188, 39–50.
https://doi.org/10.1016/j.cognition.2018.06.011

Pennycook, G., & Rand, D. G. (2020). Who falls for fake news? The 
roles of bullshit receptivity, overclaiming, familiarity, and analytic
thinking. Journal of Personality, 88(2), 185–200.
https://doi.org/10.1111/jopy.12476

Pennycook, G., & Rand, D. G. (2021). The psychology of fake news. 
Trends in Cognitive Sciences, 25(5), 388–402.
https://doi.org/10.1016/j.tics.2021.02.007

Pituch, K. A., & Stevens, J. P. (2015). Applied multivariate statistics 
for the social sciences: Analyses with SAS and IBM’s SPSS. Routledge.
https://doi.org/10.4324/9781315814919

Pons, P., & Latapy, M. (2005). Computing communities in large networks
using random walks. In P. Yolum, T. Güngör, F. Gürgen,
& C. Özturan (Eds.), Computer and information sciences - ISCIS
2005 (pp. 284–293). Berlin, Heidelberg: Springer.
https://doi.org/10.1007/11569596_31

Preskill, J. (2018). Quantum Shannon entropy. In J. Preskill (Ed.), 
Quantum information (p. 94). Cambridge University Press. 
https://arxiv.org/pdf/1604.07450.pdf

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. 
(2019). Language models are unsupervised multitask learners. 
https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf

Rammstedt, B., Lechner, C. M., & Danner, D. (2021). Short forms do
not fall short: A comparison of three (extra-)short forms of the
Big Five. European Journal of Psychological Assessment, 37(1),
23–32. https://doi.org/10.1027/1015-5759/a000574

Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory 
factor analysis and item response theory: Two approaches for 
exploring measurement invariance. Psychological Bulletin, 
114(3), 552–566. https://doi.org/10.1037/0033-2909.114.3.552

Rentfrow, P. J., Gosling, S. D., Jokela, M., Stillwell, D. J., Kosinski, 
M., & Potter, J. (2013). Divided we stand: Three psychological 
regions of the United States and their political, economic, social, 
and health correlates. Journal of Personality and Social Psychology,
105(6), 996–1012. https://doi.org/10.1037/a0034434

Rentfrow, P. J., Jokela, M., & Lamb, M. E. (2015). Regional personality 
differences in Great Britain. PloS One, 10(3), e0122245.
https://doi.org/10.1371/journal.pone.0122245

Revelle, W. (2021). psych: Procedures for psychological, psychometric, 
and personality research. The Comprehensive R Archive Network. 
https://cran.r-project.org/package=psych

Revelle, W., & Condon, D. M. (2019). Reliability from α to ω: A tutorial.
Psychological Assessment, 31(12), 1395–1411.
https://doi.org/10.1037/pas0000754

Robins, R. W., Hendin, H. M., & Trzesniewski, K. H. (2001). Measuring 
global self-esteem: Construct validation of a single-item measure and 
the Rosenberg self-esteem scale. Personality & Social Psychology 
Bulletin, 27(2), 151–161. https://doi.org/10.1177/0146167201272002

Roozenbeek, J., Culloty, E., & Suiter, J. (2023). Countering misinfor-
mation: Evidence, knowledge gaps, and implications of current 
interventions. European Psychologist. In press.
https://doi.org/10.31234/osf.io/b52um

Roozenbeek, J., Freeman, A. L. J., & van der Linden, S. (2021a). How 
accurate are accuracy nudges? A pre-registered direct replica-
tion of Pennycook et al. (2020). Psychological Science, 32(7), 
1169–1178. https://doi.org/10.1177/09567976211024535

Roozenbeek, J., Maertens, R., Herzog, S., Geers, M., Kurvers, R., 
Sultan, M., & van der Linden, S. (2022). Susceptibility to mis-
information is consistent across question framings and response 
modes and better explained by myside bias and partisanship than 
analytical thinking. Judgment and Decision Making, 17(3), 547–573.
http://journal.sjdm.org/22/220228/jdm220228.pdf

Roozenbeek, J., Maertens, R., McClanahan, W., & van der Linden, S. 
(2021b). Disentangling item and testing effects in inoculation 
research on online misinformation: Solomon revisited. Educational
and Psychological Measurement, 81(2), 340–362.
https://doi.org/10.1177/0013164420940378

Roozenbeek, J., Schneider, C. R., Dryhurst, S., Kerr, J., Freeman, A. L. 
J., Recchia, G., van der Bles, A. M., & van der Linden, S. (2020). 
Susceptibility to misinformation about COVID-19 around the 
world. Royal Society Open Science, 7(10), 201199.
https://doi.org/10.1098/rsos.201199

Roozenbeek, J., & van der Linden, S. (2019). Fake news game con-
fers psychological resistance against online misinformation. 
Palgrave Communications, 5(1), 65. https://doi.org/10.1057/s41599-019-0279-9

Roozenbeek, J., & van der Linden, S. (2020). Breaking Harmony 
Square: A game that “inoculates” against political misinforma-
tion. Harvard Kennedy School Misinformation Review, 1(8), 
1–26. https://doi.org/10.37016/mr-2020-47

Rosellini, A. J., & Brown, T. A. (2021). Developing and validating clin-
ical questionnaires. Annual Review of Clinical Psychology, 17, 
55–81. https://doi.org/10.1146/annurev-clinpsy-081219-115343

Rosseel, Y. (2012). lavaan: An R package for structural equation mod-
eling and more. Journal of Statistical Software, 48(2), 1–36. 
https://doi.org/10.18637/jss.v048.i02

Said, N., Maertens, R., Jürgen, B., & Roozenbeek, J. (2023). The 
Manipulative Online Content Recognition Inventory (MOCRI). 
[Manuscript in preparation]

Sass, D. A., & Schmitt, T. A. (2010). A comparative investigation of 
rotation criteria within exploratory factor analysis. Multivariate 
Behavioral Research, 45(1), 73–103. https://doi.org/10.1080/00273170903504810

Satorra, A. (2000). Scaled and adjusted restricted tests in multi-sample 
analysis of moment structures. In Innovations in multivariate 
statistical analysis (pp. 233–247). Springer.
https://doi.org/10.1007/978-1-4615-4603-0_17

Schmalbach, B., Irmer, J. P., & Schultze, M. (2019). ezCutoffs: Fit 
measure cutoffs in SEM. The Comprehensive R Archive Network. 
https://cran.r-project.org/package=ezCutoffs

Schumacker, R. E., Lomax, R. G., & Schumacker, R. (2015). A begin-
ner’s guide to structural equation modeling (4th ed.). Routledge. 
https://www.routledge.com/A-Beginners-Guide-to-Structural-Equation-Modeling-Fourth-Edition/Schumacker-Lomax-Schumacker-Lomax/p/book/9781138811935. Accessed 10 Dec 2020.

Schwartz, L. M., Woloshin, S., Black, W. C., & Welch, H. G. (1997). 
The role of numeracy in understanding the benefit of screening 
mammography. Annals of Internal Medicine, 127(11), 966–972. 
https://doi.org/10.7326/0003-4819-127-11-199712010-00003

Shi, D., DiStefano, C., McDaniel, H. L., & Jiang, Z. (2018). Examining chi-
square test statistics under conditions of large model size and ordinal 
data. Structural Equation Modeling: A Multidisciplinary Journal, 
25(6), 924–945. https://doi.org/10.1080/10705511.2018.1449653

Simms, L. J. (2008). Classical and modern methods of psychological 
scale construction. Social and Personality Psychology Compass, 
2(1), 414–433. https://doi.org/10.1111/j.1751-9004.2007.00044.x

Sindermann, C., Elhai, J. D., Moshagen, M., & Montag, C. (2020). Age, 
gender, personality, ideological attitudes and individual differences 
in a person’s news spectrum: How many and who might be prone 
to “filter bubbles” and “echo chambers” online? Heliyon, 6(1), 
Article e03214. https://doi.org/10.1016/j.heliyon.2020.e03214

Soto, C. J., & John, O. P. (2017). Short and extra-short forms of the Big 
Five Inventory–2: The BFI-2-S and BFI-2-XS. Journal of Research 
in Personality, 68, 69–81. https://doi.org/10.1016/j.jrp.2017.02.004

Steiner, M., & Grieder, S. (2020). EFAtools: An R package with fast 
and flexible implementations of exploratory factor analysis tools. 
Journal of Open Source Software, 5(53), 2521.
https://doi.org/10.21105/joss.02521

Strauss, M. E., & Smith, G. T. (2009). Construct validity: Advances in 
theory and methodology. Annual Review of Clinical Psychology, 
5, 1–25. https://doi.org/10.1146/annurev.clinpsy.032408.153639

Swami, V., Chamorro-Premuzic, T., & Furnham, A. (2010). Unanswered 
questions: A preliminary investigation of personality and individual 
difference predictors of 9/11 conspiracist beliefs. Applied Cognitive
Psychology, 24(6), 749–761. https://doi.org/10.1002/acp.1583

Swami, V., Furnham, A., Smyth, N., Weis, L., Lay, A., & Clow, A. 
(2016). Putting the stress on conspiracy theories: Examining 
associations between psychological stress, anxiety, and belief in 
conspiracy theories. Personality and Individual Differences, 99, 
72–76. https://doi.org/10.1016/j.paid.2016.04.084

Swire, B., Berinsky, A. J., Lewandowsky, S., & Ecker, U. K. H. (2017). 
Processing political misinformation: Comprehending the Trump 
phenomenon. Royal Society Open Science, 4(3), 160802.
https://doi.org/10.1098/rsos.160802

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics
(5th ed.). Pearson. https://psycnet.apa.org/record/2006-03883-000

Thalmayer, A. G., Saucier, G., & Eigenhuis, A. (2011). Compara-
tive validity of brief to medium-length Big Five and Big Six 
Personality Questionnaires. Psychological Assessment, 23(4), 
995–1009. https://doi.org/10.1037/a0024165

Thurstone, L. L. (1944). Second-order factors. Psychometrika, 9(2), 
71–100. https://doi.org/10.1007/BF02288715

Uenal, F., Sidanius, J., Maertens, R., Hudson, S. K. T., Davis, G., & 
Ghani, A. (2022). The roots of ecological dominance orientation: 
Assessing individual preferences for an anthropocentric and hierarchically
organized world. Journal of Environmental Psychology,
81, 101783. https://doi.org/10.1016/j.jenvp.2022.101783

Van Bavel, J. J., Harris, E. A., Pärnamets, P., Rathje, S., Doell, K., & 
Tucker, J. A. (2020). Political psychology in the digital (mis)
information age: A model of news belief and sharing. PsyArXiv. 
https://doi.org/10.31234/osf.io/u5yts

van der Linden, S., Leiserowitz, A., Rosenthal, S., & Maibach, E. 
(2017). Inoculating the public against misinformation about climate
change. Global Challenges, 1(2), 1600008.
https://doi.org/10.1002/gch2.201600008

van der Linden, S., & Roozenbeek, J. (2020). Psychological inoculation 
against fake news. In R. Greifeneder, M. Jaffé, E. J. Newman, 
& N. Schwarz (Eds.), The psychology of fake news: Accepting, 
sharing, and correcting misinformation. Routledge.
https://www.routledge.com/p/book/9780367271831

van der Linden, S., Roozenbeek, J., Maertens, R., Basol, M., Kácha, 
O., Rathje, S., & Traberg, C. S. (2021). How can psychological 
science help counter the spread of fake news? The Spanish Journal
of Psychology, 24, e25. https://doi.org/10.1017/SJP.2021.23

Van Der Maas, H. L., Dolan, C. V., Grasman, R. P., Wicherts, J. M., 
Huizenga, H. M., & Raijmakers, M. E. (2006). A dynamical 
model of general intelligence: The positive manifold of intel-
ligence by mutualism. Psychological Review, 113(4), 842–861. 
https://doi.org/10.1037/0033-295X.113.4.842

van Prooijen, J.-W., Krouwel, A. P. M., & Pollet, T. V. (2015). Political 
extremism predicts belief in conspiracy theories. Social Psychological
and Personality Science, 6(5), 570–578.
https://doi.org/10.1177/1948550614567356

Von Neumann, J. (1927). Wahrscheinlichkeitstheoretischer Aufbau der 
Quantenmechanik. Nachrichten von Der Gesellschaft Der Wis-
senschaften Zu Göttingen, Mathematisch-Physikalische Klasse, 
1927, 245–272. http://eudml.org/doc/59230

Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false 
news online. Science, 359(6380), 1146–1151.
https://doi.org/10.1126/science.aap9559

Weiner, I. B., Schinka, J. A., & Velicer, W. F. (2012). Handbook of 
psychology: Research methods in psychology (2nd ed., Vol. 2). 
John Wiley & Sons.

Woolf, M. (2019). How to make custom AI-generated text with GPT-2.
Max Woolf’s Blog. https://minimaxir.com/2019/09/howto-gpt2/

Worthington, R. L., & Whittaker, T. A. (2006). Scale development 
research: A content analysis and recommendations for best practices.
The Counseling Psychologist, 34(6), 806–838.
https://doi.org/10.1177/0011000006288127

Zickar, M. J. (2020). Measurement development and evaluation. 
Annual Review of Organizational Psychology and Organizational
Behavior, 7, 213–232. https://doi.org/10.1146/annurev-orgpsych-012119-044957

Open Practices Statement: Availability of data, code materials (data 
transparency) The supplements, data, and analysis scripts that 
support this paper’s findings, including Qualtrics files, analysis code, 
raw and clean datasets, and all research materials, are openly available 
on the Open Science Framework (OSF) at https://osf.io/r7phc/.
Preregistrations are available on AsPredicted at
https://aspredicted.org/m7vb3.pdf (Study 1, T1),
https://aspredicted.org/js2jz.pdf (Study 1, T2), and
https://aspredicted.org/nx7xu.pdf (Study 2B).

Publisher’s note Springer Nature remains neutral with regard to 
jurisdictional claims in published maps and institutional affiliations.

