Abbreviations used in the markdown/comments of this script

  • LR: Logistic regression
  • FLR: Factorial logistic regression
  • OLR: Ordinal logistic regression
  • KW: Kruskal-Wallis test
  • OR: Odds ratio
  • CI: Confidence interval (two-sided 95% confidence)
  • glm: Generalised linear model
  • CS: career stage
  • PNTA: Prefer not to answer

A quick introduction to the statistical tests and choices

This section gives a high-level explanation of the code that follows. The information provided here does NOT supersede the methods section of the report; please always regard the information in the report as final.

Independent variables

We mainly checked for the effect of 2 independent variables on the responses:

  • Gender: men vs women (variable name: All_Gender_clean)
  • Career stage: PhDs vs postdocs vs early-career PIs vs mid-career PIs vs late-career PIs (variable name: Supporter_CareerStage_clean)

We had respondents of other genders and career stages, but for this report we chose to analyse only these groups because they had reasonable and comparable sample sizes. (If you’d like to look into other groups specifically, you’re more than welcome!)

For some questions we also checked whether being a supporter affected the response (variable name: Provided_Support_clean). Supporter status is based on the response to the statement “I have provided support to someone who was doing research and who was struggling with their mental health”. We compared respondents who answered “No” with those who chose one of the “yes” options.

For some questions we also wanted to see whether early-career PIs stand out from the other career stages. For this, we ran either (1) a Kruskal-Wallis test between early-career PIs and all other groups pooled together; or (2) pairwise comparisons between early-career PIs and each of the other groups. In the second case, a Sidak correction was applied post hoc.

Dependent variables, i.e. the response

A wide variety of questions were asked. We focussed our statistical analyses on two types of responses (or responses that could be logically recoded to one of these two types):

  • Categorical, and in most cases binary, i.e. Yes/No
  • Ordinal, i.e. Likert-scale-type responses
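For the pairwise early-career PI comparisons described under the independent variables, the Sidak correction can be sketched as follows. This is a minimal illustration only: the helper name `sidak_adjust` and the raw p-values are made up, and the report's own pairwise tests may have been run differently.

```r
# Sidak adjustment for m comparisons: p_adj = 1 - (1 - p)^m
# (hypothetical helper, not taken from the original script)
sidak_adjust <- function(p) {
  m <- length(p)
  1 - (1 - p)^m
}

# Made-up raw p-values for early-career PIs vs each of the 4 other career stages
raw_p <- c(PhD = 0.012, Postdoc = 0.030, MidPI = 0.200, LatePI = 0.004)
adj_p <- sidak_adjust(raw_p)
round(adj_p, 4)

# Equivalent view: the per-comparison threshold that keeps the family-wise
# error rate at 0.05 across the 4 comparisons
alpha_sidak <- 1 - (1 - 0.05)^(1 / length(raw_p))
alpha_sidak
```

Comparing `adj_p` against 0.05 gives the same decisions as comparing the raw p-values against `alpha_sidak`.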

For categorical responses, we modelled the response using (binomial) logistic regression. We ran 3 separate models: one with gender as the predictor, the second with career stage (CS) as the predictor, and the final one with both as predictors. For gender, men is always the baseline level, and for CS, PhD is always the baseline level. The first two models provide the reported p-values, odds ratios (OR, obtained with exp(coef())) and confidence intervals of the OR (two-sided 95% confidence, obtained with exp(confint())). The final, factorial logistic regression allows us to check for interactions between gender and CS.
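The three-model approach for binary responses can be sketched on simulated data as below. Everything here is an assumption made for illustration: the data frame `toy`, its column names, the reduced set of career stages and the effect size are invented, and writing the factorial model as `gender * cs` is one plausible way to include the interaction rather than a copy of the report's code.

```r
# Simulated data only; names and effect sizes are made up for illustration
set.seed(1)
n <- 400
toy <- data.frame(
  gender = factor(sample(c("Men", "Women"), n, replace = TRUE),
                  levels = c("Men", "Women")),          # Men = baseline
  cs     = factor(sample(c("PhD", "Postdoc", "PI"), n, replace = TRUE),
                  levels = c("PhD", "Postdoc", "PI"))   # PhD = baseline
)
toy$answer <- rbinom(n, 1, plogis(-0.3 + 0.5 * (toy$gender == "Women")))

# Model 1: gender only; Model 2: career stage only
m_gender <- glm(answer ~ gender, data = toy, family = binomial)
m_cs     <- glm(answer ~ cs,     data = toy, family = binomial)

# Model 3: factorial model including the gender x CS interaction
m_both   <- glm(answer ~ gender * cs, data = toy, family = binomial)

# Odds ratios and two-sided 95% confidence intervals on the OR scale
or_gender <- exp(coef(m_gender))
ci_gender <- exp(confint(m_gender))
cbind(OR = or_gender, ci_gender)

# p-values for the single-predictor model, and a test of the interaction term
summary(m_gender)$coefficients
anova(m_both, test = "Chisq")
```

Because men and PhD are the first factor levels, every reported OR is relative to those baselines.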

For ordinal responses, we modelled the response using ordinal logistic regression (OLR). We ran 2 separate models: one with gender as the predictor, the second with career stage (CS) as the predictor. For gender, men is always the baseline level, and for CS, PhD is always the baseline level. These models give the ORs and CIs reported. As the ordinal responses we obtained are usually not normally distributed (code not reported here, but normality was tested using shapiro.test()), we chose the Kruskal-Wallis test to test for the effect of gender/CS on the ordinal responses. To check for any interactions between the two independent variables, we also split the data into groups by one independent variable and used the chi-square test to check for the effect of the other independent variable (i.e. within all women, is there an effect of CS, etc).
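A minimal sketch of the ordinal workflow on simulated data follows. It assumes `MASS::polr` as the OLR implementation (the text above does not name the function used), and the data frame `toy`, its columns and the effect size are invented for illustration.

```r
library(MASS)  # polr() for ordinal logistic regression (assumed implementation)

# Simulated 5-point Likert responses; names and effect size are made up
set.seed(2)
n <- 300
toy <- data.frame(
  gender = factor(sample(c("Men", "Women"), n, replace = TRUE),
                  levels = c("Men", "Women"))  # Men = baseline
)
latent <- 0.6 * (toy$gender == "Women") + rlogis(n)
toy$likert <- cut(latent, breaks = c(-Inf, -1, 0, 1, 2, Inf),
                  labels = c("1", "2", "3", "4", "5"), ordered_result = TRUE)

# Ordinal logistic regression: OR and 95% CI for women relative to men
olr    <- polr(likert ~ gender, data = toy, Hess = TRUE)
or_olr <- exp(coef(olr))
ci_olr <- exp(confint(olr))
or_olr
ci_olr

# Kruskal-Wallis test for the effect of gender on the ordinal response
kw <- kruskal.test(as.numeric(likert) ~ gender, data = toy)
kw
```

Here the OLR supplies the effect size (OR with CI) while the Kruskal-Wallis test supplies the p-value, mirroring the division of labour described above.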

Initialising

Load and inspect data structure

data=read.csv("../mh-data/cleandata2604 REDACTED.csv")
# colnames(data)

Filtering the data

Looking at only 2 independent variables:

  1. Gender (label: “All_Gender_clean”): Man = 1, Woman = 3

  2. Career stage (label: “Supporter_CareerStage_clean”):

  • A PhD student = 3
  • A postdoctoral researcher (postdoc) = 4
  • A group leader (less than 5 years’ experience) = 6
  • A group leader (5 to 10 years’ experience) = 7
  • A group leader (10 or more years’ experience) = 8
genderCS=data[data$All_Gender_clean %in% c(1,3) & data$Supporter_CareerStage_clean %in% c(3,4,6,7,8), ]
genderCS$All_Gender_clean=as.factor(genderCS$All_Gender_clean)
genderCS$Supporter_CareerStage_clean=as.factor(genderCS$Supporter_CareerStage_clean)
dim(genderCS)
## [1] 1255  196
levels(genderCS$All_Gender_clean)=c("Men", "Women")
levels(genderCS$Supporter_CareerStage_clean)=c("PhD students", "Postdocs", "Group leaders (<5yr)", "Group leaders (5-10yr)", "Group leaders (>10yr)")

The eLife colour palette

eGreen="#346A2D"
eLime="#7DB441"
eBlue="#06589C"
eSky="#2997D4"
ePurple="#881350"
eFuschia="#D81F62"
eGrey="#666B6E"

ePalette=c(eGreen, eLime, eGrey, eSky, eBlue)

Q28xQ04 Gender x Career stage (CS) (Section 2, Panel 1)

chi-square

gender=genderCS[,c("Supporter_CareerStage_clean","All_Gender_clean")]
dim(gender)[1] #this gives N
## [1] 1255
table(gender)
##                            All_Gender_clean
## Supporter_CareerStage_clean Men Women
##      PhD students           160   374
##      Postdocs               123   233
##      Group leaders (<5yr)    66    89
##      Group leaders (5-10yr)  48    43
##      Group leaders (>10yr)   77    42
chisq.test(table(gender))
## 
##  Pearson's Chi-squared test
## 
## data:  table(gender)
## X-squared = 62.364, df = 4, p-value = 9.236e-13
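The chi-square test shows that gender and career stage are associated, but not which career stages drive the association. As a hedged follow-up that is not part of the original analysis, the printed counts can be re-entered by hand to inspect the expected counts and standardised residuals. The object name `counts` is introduced here for illustration.

```r
# Counts copied from the table printed above (rows: career stage, columns: gender)
counts <- matrix(
  c(160, 374,
    123, 233,
     66,  89,
     48,  43,
     77,  42),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    CareerStage = c("PhD students", "Postdocs", "Group leaders (<5yr)",
                    "Group leaders (5-10yr)", "Group leaders (>10yr)"),
    Gender = c("Men", "Women")
  )
)

res <- chisq.test(counts)
res$statistic            # same statistic as the test on table(gender)
round(res$expected, 1)   # counts expected if gender and CS were independent
round(res$stdres, 2)     # standardised residuals; large |values| mark the cells behind the result
```

A positive residual in the Men column means more men in that career stage than independence would predict.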

Q28xQ04 Gender x Career stage (CS) (Graph 2.1.2)

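# Packages needed below (%>%, group_by, summarize, ggplot); skip if already attached earlier in the script
library(dplyr)
library(ggplot2)

gender$Supporter_CareerStage_clean=factor(gender$Supporter_CareerStage_clean) #be careful when rerunning this line

# This and all other eps images will be output to your code directory inside the images sub-folder
#eps("images/CareerStage_gender.eps", width=1000, height=578)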
graphdata = gender %>%
  group_by(Supporter_CareerStage_clean, All_Gender_clean) %>%
  summarize(n=n()) %>%
  mutate(perc=n*100/sum(n))
## `summarise()` regrouping output by 'Supporter_CareerStage_clean' (override with `.groups` argument)
ggplot(graphdata, aes(x=Supporter_CareerStage_clean, y=perc, fill=All_Gender_clean)) +
  geom_bar(stat="identity") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5), text=element_text(size=20)) +
  geom_text(aes(label=round(perc, digits=1)), size=8, position=position_stack(vjust=0.5), color="white") +
  labs(x="", y="Percentage", size=3, title="During my supporting role, I was:") +
  guides(fill=guide_legend(reverse=TRUE)) +
  coord_flip() +
  scale_fill_manual(name="", values=c(eFuschia, ePurple))
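The percentages shown on the bars can be cross-checked in base R without dplyr. The sketch below re-enters the counts from the Gender x Career stage table by hand (the object names `counts` and `perc` are introduced here for illustration) and computes the within-career-stage percentages, which is what the `group_by()`/`mutate()` step computes.

```r
# Counts copied from the Gender x Career stage table (rows: career stage)
counts <- matrix(
  c(160, 374,
    123, 233,
     66,  89,
     48,  43,
     77,  42),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    CareerStage = c("PhD students", "Postdocs", "Group leaders (<5yr)",
                    "Group leaders (5-10yr)", "Group leaders (>10yr)"),
    Gender = c("Men", "Women")
  )
)

# margin = 1 normalises within each row, so each career stage sums to 100%
perc <- prop.table(counts, margin = 1) * 100
round(perc, 1)
```

Each row of `perc` should match the two labels printed on the corresponding stacked bar.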