This section gives a high-level explanation of the code that follows. The information provided here does NOT supersede the methods section of the report; please always treat the information in the report as final.
We checked mainly for the effect of 2 independent variables on the responses:

- Gender: men vs women (variable name: All_Gender_clean)
- Career stage: PhDs vs postdocs vs early-career PIs vs mid-career PIs vs late-career PIs (variable name: Supporter_CareerStage_clean)

We had respondents of other genders and career stages, but for this report we chose to analyse only these groups because they had a reasonable and comparable number of samples. (If you’d like to look into other groups specifically, you’re more than welcome!)
For some questions we also checked whether being a supporter affected the response (variable name: Provided_Support_clean). Supporter status was determined from the response to the statement “I have provided support to someone who was doing research and who was struggling with their mental health”. We compared respondents who answered “No” with those who chose one of the “yes” options.

For some questions we also wanted to see whether early career PIs stand out from other career stages. For this, we ran either (1) a Kruskal-Wallis test between early PIs and all other groups pooled together; or (2) pairwise comparisons between early PIs and each of the other groups. In the second case, a Sidak correction was applied post hoc.

### Dependent variables, i.e. the responses

A wide variety of questions were asked. We focussed our statistical analyses on two types of responses (or responses that could be logically recoded to one of these two types):

- Categorical, and in most cases binary, i.e. Yes/No
- Ordinal, i.e. Likert-scale-type responses
For categorical responses, we modelled the response using (binomial) logistic regression. We ran 3 separate models: one with gender as the predictor, the second with career stage (CS) as the predictor, and the final one with both as predictors. For gender, men is always the baseline indicator variable, and for CS, PhD is always the baseline indicator variable. The first two models provide the reported p-values, odds ratios (OR, obtained with exp(coef())) and confidence intervals of the OR (two-sided 95% confidence, obtained with exp(confint())). The final, factorial logistic regression allows us to check for interactions between gender and CS.
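The three logistic models described above can be sketched roughly as follows. This is only an illustration on simulated data: the data frame `toy`, its columns and the model names are hypothetical, and writing the factorial model as `gender * stage` is an assumption about how the interaction was specified.

```r
# Minimal sketch on simulated data (NOT the survey data).
set.seed(1)
n <- 500
toy <- data.frame(
  gender = factor(sample(c("Men", "Women"), n, replace = TRUE)),
  stage  = factor(sample(c("PhD students", "Postdocs", "Group leaders"),
                         n, replace = TRUE))
)
# Hypothetical binary (Yes = 1 / No = 0) response
toy$answer <- rbinom(n, size = 1,
                     prob = ifelse(toy$gender == "Women", 0.6, 0.5))

# Make "Men" and "PhD students" the baseline levels
toy$gender <- relevel(toy$gender, ref = "Men")
toy$stage  <- relevel(toy$stage,  ref = "PhD students")

# One model per predictor, plus a factorial model with the interaction
m_gender <- glm(answer ~ gender,         data = toy, family = binomial)
m_stage  <- glm(answer ~ stage,          data = toy, family = binomial)
m_both   <- glm(answer ~ gender * stage, data = toy, family = binomial)

# Odds ratios and two-sided 95% confidence intervals on the OR scale
or_gender <- exp(coef(m_gender))
ci_gender <- exp(confint(m_gender))

summary(m_gender)$coefficients   # p-values for the gender model
anova(m_both, test = "Chisq")    # sequential tests, including the interaction
```

An odds ratio above 1 for `genderWomen` would mean women have higher odds of answering "Yes" than the baseline group (men); the interval from `confint()` is back-transformed with `exp()` in the same way as the coefficient.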
For ordinal responses, we modelled the responses using ordinal logistic regression (OLR). We ran 2 separate models: one with gender as the predictor, the second with career stage (CS) as the predictor. For gender, men is always the baseline indicator variable, and for CS, PhD is always the baseline indicator variable. These give the ORs and CIs reported. As the ordinal responses we obtained are usually not normally distributed (code not reported here, but normality was tested using shapiro.test()), we chose the Kruskal-Wallis test to test for the effect of gender/CS on the ordinal responses. To check for any interactions between the two independent variables, we also split the data into groups by one independent variable and used the chi-square test to check for the effect of the other independent variable (e.g. within all women, is there an effect of CS).
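As a rough illustration of the ordinal workflow, the sketch below fits an OLR, runs a Kruskal-Wallis test, and applies a Sidak correction to pairwise comparisons of early PIs against each other group. It uses simulated data, and several things are assumptions: `MASS::polr()` is one common way to fit an OLR (the report's code may use another), the names `toy`, `score` and `likert` are hypothetical, and the Sidak adjustment is written out by hand as 1 - (1 - p)^m because base R's `p.adjust()` does not offer it.

```r
library(MASS)  # polr(); MASS ships with standard R installations

# Minimal sketch on simulated data (NOT the survey data).
set.seed(2)
n <- 400
stages <- c("PhD students", "Postdocs", "Early PIs", "Late PIs")
toy <- data.frame(
  stage = factor(sample(stages, n, replace = TRUE), levels = stages),
  score = sample(1:5, n, replace = TRUE)   # hypothetical 5-point Likert item
)
toy$likert <- factor(toy$score, levels = 1:5, ordered = TRUE)

# Ordinal logistic regression; the first level (PhD students) is the baseline
olr <- polr(likert ~ stage, data = toy, Hess = TRUE)
or_stage <- exp(coef(olr))      # odds ratios
ci_stage <- exp(confint(olr))   # 95% confidence intervals on the OR scale

# Kruskal-Wallis test for an overall effect of career stage
kw <- kruskal.test(score ~ stage, data = toy)

# Pairwise comparisons: early PIs against each other group
others <- setdiff(levels(toy$stage), "Early PIs")
p_raw <- sapply(others, function(g) {
  pair <- droplevels(toy[toy$stage %in% c("Early PIs", g), ])
  kruskal.test(score ~ stage, data = pair)$p.value
})

# Sidak correction for m comparisons
m <- length(p_raw)
p_sidak <- 1 - (1 - p_raw)^m
```

The Sidak-adjusted p-values are never smaller than the raw ones, and they are slightly less conservative than a Bonferroni adjustment of the same comparisons.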
Load and inspect data structure
data=read.csv("../mh-data/cleandata2604 REDACTED.csv")
# colnames(data)
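## Filtering the data

Looking at only 2 independent variables:

- Gender (label: “All_Gender_clean”): Man = 1, Woman = 3
- Career stage (label: “Supporter_CareerStage_clean”): PhD students = 3, Postdocs = 4, Group leaders (<5yr) = 6, Group leaders (5-10yr) = 7, Group leaders (>10yr) = 8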
genderCS=data[data$All_Gender_clean %in% c(1,3) & data$Supporter_CareerStage_clean %in% c(3,4,6,7,8), ]
genderCS$All_Gender_clean=as.factor(genderCS$All_Gender_clean)
genderCS$Supporter_CareerStage_clean=as.factor(genderCS$Supporter_CareerStage_clean)
dim(genderCS)
## [1] 1255 196
levels(genderCS$All_Gender_clean)=c("Men", "Women")
levels(genderCS$Supporter_CareerStage_clean)=c("PhD students", "Postdocs", "Group leaders (<5yr)", "Group leaders (5-10yr)", "Group leaders (>10yr)")
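The relabelling above relies on `as.factor()` sorting the numeric codes, so the labels are matched to the codes by position. A small sketch of that behaviour, with an alternative that states the mapping explicitly, might look like this. The vector `codes` is made up for illustration, and the code-to-label mapping is the one implied by the order of the labels in the code above.

```r
# Hypothetical vector of career-stage codes, deliberately out of order
codes <- c(8, 3, 6, 4, 7, 3)

labels <- c("PhD students", "Postdocs", "Group leaders (<5yr)",
            "Group leaders (5-10yr)", "Group leaders (>10yr)")

# Approach used above: as.factor() sorts the codes, then levels are renamed
f_positional <- as.factor(codes)
levels(f_positional)            # "3" "4" "6" "7" "8"
levels(f_positional) <- labels

# Alternative: state the code -> label mapping explicitly
f_explicit <- factor(codes, levels = c(3, 4, 6, 7, 8), labels = labels)

# Both approaches give the same result
identical(as.character(f_positional), as.character(f_explicit))
```

The explicit form is less fragile if a code is missing from the filtered data, because the positional form would then shift every later label by one.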
eGreen="#346A2D"
eLime="#7DB441"
eBlue="#06589C"
eSky="#2997D4"
ePurple="#881350"
eFuschia="#D81F62"
eGrey="#666B6E"
ePalette=c(eGreen, eLime, eGrey, eSky, eBlue)
chi-square
gender=genderCS[,c("Supporter_CareerStage_clean","All_Gender_clean")]
dim(gender)[1] #this gives N
## [1] 1255
table(gender)
## All_Gender_clean
## Supporter_CareerStage_clean Men Women
## PhD students 160 374
## Postdocs 123 233
## Group leaders (<5yr) 66 89
## Group leaders (5-10yr) 48 43
## Group leaders (>10yr) 77 42
chisq.test(table(gender))
##
## Pearson's Chi-squared test
##
## data: table(gender)
## X-squared = 62.364, df = 4, p-value = 9.236e-13
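To make the test above more concrete, the following sketch rebuilds the contingency table from the counts printed above and computes the Pearson statistic by hand alongside `chisq.test()`. The object names (`counts`, `expected`, `x2_manual`) are illustrative only.

```r
# Counts copied from the table printed above (rows: career stage, cols: gender)
counts <- matrix(
  c(160, 374,
    123, 233,
     66,  89,
     48,  43,
     77,  42),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    stage  = c("PhD students", "Postdocs", "Group leaders (<5yr)",
               "Group leaders (5-10yr)", "Group leaders (>10yr)"),
    gender = c("Men", "Women")
  )
)

res <- chisq.test(counts)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic and degrees of freedom computed by hand
x2_manual <- sum((counts - expected)^2 / expected)
df_manual <- (nrow(counts) - 1) * (ncol(counts) - 1)

c(manual = x2_manual, chisq.test = unname(res$statistic), df = df_manual)
```

For a table larger than 2x2, `chisq.test()` applies no continuity correction, so the hand-computed statistic should agree with the value reported above.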
gender$Supporter_CareerStage_clean=factor(gender$Supporter_CareerStage_clean) #be careful when rerunning this line
# This and all other eps images will be output to your code directory inside the images sub-folder
#eps("images/CareerStage_gender.eps", width=1000, height=578)
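library(dplyr)    # %>%, group_by(), summarize(), mutate()
library(ggplot2)  # ggplot() and related functions
graphdata = gender %>%
  group_by(Supporter_CareerStage_clean, All_Gender_clean) %>%
  summarize(n=n()) %>%
  mutate(perc=n*100/sum(n)) # sum(n) is taken within each career stage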
## `summarise()` regrouping output by 'Supporter_CareerStage_clean' (override with `.groups` argument)
ggplot(graphdata, aes(x=Supporter_CareerStage_clean, y=perc, fill=All_Gender_clean)) +
  geom_bar(stat="identity") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5), text=element_text(size=20)) +
  geom_text(aes(label=round(perc, digits=1)), size=8, position=position_stack(vjust=0.5), color="white") +
  labs(x="", y="Percentage", size=3, title="During my supporting role, I was:") +
  guides(fill=guide_legend(reverse=TRUE)) +
  coord_flip() +
  scale_fill_manual(name="", values=c(eFuschia, ePurple))
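The percentages shown in the plot are computed within each career stage, so the two segments of every bar sum to 100. As a cross-check that needs no extra packages, the same row percentages can be obtained with `prop.table()`. The sketch below rebuilds the table from the counts printed earlier; the object names are illustrative only.

```r
# Counts copied from the gender x career-stage table printed earlier
counts <- matrix(
  c(160, 374,
    123, 233,
     66,  89,
     48,  43,
     77,  42),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    stage  = c("PhD students", "Postdocs", "Group leaders (<5yr)",
               "Group leaders (5-10yr)", "Group leaders (>10yr)"),
    gender = c("Men", "Women")
  )
)

# margin = 1 normalises each row, i.e. percentages within a career stage
row_perc <- prop.table(counts, margin = 1) * 100

round(row_perc, 1)   # same rounding as the labels in the plot
rowSums(row_perc)    # every row sums to 100
```

If these values differ from the plot labels, the grouping in the dplyr pipeline is the first thing to check, since `mutate()` computes `sum(n)` over whatever grouping `summarize()` leaves in place.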