PM_{2.5} concentration and rate of change in COVID-19 infection in provincial capital cities in China

This study investigates whether acute exposure to outdoor PM_{2.5} concentration, P, modifies the rate of change in the daily number of COVID-19 infections (R) across 18 high infection provincial capitals in China, including Wuhan. A best-fit multiple linear regression model was constructed to model the relationship between P and R, from 1 January to 20 March 2020, after accounting for meteorology, net move-in mobility (NM), time trend (T), co-morbidity (CM), and the time-lag effects. Regression analysis shows that a 10 µg/m^{3} increase in P gives a 1.5% increase in R.

These authors contributed equally: Yang Han, Jacqueline C. K. Lam and Victor O. K. Li.

COVID-19 was first reported in Wuhan, China in December 2019. Since then, more than 116 million infections have been reported, resulting in 2 million deaths globally.

Recent COVID-19 studies have investigated whether demography (D), co-morbidity (CM), meteorology, and lockdown have effects on viral infection^{1–4}. Consistent with studies in SARS and MERS, depressed temperatures and rising humidity have been found to increase COVID-19 transmission^{5,6}. Furthermore, influenza studies have suggested that exposure to PM_{2.5} (P), with and without interacting with meteorology, may increase the risks of influenza infection^{7}. In the US and Europe, chronic exposure to P and NO_{2} is linked to COVID-19 mortality^{8,9}. Air pollution is considered to heighten the severity of COVID-19 infection, given that pollutants, such as P, may increase the risk of Vitamin-D deficiency and decrease immunity^{10}. Increasingly, evidence suggests that air pollution is a significant contributor to COVID-19 infection^{11–16}. Studies undertaken in China have concluded that P, NO_{2}, and O_{3} are associated with increased incidence of COVID-19 infections^{17}, with a significant interaction between air quality index (AQI) and rising temperature identified^{18}. However, these studies have failed to fully account for the change in testing capacity and the inconsistency in COVID-19 case definition, as well as the confounding effects of D and CM. A few studies in Italy have explored the correlation between COVID-19 cases and PM_{2.5} and PM_{10} levels without controlling for potential confounders, such as mobility^{19,20}. A more sophisticated and rigorous study recently conducted in Italy has utilized doubling-time derived from a fitted epidemic curve to measure COVID-19 transmission while reducing the noise of the observed data^{21}. Without taking into account potential confounders, this study concludes that P alone does not facilitate COVID-19 transmission within the most affected regions^{21}.
However, one UK study has argued for a positive relationship between P and COVID-19 infection, after controlling for confounders, including population density, age, sex, diabetes, smoking-status, and cancer^{22}. These findings indicate potential challenges in assessing acute P exposure effects on COVID-19 infection in China, given the existence of noise and irregularities underlying the epidemic trends, the lack of control of confounders to P exposure, and the lack of sophisticated models to address these data challenges. More rigorous statistical modelling and control methodologies are needed to reduce (1) the noise underlying the epidemic trends due to the lack of testing capacity and redefinition of confirmed cases, (2) the confounding biases that affect the causal link between P and COVID-19 infection, and (3) the collinearities across different meteorology, D, and CM variables.

In this study, we will examine the effect of P on the rate of change in the daily number of COVID-19 confirmed infections (R), across 18 high infection provincial capital cities in China, while addressing inadequacies in official case reporting due to the lack of testing capacity and inconsistencies in case definition, and taking into account confounders, including D, CM, meteorology, net move-in mobility (NM), time lag due to the incubation period, trends over time (T), and day-of-the-week (DOW) to reflect the recurrent weekly effect (see Table

Outdoor P is chosen as the focus of our study given the assumption that R may be increased due to the potential deposition of viral droplets on P^{23}. A recent rigorous study on COVID-19 aerodynamics has ascertained that viral aerosol droplets 0.25–1 µm in size can remain suspended in the air^{24}. When such viral droplets are combined with suspended particles, P, they can travel greater distances, remain viable in the air for hours, and be inhaled deeply into the lungs, thus increasing the potential of airborne viral infection^{25}.

Our study sheds new light on the effect of P in an outdoor environment, the interaction effect between P and absolute humidity (AH), and the effect of NM (lockdown), on R (the dependent variable). Our work reinforces the observation that COVID-19 droplets are airborne^{24,26}, can remain suspended in the air, and can combine with particulates, promoting infection via the airborne transmission pathway^{27}.

We collected data, including the number of confirmed COVID-19 cases, PM_{2.5} pollution, meteorology, mobility, demographics, and co-morbidities, in 31 cities in China, covering the period from 1 January to 20 March 2020 (see “Data collection and procedure” for more details). The spatial distributions of population and COVID-19 infection in these cities are shown in Fig.

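Figure: spatial distributions of population and COVID-19 infection in the studied cities, generated with the open-source Python library pyecharts (version 1.9.0).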

The sample data selection methodology is as follows: after data pre-processing, the number of data points became 412. For the 18 provincial capital cities, any city having more than 50 observations would then be selected. After adjusting the infection data, the rate of change in daily COVID-19 infections, R, was calculated, leading to a total of 1440 data points (80 days × 18 cities). 426 valid data points were obtained, given that many daily R values were unavailable because (1) the number of infections at Day_{t} or Day_{t-1} was not reported (e.g., all cities except Wuhan started reporting COVID-19 infections from late January 2020) or (2) zero infection was reported at Day_{t-1} (i.e., zero denominator). The adjusted R demonstrated a less significant Shapiro–Wilk normality test statistic (

After accounting for a one-day time-lag variable representing R of the previous day and the important confounding factors, including D, CM, meteorology, NM, and T, the best-fit stepwise regression model was constructed using data from all 18 high infection provincial capital cities in China. The results of the statistically significant independent variables (

Statistically significant independent variables that associate with dependent variable R across all 18 high infection provincial capital cities in China from 1 January to 20 March 2020.

Dependent variable: R | Number of observations: | | |
---|---|---|---|
Number of independent variables: 8 | Adjusted R^{2} | | |
Independent variable | Coefficient with 95% CI | Standardized coefficient (β) | p-value |
Intercept | −6.846 × 10 (−1.970 × 10 | 0.2957 | |
R_{t-1} | 2.510 × 10 (1.700 × 10 | 0.2725 | 2.52 × 10 |
NM | 1.470 × 10 (5.133 × 10 | 0.1383 | 0.0027** |
P | 2.208 × 10 (1.244 × 10 | 0.4309 | 8.77 × 10 |
AH | 1.751 × 10 (6.243 × 10 | 0.2476 | 0.0024** |
T | −6.599 × 10 (−8.091 × 10 | −0.3870 | < 2 × 10 |
GDP | 5.545 × 10 (1.075 × 10 | 0.1115 | 0.0152* |
Asthma | 9.024 × 10 (2.677 × 10 | 0.1273 | 0.0054** |
P × AH | −3.779 × 10 (−5.903 × 10 | −0.2237 | 0.0005*** |

1. P, AH, and NM are lagged and averaged by

2. The standardized coefficient (also referred to as β coefficient) is calculated by multiplying the original regression coefficient by the ratio of the independent variable’s standard deviation to the dependent variable’s standard deviation.

3. *
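The standardized coefficient described in note 2 can be illustrated with a minimal sketch. The function name and the toy numbers below are hypothetical and are not taken from the study data; the sample standard deviation is assumed.

```python
from statistics import stdev

def standardized_coefficient(b, x, y):
    """Return beta = b * sd(x) / sd(y) for a raw regression coefficient b.

    x holds the observed values of the independent variable and y those of
    the dependent variable (sample standard deviations are assumed here).
    """
    return b * stdev(x) / stdev(y)

# Hypothetical toy data, for illustration only.
x = [10.0, 20.0, 30.0, 40.0, 50.0]  # e.g. a predictor such as P
y = [0.1, 0.3, 0.2, 0.5, 0.4]       # e.g. the response R
beta = standardized_coefficient(0.008, x, y)
print(round(beta, 4))
```

Because β is unit-free, it allows predictors measured on different scales (for example µg/m^{3} for P and g/m^{3} for AH) to be ranked by relative importance.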

Significant P, AH, and P × AH determining R across 18 high infection provincial capital cities in China.

As shown in Table, a 10 µg/m^{3} increase in P is associated with a 1.5% increase in R (Coefficient = 0.0015). Based on the PM_{2.5} cut-off value set by China’s National Ambient Air Quality Standard^{28}, the days with a higher P (≥ 75 µg/m^{3}) show a 12.8% increase in R compared to the days with a lower P (< 75 µg/m^{3}), after controlling for the important confounding factors including AH, NM, and T (see Table

Moreover, the interaction between P and AH is significant across 18 high infection provincial capital cities in China (Coefficient = −3.779 × 10^{–4}). When AH is < 5.8 g/m^{3}, a higher P and AH give a higher R. When AH is ≥ 5.8 g/m^{3}, a higher P and AH result in a lower R. As shown in the left part of Fig., R is minimized when AH is 11.5 g/m^{3} and P is 170 µg/m^{3}; whereas R (the rate of change in COVID-19 infection) is maximized when AH is 0.9 g/m^{3} and P is 170 µg/m^{3} (see Fig.

Relationship between P, AH, and R, based on the observed range of P and AH and the predicted R from the best-fit regression model, for 18 high infection provincial capital cities in China.
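The AH value of 5.8 g/m^{3} at which the effect of P on R changes sign can be checked against the regression coefficients with a short sketch. The interaction coefficient (−3.779 × 10^{–4}) is stated in the text; the 10^{–3} exponent of the P coefficient is an assumption, inferred from the reported 0.022 reduction in R per 10 µg/m^{3} reduction in P. The function names are illustrative.

```python
# Coefficients read from the regression table and the text.
B_P = 2.208e-3      # main effect of P (exponent is an inferred assumption)
B_P_AH = -3.779e-4  # P x AH interaction (stated in the text)

def marginal_effect_of_p(ah):
    """Change in R per 1 ug/m^3 increase in P at absolute humidity ah (g/m^3)."""
    return B_P + B_P_AH * ah

def ah_threshold():
    """Absolute humidity at which the marginal effect of P on R is zero."""
    return -B_P / B_P_AH

print(round(ah_threshold(), 2))
for ah in (0.9, 5.8, 11.5):
    # Effect on R of a 10 ug/m^3 increase in P at this humidity level.
    print(ah, round(10 * marginal_effect_of_p(ah), 4))
```

Under these assumed coefficients, the marginal effect of P is positive at low AH and negative at high AH, with the sign change close to the 5.8 g/m^{3} value reported above.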

Furthermore, when looking at the strength of the statistical relationship using standardized coefficient, for 18 high infection provincial capital cities in China, P, T, AH, P × AH, and NM are more important determinants of R (in descending order) when compared to D and CM, based on the values of β. Specifically, our regression analysis has shown that P

Finally, when examining the multivariate normality assumption for the regression model listed in Table ^{29}.

Recent COVID-19 studies have investigated whether D, CM, meteorology, and lockdown affect viral infection and have ascertained that meteorological events can alter COVID-19 transmission^{2}. Earlier studies have suggested that exposure to P can also increase influenza infection rates and identified PM_{10} and meteorological effects as risk factors for SARS/MERS. In the US and Europe, long-term exposures to P and NO_{2} have been reported as determinants of COVID-19 mortality, and evidence from China and Italy implicates air pollution as a contributor to COVID-19 infection. While previous research in China has concluded that P is associated with COVID-19 infection, it has yet to fully account for the changes in testing capacity, the inadequacy in confirmed case definition, and the confounding effects of D and CM. Recent studies have pointed towards the significant potential of COVID-19 transmission through the airborne pathway^{30}.

To identify whether P affects R across 18 high infection provincial cities in China, including Wuhan, our regression model has accounted for the major potential confounders, including meteorological variables, NM, D at the provincial or city level, and CM at the provincial level, including eight major diseases that potentially decrease immunity and increase the risks of COVID-19 infection^{31,32}. In addition, the time-lag effects of P, meteorology, and NM have been addressed.

In particular, P with a lagged time of 14 days determines R, for all 18 high infection provincial capital cities in China, after accounting for the confounders/covariates. The higher the P value, the higher the R value. This implies that, to reduce the rate of change in COVID-19 infection (R), the outdoor PM_{2.5} pollution concentration (P) across the 18 Chinese provincial cities should be reduced. A 10 µg/m^{3} reduction in P will lead to a 0.022 reduction in R after accounting for the covariates (see Table). Installing air purifiers^{33}, both indoors and outdoors, and improving air ventilation^{34}, can help reduce P and reduce R. In particular, we recommend different methods of mechanical ventilation, including the installation of fans along with HEPA filters on the windows or within the air ducts to purify outdoor and ambient air. In this ventilation scheme, a slight negative pressure can be maintained to reduce the level of humidity and PM condensation, which in turn deters the viral load. If mechanical ventilation is less likely or not possible, then wind-driven natural ventilation is preferred for windows and other openings, alongside the use of pollution filters. Further, cross and stack ventilation will facilitate the smooth inflow of pollutant-free fresh clean air^{35}. Moreover, putting P aside, given that AH and P × AH are important determinants of R, adjusting AH and P appropriately within a reasonable range (0 µg/m^{3} < P < 170 µg/m^{3} and 5.8 g/m^{3} < AH < 11.5 g/m^{3}) can help reduce R substantially (see Fig.

Further, NM and T are significant determinants of R in China. An increase in mobility within the provincial capital cities would increase R, whilst a decrease in mobility can reduce R. Finally, D and CM are less significant determinants of R when compared to P, AH, P × AH and NM. That said, GDP per capita is singled out as a significant D determinant of R, whilst asthma is singled out as a significant CM determinant of R. This implies that provincial cities with a higher GDP per capita (the more affluent cities) have a higher R (more infectious), whilst provincial cities with a higher burden of asthma (in DALY) are also more vulnerable to COVID-19, as asthma is often linked to airway inflammation and may increase COVID-19 susceptibility. Currently, only aggregate and annual D and CM data have been used for our regression analysis. Future studies can use D and CM data of higher temporal-spatial resolution to provide better insights into how D and CM affect R across the 18 high infection provincial capital cities in China.

Our model offers four advantages over those proposed in the previous literature covering air-pollution related COVID-19 epidemiological studies. First, instead of observing the absolute number of infections, which may be inaccurate due to possible human or systemic deficiency (related to testing methods and changes in case definition), our study examines R, the rate of change in COVID-19 infection (see “Data collection and procedure”). R can more adequately reflect the relative change in infection numbers, if the adjusted COVID-19 infection trends are consistent. Our focus on R instead of the actual infection number thereby provides much greater resolving power when compared to the previous air pollution and COVID-19 infection/mortality studies, which focus on the absolute number of infections^{17,18} instead^{36}.

Second, our study addresses a wide spectrum of confounders that can affect observations concerning the effect of P on R, including key meteorological, NM, D, and CM variables. This stands in contrast to existing works that explore the effects of air pollution on COVID-19 infection/mortality by controlling for only the meteorological variables^{17,37}, or the meteorological variables and simple D variables, without considering the lockdown and CM variables^{38}. Furthermore, while taking into account the confounding effects, our work also addresses the issues of non-linearity, collinearity, and time-lag (see “Data pre-processing”). This is particularly critical for precision modelling when (1) the statistical relationships between meteorology and R can be non-linear, (2) certain covariates among the meteorology, demographics, or co-morbidity variables can be collinear, and (3) the short-term effects of P, meteorology, and NM on R can be time-lagged due to the incubation period for COVID-19. By testing non-linearity and collinearity, and by accounting for the time-lag between some of our confounders and R, our model provides a more reliable and rigorous scientific explanation concerning how and when P will determine R across 18 high infection provincial capital cities in China, including Wuhan, in contrast to prior air pollution-related COVID-19 infection/mortality models, which have yet to account for these confounding/covariate issues^{8,38}.

Third, this is the first study that pursues the individual effects of P and AH on R, as well as the interaction effect of P and AH on R, covering 18 high infection provincial capital cities in China. Our study ascertains that a higher P increases R, and a higher AH decreases R. A 10 µg/m^{3} increase in P is associated with a 1.5% increase in R in China on average. When AH is < 5.8 g/m^{3}, a higher P still leads to a higher R (see Fig.). When AH is ≥ 5.8 g/m^{3}, the effect of a higher P on R is counteracted by the effect of AH on R. When AH is 11.5 g/m^{3} and P is 170 µg/m^{3}, a minimum R is achieved. When AH is 0.9 g/m^{3} and P is 170 µg/m^{3}, a maximum R is achieved (see Fig.

Finally, to the best of our understanding, this is the first international study that demonstrates a causal relationship between P and R across 18 high infection Chinese provincial capital cities, via matching. Each high P exposure day is matched with a low P exposure day sharing similar background covariates such as AH and NM to estimate the causal effect (Appendix p 8). This causal relationship between immediate P exposure and R (i.e. a higher P can increase R, see Table ^{23}, further substantiates the recent observations concerning the risks of airborne infection^{24,26}.

Although PM_{2.5} levels are low globally, they remain high in China. It has been estimated that the reduction of PM_{2.5} concentration due to lockdown during the specified period across the provincial capital cities of China is 9.7%, which is small compared to the 15.4% reduction of NO_{2} concentration^{39}. Even with such a reduction, the PM_{2.5} level in most of these cities will still fail to meet the WHO standards. For example, the daily PM_{2.5} level in Shanghai is more than four times the WHO threshold limit (10 µg/m^{3})^{40}. The high PM_{2.5} concentration during the lockdown period confirms that the contribution of PM_{2.5} from the transportation sector is small, while the PM_{2.5} contributions from industrial production and residential coal combustion are much larger and should be properly controlled if we want to reduce the PM_{2.5} level and hence COVID-19 infection in China^{39}.

COVID-19 transmission is primarily human-driven, and the previous-day infection along with human mobility are important factors for predicting R. Based on β, our results suggest that P is the most significant predictor of R during the first wave of COVID-19 in China, within the data range collected for this study (i.e., daily city-level PM_{2.5} concentration ranging from 2.6 μg/m^{3} to 208.4 μg/m^{3}). Such findings are consistent with current studies that examine the effects of air pollution on R during the initial stage of the outbreak. A cross-county study in the US suggests that PM_{2.5} pollution is a more significant contributor to R during the early outbreak, when compared to population density^{41}. A cross-country study suggests that PM_{2.5} is one of the most significant factors associated with R during the early-stage outbreak across the world^{42}. Non-pharmaceutical interventions that aim to reduce human-to-human contact, such as school closure and stay-at-home orders, are less significant determinants of R during the early-stage outbreak^{42}. Nevertheless, when the number of COVID-19 infections reaches a certain threshold, the impact of PM_{2.5} on R is likely to be reduced to a minimum, when compared to factors such as the number of current infection cases.

All in all, the increased risk of airborne COVID-19 viral infection is too high a cost to be ignored. Proper public health measures, such as mandating that citizens wear masks, are highly recommended to protect people from contracting COVID-19 via the viral-particulate transmission pathway, especially in countries with high population densities and mobilities, and high ambient particulate concentrations. Further, reducing the ambient PM_{2.5} particulate concentrations can substantially reduce the chance of COVID-19 infection. The installation of air purifiers and the improvement of air ventilation are recommended to reduce the effect of P on R. Meanwhile, after taking into account the number of days required for official reporting, given that the best-fit linear regression model is obtained at the 14-day time-lag interval, P, AH, P × AH and NM values obtained 14 days prior to the COVID-19 infection of the day can serve as the best determinants of R of the day. A 14-day time lag for best determining R suggests that a 14-day incubation period is needed for any COVID-19 patient to become symptomatic in China, based on the COVID-19 data obtained during the first wave of COVID-19 infection in China. This serves as an important piece of public health information regarding the number of days needed for quarantine for rigorous COVID-19 detection and control.

The current study presents certain limitations, which can be addressed in future studies. First, studies that explore the causal relationship between variables cannot be done properly when observational data with potential confounding biases are used^{43}. Spurious positive correlations are more likely to be found in non-stationary epidemiologic time series data^{44}. The current study has incorporated the relevant confounders as much as possible and has adopted the matching method to further reduce the confounding effects. However, our preliminarily determined causal relationship may deserve further verification, given that the relevant epidemiological variables included in the regression model are not yet exhaustive. In the future, advanced causal inference techniques, such as instrumental variables estimation, can be used to further account for any unobserved confounding factors. Second, when analysing a wide variety of phenomena, it is possible to run into the look-elsewhere effect (also known as the multiple comparison problem)^{45}. The current study adopts a stepwise regression approach in search of a set of significant variables for the best-fit model. The selection of significant variables involves multiple statistical tests and may be less robust due to the look-elsewhere effect^{46}. In the future, bootstrap cross-validation techniques can be adopted to improve the robustness of model selection^{47}. Finally, our study considers the incubation period as an interval ranging from 1 to 14 days, based on a uniform probability distribution. Given that the incubation period could have a more sophisticated distribution, more advanced statistical models using the Bayesian framework^{48} could be investigated to better account for the non-uniform distribution of the incubation period.

We collected data covering the daily P and the daily number of confirmed infections across 31 provincial capital cities in China, covering the period from 1 January to 20 March 2020 (see Fig.

Research objectives and procedures.

Primary objective | 1. Explore the statistical relationship, and determine the causal effects, if any, between daily outdoor P (PM 2. To achieve this objective, we built two statistical models that can best address the following challenges in statistical analysis: (a) Redefinitions and potential delays in infection case reporting (b) Incubation period (c) Confounders and confounding biases, including meteorology, mobility/lockdown, demographic, co-morbidity, and time-trends (d) Collinearity (e) Linear relationship (f) Interaction between P and meteorology |

Secondary objective | 3. Highlight the conditions under which R can be reduced, and effective public health measures that can be employed to facilitate this 4. Add weight to the current observations that COVID-19 can be airborne and that particulates can be carriers of the viral droplets |

Earlier COVID-19 studies expressed reservations concerning the number of infection cases reported, given inadequate testing capacity, the change in confirmed case definition, and undiscovered and undocumented asymptomatic cases^{3,49,50}. In order to address the delay in testing capacity and the change in case definition and their effects on reported cases, we used R, rate of change, as the dependent variable, in order to capture the relative change in COVID-19 infection during the study period. By using R, even if the number of reported infections might deviate, the relative change in infection could still be accounted for, provided that the reporting trends remain consistent.

Moreover, to remove the potential errors due to outliers and irregularities observed from the COVID-19 reported trends, a four-step data cleaning procedure was applied. First, 13 cities with a cumulative number of confirmed cases less than 50 were removed due to small sample size. This cut-off value was based on the assumption that at least five types of independent variables should be taken into account in our model (including P, meteorology, NM, D, and CM) and that each independent variable requires at least ten samples for valid statistical analysis. As a result, only 18 high infection provincial capital cities were selected for our statistical study. Second, for each city, to address the potential delay between the onset and the confirmation of COVID-19 infection, the adjusted daily confirmed COVID-19 infection cases were calculated by a rolling window of the observed daily confirmed cases reported in the following ^{50}) and to account for the day-of-week fluctuations in case reporting. Third, any reported COVID-19 cases of zero value were removed, with the assumption that during the period of COVID-19 spread in China, the number of infection cases added per day would be greater than zero. Finally, for each selected city, daily R values were calculated throughout the study period (see Eq.
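The four cleaning steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the window length (`window=7`) is a hypothetical placeholder, the rate of change is assumed to be (I_{t} − I_{t-1}) / I_{t-1} (consistent with the zero-denominator remark for Day_{t-1}), and zero values are dropped after smoothing as a simplification.

```python
def forward_rolling_mean(cases, window):
    """Mean of cases[t : t + window], truncated at the end of the series."""
    out = []
    for t in range(len(cases)):
        chunk = cases[t:t + window]
        out.append(sum(chunk) / len(chunk))
    return out

def rate_of_change(adjusted):
    """Assumed form R_t = (I_t - I_{t-1}) / I_{t-1}; None where undefined."""
    r = [None]  # no previous day for the first observation
    for prev, cur in zip(adjusted, adjusted[1:]):
        if prev is None or cur is None or prev == 0:
            r.append(None)  # missing report or zero denominator
        else:
            r.append((cur - prev) / prev)
    return r

def clean_city(cases, window=7, min_cumulative=50):
    """Apply the four steps to one city's daily confirmed cases."""
    if sum(cases) < min_cumulative:
        return None  # step 1: drop cities with fewer than 50 cumulative cases
    adjusted = forward_rolling_mean(cases, window)        # step 2
    adjusted = [a if a > 0 else None for a in adjusted]   # step 3
    return rate_of_change(adjusted)                       # step 4

# Hypothetical daily counts for one city.
r_values = clean_city([0, 10, 20, 40, 30, 20], window=2)
print(r_values)
```

Days whose R is `None` correspond to the unavailable observations that are excluded from the regression sample.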

We conducted statistical analysis in three steps. First, using stepwise multiple linear regression, a main effects model (i.e., without any interaction terms), including only the statistically significant variables in determining R, was constructed to model the relationship between daily outdoor P and daily R across 18 high infection provincial capital cities in China, while addressing the issues of collinearity and confounding brought by other independent variables. Second, to take into account the potential interaction effects between P and other significant meteorological and NM variables, the significant interaction terms were incorporated into the main effects model. A final regression model was developed for China (see Eq.), in which P denotes the PM_{2.5} concentration. NM denotes the net move-in mobility. AH denotes the absolute humidity. T is a variable representing the number of days since 1 January 2020, reflecting the time trend during the period of study. GDP represents gross domestic product per capita. Asthma represents the disability-adjusted life-year (DALY) numbers per 100,000 population. Two-sided
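For orientation, the form of the final model implied by the eight independent variables in the regression table can be written as below. This is a reconstruction from the table rows and the variable definitions above, not the published equation; the coefficient labels β_{0} to β_{8} and the error term ε_{t} are notational assumptions, and P, AH, and NM are understood as their lagged multi-day averages.

```latex
R_{t} = \beta_{0} + \beta_{1} R_{t-1} + \beta_{2}\,\mathrm{NM} + \beta_{3}\,P
      + \beta_{4}\,\mathrm{AH} + \beta_{5}\,T + \beta_{6}\,\mathrm{GDP}
      + \beta_{7}\,\mathrm{Asthma} + \beta_{8}\,(P \times \mathrm{AH}) + \varepsilon_{t}
```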

Due to the lengthy asymptomatic incubation period before the onset of COVID-19 symptoms, the corresponding time-lag in P, meteorology, and NM was accounted for by our statistical analysis, using the multi-day average lag model, based on previous air-pollution related epidemiological studies^{7}. We determined the best fit lag-time from day 1 to day 14, with the assumptions that the lag-time follows a uniform probability distribution and the mean incubation period could cover a maximum of 14 days^{50}.
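The multi-day average lag search can be sketched as follows. Two simplifications are assumed here and are not confirmed by the text: the k-day average is taken over days t−k to t−1 (excluding day t), and the best lag is chosen by the R^{2} of a single-predictor fit rather than by the fit of the full multiple regression. Function names are illustrative.

```python
def lagged_average(series, k):
    """Average of the k values preceding each day t; None until k days exist."""
    out = []
    for t in range(len(series)):
        out.append(sum(series[t - k:t]) / k if t >= k else None)
    return out

def r_squared(x, y):
    """Squared Pearson correlation over days where both values are available."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) * (a - mx) for a, _ in pairs)
    syy = sum((b - my) * (b - my) for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def best_lag(exposure, response, max_lag=14):
    """Return the lag in 1..max_lag with the highest R^2, plus all scores."""
    scores = {k: r_squared(lagged_average(exposure, k), response)
              for k in range(1, max_lag + 1)}
    return max(scores, key=scores.get), scores

# Synthetic check: the response is built from a 3-day lagged average.
p = [(i * i) % 17 + 0.5 * i for i in range(40)]
r = lagged_average(p, 3)
lag, scores = best_lag(p, r)
print(lag)
```

In the study's setting the same search would be applied jointly to P, meteorology, and NM, with the lag retained at which the regression fits best.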

To estimate the causal effect of P on R, our model for China had to cover the potential confounders. Independent variables, including meteorology (AH, temperature (TEMP), air pressure (AP), and wind speed (WS)), and NM, were included in the statistical analysis for China as the confounders. Moreover, D (population density, age, sex, income, GDP per capita, and education) and CM (high blood pressure, diabetes, chronic obstructive pulmonary disease (COPD), stroke, obesity, asthma, Alzheimer’s disease (AD), and HIV/AIDS) were included in the statistical analysis to control for the provincial/city-level fixed effects in our model for China. T and day of week were included in the statistical analysis to control for the time-varying fixed effects and the recurrent fixed effects. The statistically significant variables were kept in the final fitted regression model. Furthermore, matching was adopted to further reduce the confounding biases in our model for China, by matching a high P day with a low P day, based on the similarities of corresponding confounders, thereby helping one more accurately estimate the causal relationship between P and R in China (see Appendix p 8).
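The matching idea can be illustrated with a minimal nearest-neighbour sketch. The details of the actual procedure are in the Appendix and are not reproduced here, so everything below is an assumption: days are split at the 75 µg/m^{3} cut-off mentioned earlier, each high-P day is paired (with replacement) with the closest low-P day on standardized AH and NM, and the effect is estimated as the mean difference in R.

```python
from statistics import mean, pstdev

def match_and_estimate(days, threshold=75.0, covariates=("AH", "NM")):
    """Estimate the mean difference in R between matched high-P and low-P days.

    days is a list of dicts with keys 'P', 'R', and each covariate name.
    """
    high = [d for d in days if d["P"] >= threshold]
    low = [d for d in days if d["P"] < threshold]
    if not high or not low:
        raise ValueError("need at least one high-P and one low-P day")
    # Standardize covariates so that they contribute on a comparable scale.
    scale = {k: (pstdev([d[k] for d in days]) or 1.0) for k in covariates}

    def distance(a, b):
        return sum(((a[k] - b[k]) / scale[k]) ** 2 for k in covariates)

    diffs = []
    for h in high:
        partner = min(low, key=lambda l: distance(h, l))  # nearest low-P day
        diffs.append(h["R"] - partner["R"])
    return mean(diffs)

# Hypothetical observations, for illustration only.
days = [
    {"P": 100.0, "AH": 2.0, "NM": 1.0, "R": 0.5},
    {"P": 90.0, "AH": 8.0, "NM": 3.0, "R": 0.3},
    {"P": 30.0, "AH": 2.1, "NM": 1.1, "R": 0.3},
    {"P": 40.0, "AH": 7.9, "NM": 2.9, "R": 0.2},
]
effect = match_and_estimate(days)
print(round(effect, 3))
```

Matching on covariates in this way compares days that differ mainly in P, which is what allows the difference in R to be read as an approximate exposure effect.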

To address the potential collinearity between the independent variables in our model for China, Spearman correlation analysis and variance inflation factor (VIF) analysis were performed. Before stepwise regression analysis, a Spearman correlation analysis was conducted to select a subset of variables that presented low collinearity in the meteorological, D, and CM data. The absolute Spearman Coefficient threshold was set to be 0.5 to detect the collinearity between any variables before the regression analysis, and to prevent the highly correlated variables from being included in the regression model^{51,52}. First, AH and WS were selected as the meteorological variables for stepwise regression analysis. We tested the collinearity between TEMP, AP, WS, and AH, and removed TEMP and AP, due to their high collinearity with AH, which would be capable of accounting for the transmission of a flu virus^{53}, and hence could also be used to account for R (|Spearman coefficient|> 0.5; see Table ^{54,55} and (2) old age could be linked to lower immunity^{56,57}, making one more vulnerable to COVID-19 infection^{58}. The correlation between population density and COVID-19 transmission was also ascertained in related studies conducted in Bangladesh and Italy^{54,55}. Hence, urban disposable income and education level were removed, due to their high collinearity with population density and age (|Spearman coefficient|> 0.5; Table ^{21}. Therefore, diabetes, obesity, AD, and HIV/AIDS were removed, due to their high collinearity with high blood pressure and COPD (|Spearman coefficient|> 0.5; Table ^{51,59}. No collinearity was identified from the main effect model.
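The Spearman screening step can be sketched with the standard library alone. The 0.5 threshold is the one stated above; the variable values are hypothetical toy data, and the VIF check is not shown.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no rank information
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def collinear_pairs(columns, threshold=0.5):
    """List variable pairs whose |Spearman coefficient| exceeds the threshold."""
    names = list(columns)
    flagged = []
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            rho = spearman(columns[a], columns[b])
            if abs(rho) > threshold:
                flagged.append((a, b, rho))
    return flagged

# Hypothetical toy values for three meteorological variables.
columns = {
    "AH": [1.0, 2.0, 3.0, 4.0, 5.0],
    "TEMP": [2.0, 4.0, 5.0, 8.0, 20.0],
    "WS": [3.0, 5.0, 1.0, 4.0, 2.0],
}
pairs = collinear_pairs(columns)
print(pairs)
```

From each flagged pair, one variable would then be retained on substantive grounds, as described above for AH versus TEMP and AP.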

To account for the potential non-linear relationship between the meteorological variables and R, a second-order polynomial transformation was applied to the selected meteorological variables, including AH and WS, during stepwise regression analysis. In addition to the original meteorological variables, a quadratic term of each selected meteorological variable was included in the stepwise regression model to address non-linearity. Based on the final stepwise regression model that achieved the best fit, we decided to use the first-order meteorological variables.

To examine the interaction effects between P and other significant meteorological and NM variables, three interaction terms consisting of the statistically significant variables, P × AH, P × NM, and NM × AH, were added to the main effects model for China during stepwise regression. P × AH, the statistically significant interaction term associated with R, was included in the final stepwise regression model.

Finally, the multivariate normality assumption was examined by investigating the residuals of the main regression model shown in Eq. (

This article was submitted to an online preprint archive^{60}.

This research is supported in part by the Theme-based Research Scheme of the Research Grants Council of Hong Kong, under Grant No. T41-709/17-N. We acknowledge Peiyang Guo and Andong Wang for data collection, Joseph Hui and K.W. Wu for their comments on the linearity between R and P, and the inspiration of Tushar Kaistha for investigating the relative significance of the independent variables, including P and other co-variates, on the dependent variable R.

J.L. and V.L. were responsible for conceptualization and initial framework development. Y.H. collected the statistical data. J.L., V.L. and Y.H. developed the methodology. Y.H. processed the data and conducted the statistical analysis. Y.H., J.L. and V.L. interpreted the results and wrote the full manuscript. J.C., J.F., J.D., I.G., Q.Z., S.W. and Z.G. provided valuable suggestions and comments on data input/research design/methodology. Q.Z. collected the mobility data. J.D. edited the manuscript. J.L. and V.L. applied for funding. Y.H., J.L. and V.L. contributed equally.

The dataset used in this study will be made available upon request to the corresponding authors.

The data processing and statistical analysis code for this study will be made available upon request to the corresponding authors.

The authors declare no competing interests.

Supplementary Information.

The online version contains supplementary material available at
