Scientific Data | (2022) 9:462 | https://doi.org/10.1038/s41597-022-01517-w

www.nature.com/scientificdata

The United States COVID-19 
Forecast Hub dataset
Estee Y. Cramer  1,200, Yuxin Huang1,200, Yijin Wang  1,200, Evan L. Ray1, Matthew Cornell1, 
Johannes Bracher2,3, Andrea Brennen4, Alvaro J. Castro Rivadeneira1, Aaron Gerding1, Katie House1, 
Dasuni Jayawardena1, Abdul Hannan Kanji1, Ayush Khandelwal1, Khoa Le1, Vidhi Mody1, 
Vrushti Mody1, Jarad Niemi  5, Ariane Stark  1, Apurv Shah1, Nutcha Wattanchit1, Martha W. Zorn1, 
Nicholas G. Reich1 ✉ & US COVID-19 Forecast Hub Consortium*

Academic researchers, government agencies, industry groups, and individuals have produced forecasts 
at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United 
States Centers for Disease Control and Prevention (CDC) partnered with an academic research lab at 
the University of Massachusetts Amherst to create the US COVID-19 Forecast Hub. Launched in April 
2020, the Forecast Hub is a dataset with point and probabilistic forecasts of incident cases, incident 
hospitalizations, incident deaths, and cumulative deaths due to COVID-19 at county, state, and 
national levels in the United States. Included forecasts represent a variety of modeling approaches, 
data sources, and assumptions regarding the spread of COVID-19. The goal of this dataset is to establish 
a standardized and comparable set of short-term forecasts from modeling teams. These data can be 
used to develop ensemble models, communicate forecasts to the public, create visualizations, compare 
models, and inform policies regarding COVID-19 mitigation. These open-source data are available via 
download from GitHub, through an online API, and through R packages.

Introduction
To understand how the COVID-19 pandemic would progress in the United States, dozens of academic research 
groups, government agencies, industry groups, and individuals produced probabilistic forecasts for COVID-19 
outcomes starting in March 20201. We collected forecasts from over 90 modeling teams in a data repository, thus 
making forecasts easily accessible for COVID-19 response efforts and forecast evaluation. The data repository 
is called the US COVID-19 Forecast Hub (hereafter, Forecast Hub) and was created through a partnership 
between the United States Centers for Disease Control and Prevention (CDC) and an academic research lab at 
the University of Massachusetts Amherst.

The Forecast Hub was launched in early April 2020 and contains real-time forecasts of reported COVID-19
cases, hospitalizations, and deaths. As of May 3rd, 2022, the Forecast Hub had collected over 92 million
individual point or quantile predictions contained within over 6,600 submitted forecast files from 110 unique
models. The forecasts submitted each week reflected a variety of forecasting approaches, data sources, and
underlying assumptions. There were no restrictions in place regarding the underlying information or code used
to generate real-time forecasts. Each week, the latest forecasts were combined into an ensemble forecast
(Fig. 1), and all recent forecast data were updated on an official COVID-19 Forecasting page hosted by the US
CDC (https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/mathematical-modeling.html). The ensemble
models were also used in the weekly reports that are posted on the Forecast Hub website,
https://covid19forecasthub.org/doc/reports/.

Forecasts are quantitative predictions about data that will be observed at a future time. Forecasts differ from
scenario-based projections, which examine feasible outcomes conditional on a variety of future assumptions.
Because forecasts are unconditional estimates of data that will be observed in the future, they can be evaluated
against eventual observed data. An important feature of the Forecast Hub is that submitted forecasts are
time-stamped so the exact time at which a forecast was made public can be verified. In this way, the Forecast Hub
serves as a public, independent registration system for these forecast model outputs. Data from the Forecast Hub
have served as the basis for research articles on forecast evaluation2 and forecast combination3–5. These studies
can be used to determine how well models have performed at various points during the pandemic, which can, in
turn, guide best practices for using forecasts and inform future forecasting efforts2.

1Department of Biostatistics and Epidemiology, University of Massachusetts Amherst, Amherst, MA, 01003, USA.
2Chair of Econometrics and Statistics, Karlsruhe Institute of Technology, Karlsruhe, 76185, Germany.
3Computational Statistics Group, Heidelberg Institute for Theoretical Studies, Heidelberg, 69118, Germany.
4IQT Labs, Waltham, MA, 02451, USA. 5Department of Statistics, Iowa State University, Ames, IA, 50011, USA.
200These authors contributed equally: Estee Y. Cramer, Yuxin Huang, Yijin Wang. *A list of authors and their
affiliations appears at the end of the paper. ✉e-mail: nick@umass.edu

Teams submitted predictions in a structured format to facilitate data validation, storage, and analysis.
Teams also submitted a metadata file and license for their model’s data. Forecast data, model metadata, and
ground truth data from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)6,
New York Times (NYTimes)7, USA Facts8, and HealthData.gov9 were stored in the public Forecast Hub
GitHub repository10.

The forecasts were automatically synchronized with an online database called Zoltar via calls to a
Representational State Transfer (REST) application programming interface (API)11 every six hours (Fig. 2). Raw
forecast data may be downloaded directly from GitHub or Zoltar via the covidHubUtils R package12, the zoltr R
package13, or the zoltpy Python library14.

This dataset of real-time forecasts created during the COVID-19 pandemic can provide insights into the 
shortcomings and successes of predictions and improve forecasting efforts in years to come. Although these data 
are restricted to forecasts for COVID-19 in the United States, the structure of this dataset has been used to create 
datasets of COVID-19 forecasts in the EU and the UK, and longer-term scenario projections in the US15–18. The 
general structure of this data collection could be applied to additional diseases or forecasting outcomes in the 
future11.

This large collaborative effort has provided data on short-term forecasts for over two years of forecasting 
efforts. Nearly all data were collected in real time and therefore are not subject to retrospective biases. The data 
are also openly available to the public, thus fostering a transparent, open science approach to support public 
health efforts.

Results
Data acquisition. Beginning in April 2020, the Reich Lab at the University of Massachusetts Amherst, in
partnership with the US CDC, began collecting probabilistic forecasts of key COVID-19 outcomes in the United
States (Table 1). The effort began by collecting forecasts of deaths and hospitalizations at the weekly and daily
scales for the 50 US states and 5 territories (Washington DC, Puerto Rico, US Virgin Islands, Guam, and the
Northern Mariana Islands) as well as the aggregated US national level. In July 2020, daily-resolution forecasts
for COVID-19 deaths were discontinued, and the effort expanded to include forecasts of weekly incident
cases at the county, state, and national levels. Forecasts may include a point prediction and/or quantiles of a
predictive distribution.

[Figure 1 plot, four panels: (A) Incident Cases, (B) Incident Hospitalizations, (C) Incident Deaths,
(D) Cumulative Deaths. x-axis: month, Feb 2020 to Sep 2021 (Aug 2020 to Sep 2021 in panel B); y-axis: number of
cases, hospitalizations, or deaths.]

Fig. 1 Time series of weekly incident deaths at the national level and forecasts from the COVID-19 Forecast
Hub ensemble model for selected weeks in 2020 and 2021. Ensemble forecasts (blue) with 50%, 80% and 95%
prediction intervals shown in shaded regions and the ground-truth data (black) for incident cases (A), incident
hospitalizations (B), incident deaths (C) and cumulative deaths (D). The truth data come from JHU CSSE
(panels A, C, D) and HealthData.gov (panel B).

Any team was eligible to submit data to the Forecast Hub provided they used the correct formatting. Upon
initial submission of forecast data, teams were required to upload a metadata file that briefly described the
methods used to create the forecasts and specified a license under which their forecast data were released.
Individual model outputs are available under different licenses as specified in the GitHub data repository. No
model code was stored in the Forecast Hub.

During the first month of operation, members of the Forecast Hub team downloaded forecasts made available
by teams publicly online, transformed these forecasts into the correct format (see Forecast format section),
and pushed them into the Forecast Hub repository. Starting in May 2020, all teams were required to format and
submit their own forecasts.

Repository structure. The dataset containing forecasts is stored in two locations, and all data can be 
accessed through either source. The first is the COVID-19 Forecast Hub GitHub repository, https://github.com/
reichlab/covid19-forecast-hub, and the second is an online database, Zoltar, which can be accessed via a REST 
API11. Details about data access and format are documented in the subsequent sections.

When accessing data through the Zoltar forecast repository REST API, subsets of submitted forecasts can be
queried directly from a PostgreSQL database. This eliminates the need to access individual CSV files and
facilitates access to versions of forecasts in cases when they were updated.
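
For readers who prefer working with individual CSV files rather than the database, the following minimal sketch builds the location of a single forecast file. The `data-processed/<model>/<date>-<model>.csv` layout and the `master` branch name are assumptions based on the conventions of the public repository and are not stated in this section; `forecast_file_path` and `raw_github_url` are hypothetical helper names introduced only for illustration.

```python
from datetime import date
from pathlib import PurePosixPath

REPO = "reichlab/covid19-forecast-hub"  # repository named in the text
BRANCH = "master"                       # assumption: default branch name
DATA_DIR = "data-processed"             # assumption: folder holding forecast CSVs


def forecast_file_path(model: str, forecast_date: date) -> PurePosixPath:
    """Relative path of one forecast file under the assumed layout."""
    filename = f"{forecast_date.isoformat()}-{model}.csv"
    return PurePosixPath(DATA_DIR) / model / filename


def raw_github_url(model: str, forecast_date: date) -> str:
    """URL for fetching the same file without cloning the repository."""
    path = forecast_file_path(model, forecast_date)
    return f"https://raw.githubusercontent.com/{REPO}/{BRANCH}/{path}"


if __name__ == "__main__":
    print(raw_github_url("COVIDhub-ensemble", date(2020, 10, 12)))
```

The resulting path can be joined to the root of a local clone, or the URL can be passed to any CSV reader that accepts URLs.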

Forecast outcomes. The Forecast Hub dataset stores forecasts for four different outcomes: incident cases, 
incident hospitalizations, incident deaths, and cumulative deaths (Table 1). Incident case forecasts were first 
introduced as a forecast outcome several months after the Forecast Hub started and have several key differences 
from other predicted outcomes. They are the only outcomes for which the Forecast Hub accepts county-level 
forecasts, as well as state and national level forecasts. Since there are over 3,000 counties in the US, this required 
some compromises on the scale of data collected for these forecasts in other ways. Specifically, case forecasts may 
only be submitted for up to 8 weeks into the future instead of up to 20 weeks for deaths and are required to have 
fewer quantiles (seven quantiles) compared to other outcomes, which can have up to twenty-three quantiles. This 
gives a coarser representation of the forecast (see the section on Forecast format below).
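
The limits described above can be expressed as a small lookup table. The sketch below is illustrative only: the horizon and quantile limits are taken from this section and Table 1, while the outcome keys (`"inc case"`, `"inc death"`, `"cum death"`) and the function name `check_weekly_forecast` are hypothetical labels, not the Forecast Hub's own validation code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OutcomeLimits:
    max_horizon_weeks: int  # furthest weekly horizon accepted
    max_quantiles: int      # most quantile levels a forecast may carry


# Weekly targets only; values from the text above and Table 1.
LIMITS = {
    "inc case": OutcomeLimits(max_horizon_weeks=8, max_quantiles=7),
    "inc death": OutcomeLimits(max_horizon_weeks=20, max_quantiles=23),
    "cum death": OutcomeLimits(max_horizon_weeks=20, max_quantiles=23),
}


def check_weekly_forecast(outcome: str, horizon_weeks: int, n_quantiles: int) -> List[str]:
    """Return a list of problems; an empty list means the forecast fits the limits."""
    if outcome not in LIMITS:
        return [f"unknown outcome: {outcome!r}"]
    limits = LIMITS[outcome]
    problems = []
    if not 1 <= horizon_weeks <= limits.max_horizon_weeks:
        problems.append(
            f"horizon {horizon_weeks} outside 1..{limits.max_horizon_weeks} weeks"
        )
    # A point-only forecast carries zero quantiles, which is allowed.
    if not 0 <= n_quantiles <= limits.max_quantiles:
        problems.append(
            f"{n_quantiles} quantiles exceeds the limit of {limits.max_quantiles}"
        )
    return problems
```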

[Figure 2 schematic. Labeled elements: Team 1 to 3 Metadata and Team 1 to 3 Forecast files; covid19-validations;
COVID-19 Forecast Hub GitHub Repository; COVID-19 Forecast Hub Zoltar Project; zoltr/zoltpy; covidData; Ground
Truth Data (JHU CSSE, NYTimes, USAFacts, HealthData.gov); covidEnsembles; Ensemble Forecast; covidHubUtils; Data
pulled for various uses (Visualization, Model Evaluations, CDC communications). Panels A to E are described in
the caption.]

Fig. 2 Schematic of the data storage and related infrastructure surrounding the COVID-19 Forecast Hub.
(A) Forecasts are submitted to the COVID-19 Forecast Hub GitHub repository and undergo data format
validation before being accepted into the system. (B) A continuous integration service ensures that the
GitHub repository and PostgreSQL database stay in sync with mirrored versions of the data. (C) Truth data for
visualization, evaluation, and ensemble building are retrieved once per week using both the covidHubUtils and
the covidData R packages. Truth data are stored in both repositories. (D) Once per week, an ensemble forecast
submission is made using the covidEnsembles R package. It is submitted to the GitHub repository and undergoes
the same validation as other submissions. (E) Using the covidHubUtils R package, forecast and truth data may
be extracted from either the GitHub or PostgreSQL database in a standard format for tasks such as scoring or
plotting.

Forecast target dates. Weekly targets follow the standard of epidemiological weeks (EW) used by the
CDC, which defines a week as starting on Sunday and ending on the following Saturday19. Forecasts of
cumulative deaths target the number of cumulative deaths reported by the Saturday ending a given week. Forecasts
of weekly incident cases or deaths target the difference between reported cumulative cases or deaths on
consecutive Saturdays. As an example of a forecast and the corresponding observation, forecasts submitted between
Tuesday, October 6, 2020 (day 3 of EW41) and Monday, October 12, 2020 (day 2 of EW42) contained a “1 week ahead”
forecast of incident deaths that corresponded to the change in cumulative reported deaths observed in EW42 (i.e.,
the difference between the cumulative reported deaths on Saturday, October 17, 2020, and Saturday, October 10,
2020), and a “2 week ahead” forecast that corresponded to the change in cumulative reported deaths in EW43. In
this paper, we refer to the “forecast week” of a submitted forecast as the week corresponding to a “0-week ahead”
horizon. In the example above, the forecast week would be EW41. Daily incident hospitalization horizons are for
the number of reported hospitalizations a specified number of days after the forecast was generated.
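
The mapping from a submission date to weekly target dates can be sketched in a few lines. This is an illustration that generalizes the single worked example above (a Tuesday-through-Monday submission window sharing one forecast week); it is an assumption that the same rule holds for every week, and the function names are hypothetical.

```python
from datetime import date, timedelta


def forecast_week_end(forecast_date: date) -> date:
    """Saturday that ends the "forecast week" (the 0-week-ahead horizon)."""
    weekday = forecast_date.weekday()  # Monday=0 ... Saturday=5, Sunday=6
    if weekday == 6:   # Sunday: the forecast week ended yesterday
        return forecast_date - timedelta(days=1)
    if weekday == 0:   # Monday: the forecast week ended two days ago
        return forecast_date - timedelta(days=2)
    # Tuesday through Saturday: the forecast week ends on the coming Saturday
    # (or today, if the forecast date is itself a Saturday).
    return forecast_date + timedelta(days=5 - weekday)


def target_end_date(forecast_date: date, weeks_ahead: int) -> date:
    """Saturday on which an N-week-ahead weekly target is observed."""
    return forecast_week_end(forecast_date) + timedelta(weeks=weeks_ahead)


if __name__ == "__main__":
    # Both ends of the submission window in the example give the same target.
    print(target_end_date(date(2020, 10, 6), 1))   # Tuesday of EW41
    print(target_end_date(date(2020, 10, 12), 1))  # Monday of EW42
```

A weekly incident target is then the difference between the cumulative counts reported on `target_end_date(d, n)` and on the Saturday one week earlier.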

Summary of forecast data collected. In the initial weeks of submission, fewer than 10 models provided
forecasts. As the pandemic spread, the number of teams submitting forecasts increased; as of May 3rd, 2022, 93
primary models, 9 secondary models, and 17 models with the designation “other” had been submitted to the
Forecast Hub. As of May 3rd, 2022, across all weeks, a median of 30 primary models (range: 14 to 39) contributed
incident case forecasts (Fig. 3a), a median of 11 primary models (range: 1 to 16) contributed incident
hospitalization forecasts (Fig. 3b), a median of 37 primary models (range: 1 to 49) contributed incident death
forecasts (Fig. 3c), and a median of 35 primary models (range: 3 to 46) contributed cumulative death forecasts
each week (Fig. 3d). As of May 3rd, 2022, the dataset contained 6,633 forecast files with 92,426,015 point or
quantile predictions for unique combinations of targets and locations.

Ensemble and baseline forecasts. Alongside the models submitted by individual teams, there are also 
baseline and ensemble models generated by the Forecast Hub and CDC.

The COVIDhub-baseline model was created by the Forecast Hub in May 2020 as a benchmarking model. Its
point forecast is the most recent observed value as of the forecast creation date with a probability distribution
around that based on weekly differences in previous observations2. The baseline model initially produced
forecasts for case and death outcomes. Hospitalization baseline forecasts were added in September 2021.
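
The idea behind such a baseline can be illustrated with a deliberately simplified sketch. It is not the COVIDhub-baseline implementation: the symmetrization of past differences, the linear-interpolation quantile rule, and the truncation at zero are assumptions made here so the example stays short, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def empirical_quantile(sorted_values: Sequence[float], p: float) -> float:
    """Linearly interpolated quantile of already-sorted values, 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def baseline_forecast(observed: Sequence[float], levels: Sequence[float]) -> Dict[float, float]:
    """One-step-ahead quantiles centered on the most recent observation."""
    if len(observed) < 2:
        raise ValueError("need at least two observations to form a difference")
    last = observed[-1]
    diffs = [later - earlier for earlier, later in zip(observed, observed[1:])]
    # Assumption: mirror the differences so the distribution is symmetric
    # about zero, which keeps the median forecast equal to the last value.
    symmetric = sorted(list(diffs) + [-d for d in diffs])
    # Assumption: counts cannot be negative, so truncate at zero.
    return {p: max(0.0, last + empirical_quantile(symmetric, p)) for p in levels}


if __name__ == "__main__":
    weekly_deaths = [100, 120, 110, 130]
    print(baseline_forecast(weekly_deaths, [0.025, 0.5, 0.975]))
```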

The COVIDhub-ensemble model combines forecasts submitted to the Forecast Hub. The
ensemble produces forecasts of incident cases at a horizon of 1 week ahead, forecasts of incident hospitalizations
at horizons up to 14 days ahead, and forecasts of incident and cumulative deaths at horizons up to 4 weeks ahead.

Outcome                    Scale   County  State  National  Horizons stored  Number of quantiles  Earliest forecast date  First date of standardized truth data  Date of first ensemble forecast
Incident Cases             Weekly  X       X      X         1-8 weeks        7                    2020-07-05              2020-03-15                             2020-07-18
Incident Hospitalizations  Daily   -       X      X         1-130 days       23                   2020-03-27              2020-11-16                             2020-12-05
Incident Deaths            Daily   -       X      X         1-130 days       23                   2020-03-15              2020-03-15                             NA
Incident Deaths            Weekly  -       X      X         1-20 weeks       23                   2020-03-15              2020-03-15                             2020-06-20
Cumulative Deaths          Daily   -       X      X         1-130 days       23                   2020-03-15              2020-03-15                             NA
Cumulative Deaths          Weekly  -       X      X         1-20 weeks       23                   2020-03-15              2020-03-15                             2020-04-13

Table 1. Forecast characteristics for all four outcomes. The table shows the temporal scale, spatial scale of
locations (X marks a scale at which forecasts are collected), horizons stored, number of quantiles for
probabilistic forecasts, and dates of the earliest forecast, earliest standardized truth data, and earliest
ensemble build.

[Figure 3 plot, four panels: Incident Cases, Incident Hospitalizations, Incident Deaths, Cumulative Deaths.
x-axis: Forecast Submission Date, May 2020 to May 2022; y-axis: Number of Models Providing Forecasts, 0 to 50.]

Fig. 3 Number of primary forecasts submitted for each outcome per week from April 27th, 2020 through May 
3rd, 2022. In the initial weeks of submission, fewer than 10 models provided forecasts. Over time, the number of 
teams submitting forecasts for each forecasted outcome increased into early 2021 and then saw a small decline 
through the end of 2021, with some renewed interest in 2022.


Initially, the ensemble produced forecasts of incident cases at horizons of 1 to 4 weeks and incident
hospitalizations at 1 to 28 days. However, in September 2021, due to the unreliability of incident case and
hospitalization forecasts at horizons greater than 1 week (for cases) and 14 days (for hospitalizations),
horizons past those respective thresholds were excluded from the COVIDhub-ensemble model, although they were
still included in the COVIDhub-4_week_ensemble20. Other work details the methods used for determining the
appropriate combination approach3,4. Starting in February 2021, GitHub tags were created to document the exact
version of the repository used each week to create the COVIDhub-ensemble forecast. This creates an auditable
trail in the repository so that the version of the forecasts actually used can be recovered even when some
forecasts were subsequently updated.

The Forecast Hub also collaborates with the CDC on the production of three additional ensemble forecasts
each week. These are the COVIDhub-4_week_ensemble, the COVIDhub-trained_ensemble, and the
COVIDhub_CDC-ensemble. The COVIDhub-4_week_ensemble produces forecasts of incident cases, incident deaths, and
cumulative deaths at horizons of 1 through 4 weeks ahead and forecasts of incident hospitalizations at horizons
of 1 through 28 days ahead; it uses the equally weighted median of all component forecasts at each location,
forecast horizon, and quantile level. The COVIDhub-trained_ensemble uses the same targets as the
COVIDhub-4_week_ensemble but computes the ensemble as a weighted median of the ten component forecasts with the
best performance, as measured by their weighted interval score (WIS) in the 12 weeks prior to the forecast date.
The COVIDhub_CDC-ensemble pulls forecasts of cases and hospitalizations from the COVIDhub-4_week_ensemble and
forecasts of deaths from the COVIDhub-trained_ensemble. The set of horizons that are included is updated
regularly using rules developed by the CDC based on recent forecast performance.
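
An equally weighted median ensemble of the kind described above can be sketched as follows. This is an illustration rather than the covidEnsembles implementation: the dictionary layout, the `(location, horizon, quantile level)` key, and the choice to use whichever models supplied a given key are assumptions made for the example, and any eligibility rules applied by the Forecast Hub are omitted.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# (location, horizon, quantile level); a hypothetical key layout.
Key = Tuple[str, int, float]


def median_ensemble(component_forecasts: Dict[str, Dict[Key, float]]) -> Dict[Key, float]:
    """Median across models, taken separately for every key."""
    values_by_key: Dict[Key, List[float]] = defaultdict(list)
    for model_forecast in component_forecasts.values():
        for key, value in model_forecast.items():
            values_by_key[key].append(value)
    # Every model that supplied a key gets equal weight in that key's median.
    return {key: median(values) for key, values in values_by_key.items()}


if __name__ == "__main__":
    forecasts = {
        "model_a": {("US", 1, 0.5): 100.0, ("US", 1, 0.975): 180.0},
        "model_b": {("US", 1, 0.5): 120.0, ("US", 1, 0.975): 150.0},
        "model_c": {("US", 1, 0.5): 200.0},
    }
    print(median_ensemble(forecasts))
```

Taking the median at each quantile level separately keeps a single extreme component forecast from dominating the combined distribution, which is one reason a median is often preferred to a mean for this kind of aggregation.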

Several other models are also combinations of some or all models submitted to the Forecast Hub. As of May 
3rd, 2022, these models are FDANIHASU-Sweight, JHUAPL-SLPHospEns, and KITmetricslab-select_ensemble. 
These models are flagged in the metadata using the Boolean metadata field, “ensemble_of_hub_models”.

Use scenarios. R package covidHubUtils. We have developed the covidHubUtils R package at
https://github.com/reichlab/covidHubUtils to facilitate bulk retrieval of forecasts for analysis and evaluation.
Examples of how to use the covidHubUtils package and its functions can be found at
https://reichlab.io/covidHubUtils/. The package supports loading forecasts from a local clone of the GitHub
repository or by querying data from Zoltar. The package supports common actions for working with the data, such
as loading specific subsets of forecasts, plotting forecasts, scoring forecasts, retrieving ground truth data,
and many other utility functions to simplify working with the data.

Visualization of forecasts in the COVID-19 Forecast Hub. In addition to viewing forecasts in an R package,
forecasts can also be viewed through our public website, https://viz.covid19forecasthub.org/. Through this tool,
viewers can select the outcome, location, prediction interval, issue date of the truth data, and the models of
interest to view forecasts. This tool can be used to see forecasts for the upcoming weeks, qualitatively evaluate
model performance in past weeks, or visualize past performance based on available data at the time of
forecasting (Fig. 4).

Communicating results from the COVID-19 Forecast Hub. Communication of probabilistic forecasts to the
public is challenging21,22, and the best practices regarding the communication of outbreaks are still
developing23. Starting in April 2020, the CDC published weekly summaries of these forecasts on their public
website24, and these forecasts were occasionally used in public briefings by the CDC Director25. Additional
examples of the communication of Forecast Hub data can be viewed through weekly reports generated by the Forecast
Hub team for dissemination to the general public, including state and local departments of health
(https://covid19forecasthub.org/doc/reports/). On December 22nd, 2021, the CDC ceased communication of case
forecasts due to low reliability of these forecasts
(https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/forecasts-cases.html).

Discussion
We present here the US COVID-19 Forecast Hub, a data repository that stores structured forecasts of COVID-19 
cases, hospitalizations, and deaths in the United States. The Forecast Hub is an important asset for visualizing, 
evaluating, and generating aggregate forecasts. It also demonstrates the highly collaborative effort that has gone 
into COVID-19 modeling efforts. This open-source data repository is beneficial for researchers, modelers, and 
casual viewers interested in forecasts of COVID-19. The website was viewed over half a million times in the first 
two years of the pandemic.

The US COVID-19 Forecast Hub is a unique, large-scale, collaborative infectious disease modeling effort. 
The Forecast Hub emerged from years of collaborative modeling efforts that started as government sponsored 
forecasting “challenges”. These collaborations are distinct from modeling efforts of individual teams, as the 
Forecast Hub has created open collaborative systems that facilitate model collection, curation, comparison, and 
combination, often in direct collaboration with governmental public health agencies26–28. The Forecast Hub built 
on these past efforts by developing a new quantile-based data format as well as automated data submission and 
validation procedures. Additionally, the scale of the collaborative effort for the US COVID-19 Forecast Hub has 
exceeded prior COVID-19 forecasting efforts by an order of magnitude in terms of the number of participating 
teams and forecasts collected. Finally, the infrastructure developed for the US COVID-19 Forecast Hub has been 
adapted for use by a number of other modeling hubs, including the US COVID-19 Scenario Modeling Hub17, the 
European COVID-19 Forecast Hub15, the German/Polish COVID-19 Forecasting Hub16, the German COVID-19
Hospitalization Nowcasting Hub29, and the 2022 US CDC Influenza Hospitalization Forecasting challenge30.

The Forecast Hub has played a critical role in collecting forecasts in a single format from over 100 different
prediction models and making these data available to a wide variety of stakeholders during the COVID-19
pandemic. While some of these teams register their forecasts in other publicly available locations, many teams
do not. Thus the Forecast Hub is the only location where many teams’ forecasts are available. In addition to
curating data from other models, the Forecast Hub has also played a central role in synthesizing the outputs of
models together. The Forecast Hub has generated an ensemble forecast, which has been used in official
communications by the CDC, every week since April 2020. The ensemble model for incident deaths, a median
aggregate of all other eligible models, was consistently the most accurate model when aggregated across forecast
targets, weeks, and locations, even though it was rarely the single most accurate forecast for any single
prediction2.

The US COVID-19 Forecast Hub has built a specific set of open-source tools that have facilitated the
development of operational stand-alone and ensemble forecasts for the pandemic. However, the structure of the
tools is quite general and could be adapted for use in other real-time prediction efforts. Additionally, the
Forecast Hub infrastructure and data described represent best practices for collecting, aggregating, and
disseminating forecasts31. The US COVID-19 Forecast Hub has developed and operationalized one standardized
forecast format, time-stamped submissions, open access, and a collection of tools to facilitate working with the
data.

The data in this hub will be useful in the future for continuing analysis and comparisons of forecasting
methods. The data can also be used as an exploratory dataset for creating and testing novel models and methods
for model analysis (e.g., new ways to create an ensemble or post hoc forecast calibration methods). Because the
data serve as an open repository of the state of the art in infectious disease forecasting, they will also be
helpful as a retrospective reference point for comparison when new forecasting models are developed.

Model coordination efforts occur in many fields, including climate science32, ecology33, and space weather34,
among others, to inform policy decisions by curating many models and synthesizing their outputs and
uncertainties. Such efforts ensure that individual model outputs can be easily compared to and assimilated
with one another, and thus play a role in making scientific research more rigorous and transparent. As the use of
advanced computational models becomes more commonplace in a wide range of scientific fields, model
coordination projects and model output standardization efforts will play an increasingly important role in
ensuring that policy makers can be provided with a unified set of model outputs.

Fig. 4 Visualization tool updated weekly by the US COVID-19 Forecast Hub displays model forecasts and 
truth data at selected forecast dates, locations, forecast outcomes and PI levels. US national level incident death 
forecasts from 39 models are shown with point values and a 50% PI. These forecasts are for 1 through 4 week 
ahead horizons. Data used for forecasting were generated on July 24th, 2021. The visualization tool is available 
at: https://viz.covid19forecasthub.org.


Methods
Forecast assumptions. Forecasters used a variety of assumptions to build models and generate predictions.
Forecasting approaches include statistical or machine learning models, mechanistic models incorporating disease
transmission dynamics, and combinations of multiple approaches2. Teams have also included varying
assumptions regarding future changes in policies and social distancing measures, the transmissibility of
COVID-19, vaccination rates, and the spread of new virus variants throughout the United States.

Weekly submissions. A forecast submission consists of a single comma-separated value (CSV) file
submitted via pull request to the GitHub repository. Forecast submissions are validated for technical accuracy
and formatting (see below) using automated checks implemented by continuous integration servers before being
merged. To be included in the weekly ensemble model, teams were required to submit their forecast on Sunday
or prior to a deadline on Monday. The majority of teams contributing to the dataset submitted forecasts to the
Forecast Hub repository on Sunday or Monday, although some teams submitted at other times depending on
their model production schedule.

Exclusion criteria. No forecasts were excluded from the dataset due to the forecast values or the background 
experience of the forecasters. Forecast files were only rejected if they did not meet the automatic formatting crite-
ria implemented through automatic GitHub checks35. These included checks to ensure that, among other criteria:

•	 A forecast file is submitted no more than two days after it has been created (to ensure forecasts submitted 
were truly prospective). The creation date is based on the date in the filename created by the submitting team.

•	 The forecast dates in the content of the file are in the format YYYY-MM-DD and must match the creation 
date.

•	 Quantile forecasts do not contain any quantiles at probability levels other than the required levels (see Fore-
cast Format section below).
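
The date-related checks listed above can be illustrated with a short sketch. This is not the Forecast Hub's validation code (which is in the validations repository35); the function name `check_submission`, the regular expression, and the `max_lag_days` parameter are hypothetical and chosen only for illustration.

```python
import re
from datetime import date

# Hypothetical pattern for "<YYYY-MM-DD>-<team abbreviation>-<model abbreviation>.csv".
FILENAME_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})-([^\s-]+)-([^\s-]+)\.csv$")


def check_submission(filename, forecast_dates, submitted_on, max_lag_days=2):
    """Return a list of problems found; an empty list means these checks passed."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return [f"unexpected file name: {filename}"]
    try:
        created = date.fromisoformat(match.group(1))
    except ValueError:
        return [f"invalid creation date in file name: {match.group(1)}"]

    problems = []
    # A file may be submitted at most max_lag_days after its creation date.
    lag = (submitted_on - created).days
    if lag > max_lag_days:
        problems.append(f"submitted {lag} days after creation date {created}")
    # Every forecast_date in the file must equal the date in the file name.
    for value in sorted(set(forecast_dates)):
        if value != created.isoformat():
            problems.append(f"forecast_date {value} does not match {created}")
    return problems
```

Under these assumptions, a file named “2021-07-26-COVIDhub-ensemble.csv” submitted on 2021-07-27 with matching forecast dates yields an empty list, while the same file submitted on 2021-07-29 is flagged as late.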

Updates to files. To ensure that forecasting is done in real-time, all forecasts are required to be submitted to 
the Forecast Hub within 2 days of the forecast date, which is listed in a column within each forecast file. 
Occasional late submissions were accepted through January 2021; after that, the policy was updated so that late 
forecasts were no longer accepted, whether the delay was due to missed deadlines, updated modeling methods, or 
other reasons.

Exceptions to this policy were made if there was a bug that affected the forecasts in the original submission 
or if a new team joined. If there was a bug, teams were required to submit a comment with their updated sub-
mission affirming that there was a bug and that the forecast was only produced using data that were available at 
the time of the original submission. In the case of updates to forecast data, both the old and updated versions 
of the forecasts can be accessed either through the GitHub commit history or through time-stamped queries of 
the forecasts in the Zoltar database. Note that an update can also consist of “retracting” a particular set of 
predictions when the initial forecast could not be updated. When new teams join the Forecast 
Hub, they can submit late forecasts if they can provide publicly available evidence that the forecasts were made 
in real-time (e.g., GitHub commit history).

Ground truth data. Data from the JHU CSSE dataset36 are used as the ground truth data for cases and 
deaths. Data from the HealthData.gov system for state-level hospitalizations are used for the hospitalization out-
come. JHU CSSE obtained counts of cases and deaths by collecting and aggregating reports from state and local 
health departments. HealthData.gov contains reports of hospitalizations assembled by the U.S. Department of 
Health and Human Services. Teams were encouraged to use these sources to build models. Although hospitaliza-
tion forecasts were collected starting in March 2020, hospitalization data from HealthData.gov were only available 
later, and we started encouraging teams to target these data in November 2020. Some teams used alternate data 
sources, including the NYTimes, USAFacts, US Census data, and other signals2. Versions of truth data from JHU 
CSSE, USAFacts, and the NYTimes are stored in the GitHub repository.

Previous reports of ground truth data for past time points were occasionally updated as new records became 
available, definitions of reportable cases, deaths, or hospitalizations changed, or errors in data collection were 
identified and corrected. These revisions to the data are sometimes quite substantial35,36, and for purposes such 
as retrospective ensemble construction, it is necessary to use the data that would have been available in real-time. 
The historically versioned data can be accessed either through GitHub commit records, data versions released on 
HealthData.gov, or third-party tools such as the covidcast API provided by the Delphi group at Carnegie Mellon 
University or the covidData R package37.

Model designation. Each model stored in the repository must have a classification of “primary”, “secondary”, 
or “other”. Each team may have only one “primary” model. Teams submitting multiple models with similar 
forecasting approaches can use the designations “secondary” or “other” for their additional models. Models with 
the designation “primary” are included in evaluations, the weekly ensemble, and the visualization. The “secondary” 
label is designed for models that differ substantively in methodology from a team’s “primary” model. Models 
with the designation “secondary” are included only in the ensemble and the visualization. The “other” label is 
designed for models that are small variations on a team’s “primary” model. Models with the designation “other” 
are not included in evaluations, the ensemble build, or the visualization.
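
As a compact restatement of these rules, the sketch below encodes which designations feed into which downstream use and flags teams with more than one “primary” model. The names `INCLUDED_IN`, `models_for`, and `teams_with_multiple_primaries`, as well as the example model abbreviations, are hypothetical; the team is taken to be the part of the model abbreviation before the hyphen.

```python
from collections import Counter

# Downstream uses of each designation, as described in the paragraph above.
INCLUDED_IN = {
    "primary": frozenset({"evaluation", "ensemble", "visualization"}),
    "secondary": frozenset({"ensemble", "visualization"}),
    "other": frozenset(),
}


def models_for(use, designations):
    """Return the sorted model abbreviations whose designation is included in `use`."""
    return sorted(m for m, d in designations.items() if use in INCLUDED_IN[d])


def teams_with_multiple_primaries(designations):
    """Return teams (text before the hyphen) that have more than one primary model."""
    counts = Counter(
        abbr.split("-", 1)[0] for abbr, d in designations.items() if d == "primary"
    )
    return sorted(team for team, n in counts.items() if n > 1)


# Hypothetical example: one team with three models, another with one.
example = {
    "TeamA-base": "primary",
    "TeamA-alt": "secondary",
    "TeamA-tweak": "other",
    "TeamB-model": "primary",
}
ensemble_members = models_for("ensemble", example)
evaluated = models_for("evaluation", example)
```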

GitHub repository data structure. Forecasts in the GitHub repository are available in subfolders organized 
by model. Folders are named with a team name and model name, and each folder includes a metadata file and 
forecast files. Forecast CSV files are named using the format “<YYYY-MM-DD>-<team abbreviation>-<model 
abbreviation>.csv”. In these files, each row contains data for a single outcome, location, horizon, and point or 
quantile prediction as described above.

The metadata file for each team, named using the format “metadata-<team abbreviation>-<model abbre-
viation>.txt”, contains relevant information about the team and the model that the team is using to generate 
forecasts.

Forecast format. Forecasts were required to be submitted in the format of point predictions and/or quantile 
predictions. Point predictions represented single “best” predictions with no uncertainty, typically representing a 
mean or median prediction from the model. Quantile predictions are an efficient format for storing predictive 
distributions of a wide range of outcomes.

Quantile representations of predictive distributions lend themselves to natural computations of, for exam-
ple, pinball loss or a weighted interval score, both proper scoring rules that can be used to evaluate forecasts38. 
However, they do not capture the structure of the tails of the predictive distribution beyond the reported quan-
tiles. Additionally, the quantile format does not preserve any information on correlation structures between 
different outcomes.
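
To make the scoring computation concrete, the following sketch computes the pinball loss for a single quantile and the weighted interval score (WIS) for a set of quantiles, following the definition in Bracher et al.38. It assumes the forecast is given as a mapping from probability level to predicted value that contains the median and symmetric pairs of levels; the function names are illustrative and are not taken from any Forecast Hub package.

```python
def pinball_loss(level, prediction, observed):
    """Pinball (quantile) loss of one quantile prediction at probability `level`."""
    if observed >= prediction:
        return level * (observed - prediction)
    return (1.0 - level) * (prediction - observed)


def weighted_interval_score(quantiles, observed):
    """WIS from a dict {probability level: value} with a median and symmetric levels."""
    if 0.5 not in quantiles:
        raise ValueError("the median (level 0.5) is required")
    lower_levels = [p for p in sorted(quantiles) if p < 0.5]
    # Median term, weighted by 1/2.
    total = 0.5 * abs(observed - quantiles[0.5])
    for p in lower_levels:
        upper_p = round(1.0 - p, 10)
        if upper_p not in quantiles:
            raise ValueError(f"no matching upper quantile for level {p}")
        alpha = 2.0 * p  # the (1 - alpha) central prediction interval
        lower, upper = quantiles[p], quantiles[upper_p]
        # Interval score: width plus penalties for observations outside the interval.
        interval_score = upper - lower
        if observed < lower:
            interval_score += (2.0 / alpha) * (lower - observed)
        if observed > upper:
            interval_score += (2.0 / alpha) * (observed - upper)
        total += (alpha / 2.0) * interval_score
    return total / (len(lower_levels) + 0.5)


# Hypothetical forecast: 50% PI of [8, 14] with median 10; the observation is 15.
forecast = {0.25: 8.0, 0.5: 10.0, 0.75: 14.0}
wis = weighted_interval_score(forecast, 15.0)
```

With these weights, the WIS equals twice the average pinball loss over all submitted quantiles, which is why the two scores are closely related for forecasts in the quantile format.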

The forecast data in this dataset are stored in seven columns:

 1. forecast_date - the date the forecast was made in the format YYYY-MM-DD.
 2. target - a character string giving the number of days/weeks ahead that are being forecasted (horizon) and 
the outcome. Targets must be one of the following:

 a. “N wk ahead cum death” where N is a number between 1 and 20
 b. “N wk ahead inc death” where N is a number between 1 and 20
 c. “N wk ahead inc case” where N is a number between 1 and 8
 d. “N day ahead inc hosp” where N is a number between 0 and 130

 3. target_end_date - a character string representing the date for the forecast target in the format YYYY-
MM-DD. For “k day-ahead” targets, target_end_date will be k days after forecast_date. For “k week ahead” 
targets, target_end_date will be the Saturday at the end of the specified epidemic week, as described above.

 4. location - character string of Federal Information Processing Standard Publication (FIPS) codes identify-
ing U.S. states, counties, territories, and districts as well as “US” for national forecasts. The values for the 
FIPS codes are available in a CSV file in the repository and as a data object in the covidHubUtils R package 
for convenience.

 5. type - character value of “point” or “quantile” indicating whether the row corresponds to a point forecast or 
a quantile forecast.

 6. quantile - the probability level for a quantile forecast. For death and hospitalization forecasts, forecasters 
can submit quantiles at 23 probability levels: 0.01, 0.025, 0.05, 0.10, 0.15…, 0.95, 0.975, and 0.99. For cases, 
teams can submit up to 7 quantiles at levels 0.025, 0.100, 0.250, 0.500, 0.750, 0.900, and 0.975. If the forecast 
“type” is equal to “point”, the value in the quantile column is equal to “NA”.

 7. value - non-negative numbers indicating the “point” or “quantile” prediction for the row. For a “point” 
prediction, the value is simply the value of that point prediction for the target and location associated with that 
row. For a “quantile” prediction, the model predicts that the eventual observation will be less than or equal 
to this value with the probability given by the quantile probability level.
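
The column definitions above translate into simple row-level checks. The sketch below parses a target string, enforces the horizon ranges listed under column 2, and checks a single row represented as a dictionary of strings keyed by column name. The names `parse_target` and `check_row` are hypothetical. For week-ahead targets the sketch only verifies that target_end_date falls on a Saturday; it does not reproduce the full epidemic-week rule described earlier in the article.

```python
import re
from datetime import date, timedelta

# Allowed horizon ranges from the target list above: (unit, outcome) -> (min, max).
HORIZON_LIMITS = {
    ("wk", "cum death"): (1, 20),
    ("wk", "inc death"): (1, 20),
    ("wk", "inc case"): (1, 8),
    ("day", "inc hosp"): (0, 130),
}
TARGET_RE = re.compile(r"^(\d+) (wk|day) ahead (cum death|inc death|inc case|inc hosp)$")


def parse_target(target):
    """Split a target string into (horizon, unit, outcome) or raise ValueError."""
    match = TARGET_RE.match(target)
    if match is None:
        raise ValueError(f"unrecognized target: {target!r}")
    horizon, unit, outcome = int(match.group(1)), match.group(2), match.group(3)
    limits = HORIZON_LIMITS.get((unit, outcome))
    if limits is None or not limits[0] <= horizon <= limits[1]:
        raise ValueError(f"horizon or unit not allowed for target: {target!r}")
    return horizon, unit, outcome


def check_row(row):
    """Return a list of problems for one forecast row (a dict of strings)."""
    problems = []
    horizon, unit, _ = parse_target(row["target"])
    made = date.fromisoformat(row["forecast_date"])
    end = date.fromisoformat(row["target_end_date"])
    if unit == "day" and end != made + timedelta(days=horizon):
        problems.append("target_end_date is not forecast_date plus the horizon")
    if unit == "wk" and end.weekday() != 5:  # Monday is 0, so Saturday is 5
        problems.append("target_end_date is not a Saturday")
    if row["type"] == "point":
        if row["quantile"] != "NA":
            problems.append("point rows must have quantile NA")
    elif row["type"] == "quantile":
        try:
            level = float(row["quantile"])
        except ValueError:
            level = None
        if level is None or not 0.0 < level < 1.0:
            problems.append("quantile must be a probability level between 0 and 1")
    else:
        problems.append("type must be point or quantile")
    if float(row["value"]) < 0:
        problems.append("value must be non-negative")
    return problems
```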

Metadata format. Each team documents their model information in a metadata file which is required along 
with the first forecast submission. Each team is asked to record their model’s design and assumptions, the model 
contributors, the team’s website, information regarding the team’s data sources, and a brief model description. 
Teams may update their metadata file periodically to keep track of minor changes to a model.

A standard metadata file should be a YAML file with the following required fields in a specific order:

 1. team_name - the name of the team (less than 50 characters).
 2. model_name - the name of the model (less than 50 characters).
 3. model_abbr - an abbreviated and uniquely identified name for the model that is less than 30 alphanumeric 
characters. The model abbreviation must be in the format of ‘[team_abbr]-[model_abbr]’ where each of the 
‘[team_abbr]’ and ‘[model_abbr]’ are text strings that are each less than 15 alphanumeric characters that 
do not include a hyphen or whitespace.

 4. model_contributors - a list of all individuals involved in the forecasting effort, affiliations, and email 
addresses. At least one contributor needs to have a valid email address. The syntax of this field should be 
name1 (affiliation1) <user@address>, name2 (affiliation2) <user2@address2>

 5. website_url* - a URL to a website that has additional data about the model. We encourage teams to submit 
the most user-friendly version of the model, e.g., a dashboard, or similar, that displays the model forecasts. If 
there is an additional data repository where forecasts and other model code are stored, this can be included in 
the methods section. If only a more technical site, e.g., GitHub repo, exists, that link should be included here.

 6. license - one of the acceptable license types in the Forecast Hub. We encourage teams to submit as a 
“cc-by-4.0” to allow the broadest possible use, including private vaccine production (which would be excluded 
by the “cc-by-nc-4.0” license). If the value is “LICENSE.txt”, then a LICENSE.txt file must exist within the 
model folder and provide a license.

 7. team_model_designation - upon initial submission this field should be one of “primary”, “secondary” or 
“other”.

 8. methods - a brief description of the forecasting methodology that is less than 200 characters.
 9. ensemble_of_hub_models - a Boolean value (‘true’ or ‘false’) that indicates whether a model combines 
multiple hub models into an ensemble.

*In earlier versions of the metadata files, this field was named model_output.
Teams are also encouraged to add model information with optional fields described below:

 1. institution_affil - University or company names, if relevant.
 2. team_funding - Like an acknowledgement in a manuscript, teams can acknowledge funding here.
 3. repo_url - A GitHub repository url or something similar.
 4. twitter_handles - one or more Twitter handles (without the @) separated by commas.
 5. data_inputs - A description of the data sources used to inform the model and the truth data targeted by 
model forecasts. Common data sources are NYTimes, JHU CSSE, COVIDTracking, Google mobility, HHS 
hospitalization, etc. An example description could be “case forecasts use NYTimes data and target JHU 
CSSE truth data, hospitalization forecasts use and target HHS hospitalization data”.

 6. citation - a url (doi link preferred) to an extended description of the model, e.g., blog post, website, pre-
print, or peer-reviewed manuscript.

 7. methods_long - An extended description of the methods used in the model. If the model is modified, this 
field can be used to provide the date of the modification and a description of the change.
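
The required-field rules can be sketched as a check on metadata that has already been parsed from YAML into a Python dictionary (for example with a YAML parser such as PyYAML, which is not used here so that the example has no third-party dependencies). The function `check_metadata`, the regular expressions, and the example values are hypothetical. The length limits follow a literal reading of the field descriptions above and may differ from the limits enforced by the Forecast Hub's own validation code.

```python
import re

REQUIRED_FIELDS = [
    "team_name", "model_name", "model_abbr", "model_contributors", "website_url",
    "license", "team_model_designation", "methods", "ensemble_of_hub_models",
]
# "[team_abbr]-[model_abbr]": each part under 15 characters, no hyphen or whitespace.
ABBR_RE = re.compile(r"^[^\s-]{1,14}-[^\s-]{1,14}$")
# Crude check that at least one contributor lists an address as <user@address>.
EMAIL_RE = re.compile(r"<[^<>\s]+@[^<>\s]+>")


def check_metadata(meta):
    """Return a list of problems for a metadata dict; empty means the checks passed."""
    problems = [f"missing required field: {name}" for name in REQUIRED_FIELDS if name not in meta]
    if problems:
        return problems
    # Required fields are expected in the documented order (dicts keep insertion order).
    if [key for key in meta if key in REQUIRED_FIELDS] != REQUIRED_FIELDS:
        problems.append("required fields are not in the documented order")
    for name in ("team_name", "model_name"):
        if len(meta[name]) >= 50:
            problems.append(f"{name} must be less than 50 characters")
    if not ABBR_RE.match(meta["model_abbr"]):
        problems.append("model_abbr must look like [team_abbr]-[model_abbr]")
    if not EMAIL_RE.search(meta["model_contributors"]):
        problems.append("at least one contributor needs an email address")
    if meta["team_model_designation"] not in {"primary", "secondary", "other"}:
        problems.append("team_model_designation must be primary, secondary, or other")
    if len(meta["methods"]) >= 200:
        problems.append("methods must be less than 200 characters")
    if not isinstance(meta["ensemble_of_hub_models"], bool):
        problems.append("ensemble_of_hub_models must be true or false")
    return problems


# Hypothetical metadata for illustration only.
example_meta = {
    "team_name": "Example Team",
    "model_name": "Example Model",
    "model_abbr": "ExTeam-ExModel",
    "model_contributors": "Jane Doe (Example University) <jane@example.org>",
    "website_url": "https://example.org/model",
    "license": "cc-by-4.0",
    "team_model_designation": "primary",
    "methods": "A simple growth-rate model.",
    "ensemble_of_hub_models": False,
}
```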

Technical Validations
Two similar but distinct validation processes were used to validate data on the GitHub repository and on Zoltar.

Validations during data submission. Validations were set up using GitHub Actions to manage the con-
tinuous integration and automated data checking35. Teams submitted their metadata files and forecasts through 
pull requests on GitHub. Each time a new pull request was submitted, a validation script ran on all new or updated 
files in the pull request to test for their validity. Separate checks ran on metadata file changes and forecast data 
file changes.

The metadata file for each team was required to be in a valid YAML format, and a set of specific checks had to 
pass before a new metadata file could be merged into the repository. Checks included ensuring that all 
metadata files followed the rules outlined in the Metadata Format section, that the proposed team and model 
names did not conflict with existing names, that a valid license for data reuse was specified, and that a valid model 
designation was present. Additionally, each team was required to keep their files in a folder named consistently 
with their model_abbr, and to have only one primary model.

New or changed forecast data files for each team were required to pass a series of checks for data formatting 
and validity. These checks also ensured that the forecast data files did not meet any of the exclusion criteria (see 
the Methods section for specific rules). Each forecast file is subject to the validation rules documented at: 
https://github.com/reichlab/covid19-forecast-hub/wiki/Forecast-Checks.

Validations on Zoltar. When a new forecast file is uploaded to Zoltar, unit tests are run on the file to ensure 
that forecast elements contain a valid structure. (For a detailed specification of the structure of forecast elements, 
see https://docs.zoltardata.com/validation/.) If a forecast file does not pass all unit tests, the upload will fail and 
the forecast file will not be added to the database; only when all tests pass will the new forecast be added to Zoltar. 
The validations in place on GitHub ensure that only valid forecasts will be uploaded to Zoltar.

Truth data. Raw truth data from multiple sources, including JHU CSSE, NYTimes, USAFacts, and HealthData.gov, 
were downloaded and reformatted using the scripts in the R packages covidHubUtils 
(https://github.com/reichlab/covidHubUtils) and covidData (https://github.com/reichlab/covidData). This 
data-generating process is automated by GitHub Actions every week, and the results (called “truth data”) are 
directly uploaded to the Forecast Hub repository and Zoltar. Specifically, raw case and death truth data are 
aggregated to a weekly level, and all three outcomes (cases, deaths, and hospitalizations) are reformatted for use 
within the Forecast Hub.
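
The weekly aggregation step can be illustrated as follows. The actual processing is done by the R packages named above; this Python sketch only shows the idea, assuming daily incident counts and epidemic weeks that run from Sunday through Saturday. The function names are hypothetical, and dropping incomplete weeks is a choice made for this example rather than a documented behavior of the Forecast Hub scripts.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_ending_saturday(day):
    """Return the Saturday that ends the Sunday-to-Saturday week containing `day`."""
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    return day + timedelta(days=(5 - day.weekday()) % 7)


def weekly_incidence(daily_counts):
    """Sum daily incident counts {date: count} into complete weeks keyed by Saturday."""
    totals = defaultdict(int)
    days_seen = defaultdict(int)
    for day, count in daily_counts.items():
        end = week_ending_saturday(day)
        totals[end] += count
        days_seen[end] += 1
    # Keep only weeks for which all seven days are present.
    return {end: totals[end] for end in sorted(totals) if days_seen[end] == 7}


# Hypothetical daily counts from Sunday 2021-07-18 through Monday 2021-07-26.
daily = {date(2021, 7, 18) + timedelta(days=i): 10 + i for i in range(9)}
weekly = weekly_incidence(daily)
```

In this example only the week ending Saturday 2021-07-24 is complete, so the two trailing days are left out of the result.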

Data availability
The datasets generated and/or analyzed during the current study are available in the reichlab/covid19-forecast-hub 
GitHub repository, https://github.com/reichlab/covid19-forecast-hub. A permanent DOI for the GitHub 
repository for the Forecast Hub is available at https://doi.org/10.5281/zenodo.5208210 (ref. 10). Forecast data are 
also available through our Zoltar forecast repository at https://zoltardata.com/project/44.

Code availability
All code for forecast data validation and storage associated with the current submission is available in the Forecast 
Hub GitHub repository, https://github.com/reichlab/covid19-forecast-hub-validations. Ensemble models are 
built with code in the covidEnsembles R package, https://github.com/reichlab/covidEnsembles. The code for 
forecast analysis is at https://doi.org/10.5281/zenodo.5207940 (covidHubUtils R package, ref. 12) and 
https://doi.org/10.5281/zenodo.5208224 (covidData R package, ref. 7). Any updates will also be published on Zenodo.

Received: 17 January 2022; Accepted: 29 June 2022;
Published: xx xx xxxx

References
 1. Haghani, M. & Bliemer, M. C. J. Covid-19 pandemic and the unprecedented mobilisation of scholarly efforts prompted by a health crisis: Scientometric comparisons across SARS, MERS and 2019-nCoV literature. Scientometrics 125, 2695–2726 (2020).
 2. Cramer, E. Y. et al. Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States. Proc. Natl. Acad. Sci. U. S. A. 119, e2113561119 (2022).
 3. Brooks, L. C. et al. Comparing ensemble approaches for short-term probabilistic COVID-19 forecasts in the U.S. International Institute of Forecasters (2020).
 4. Ray, E. L. et al. Comparing trained and untrained probabilistic ensemble forecasts of COVID-19 cases and deaths in the United States. arXiv [stat.ME] (2022).
 5. Taylor, J. W. & Taylor, K. S. Combining Probabilistic Forecasts of COVID-19 Mortality in the United States. Eur. J. Oper. Res. https://doi.org/10.1016/j.ejor.2021.06.044 (2021).
 6. CSSEGISandData/COVID-19. GitHub https://github.com/CSSEGISandData/COVID-19.
 7. Ray, E. et al. reichlab/covidData: repository release for Zenodo. Zenodo https://doi.org/10.5281/zenodo.5208224 (2021).
 8. US COVID-19 cases and deaths by state. https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/ (2021).
 9. HealthData.gov. healthdata.gov https://healthdata.gov/ (2022).
 10. Cramer, E. et al. reichlab/covid19-forecast-hub: release for Zenodo, 20210816. Zenodo https://doi.org/10.5281/zenodo.5208210 (2021).
 11. Reich, N. G., Cornell, M., Ray, E. L., House, K. & Le, K. The Zoltar forecast archive, a tool to standardize and store interdisciplinary prediction research. Sci Data 8, 59 (2021).
 12. Wang, S. Y. et al. reichlab/covidHubUtils: repository release for Zenodo. Zenodo https://doi.org/10.5281/zenodo.5207940 (2021).
 13. Cornell, M., Gruson, H., Wang, S. Y. & Ray, E. reichlab/zoltr: Release for Zenodo, 20210816. Zenodo https://doi.org/10.5281/zenodo.5207856 (2021).
 14. Cornell, M. et al. reichlab/zoltpy: Release for Zenodo, 20210816. Zenodo https://doi.org/10.5281/zenodo.5207932 (2021).
 15. covid19-forecast-hub-europe: European Covid-19 Forecast Hub. (GitHub).
 16. covid19-forecast-hub-de: German and Polish COVID-19 Forecast Hub. (GitHub).
 17. Borchering, R. K. et al. Modeling of Future COVID-19 Cases, Hospitalizations, and Deaths, by Vaccination Rates and Nonpharmaceutical Intervention Scenarios - United States, April-September 2021. MMWR Morb. Mortal. Wkly. Rep. 70, 719–724 (2021).
 18. COVID 19 scenario model hub. https://covid19scenariomodelinghub.org/.
 19. MMWR Week Fact Sheet. National Notifiable Diseases Surveillance System, Division of Health Informatics and Surveillance, National Center for Surveillance, Epidemiology and Laboratory Services. Downloaded from http://wwwn.cdc.gov/nndss/document/MMWR_Week_overview.pdf.
 20. Reich, N. G., Tibshirani, R. J., Ray, E. L. & Rosenfeld, R. On the predictability of COVID-19. International Institute of Forecasters https://forecasters.org/blog/2021/09/28/on-the-predictability-of-covid-19/ (2021).
 21. Gigerenzer, G., Hertwig, R., van den Broek, E., Fasolo, B. & Katsikopoulos, K. V. ‘A 30% chance of rain tomorrow’: how does the public understand probabilistic weather forecasts? Risk Anal 25, 623–629 (2005).
 22. Raftery, A. E. Use and Communication of Probabilistic Forecasts. Stat. Anal. Data Min 9, 397–410 (2016).
 23. Tracy L. Rouleau, L. U. Risk Communication and Behavior Best Practices and Research Findings. National Oceanic and Atmospheric Administration. 1–66 (2016).
 24. CDC. COVID-19 Forecasts: Deaths. https://www.cdc.gov/coronavirus/2019-ncov/covid-data/forecasting-us.html (2021).
 25. Waldrop, T., Andone, D. & Holcombe, M. CDC warns new Covid-19 variants could accelerate spread in US. CNN (2021).
 26. Johansson, M. A. et al. An open challenge to advance probabilistic forecasting for dengue epidemics. Proc. Natl. Acad. Sci. U. S. A. 116, 24268–24274 (2019).
 27. Reich, N. G. et al. Accuracy of real-time multi-model ensemble forecasts for seasonal influenza in the U.S. PLoS Comput. Biol. 15, e1007486 (2019).
 28. Viboud, C. et al. The RAPIDD ebola forecasting challenge: Synthesis and lessons learnt. Epidemics 22, 13–21 (2018).
 29. hospitalization-nowcast-hub: Collecting nowcasts of the 7-day hospitalization incidence in Germany. https://github.com/KITmetricslab/hospitalization-nowcast-hub (2022).
 30. CDC. FluSight: Flu Forecasting. Centers for Disease Control and Prevention https://www.cdc.gov/flu/weekly/flusight/index.html (2021).
 31. Reich, N. G. et al. Collaborative hubs: making the most of predictive epidemic modeling. Am. J. Public Health e1–e4 (2022).
 32. IPCC — Intergovernmental Panel on Climate Change. https://www.ipcc.ch/ (2022).
 33. The Inter-Sectoral Impact Model Intercomparison Project. https://www.isimip.org/about/marine-ecosystems-fisheries/ (2022).
 34. CCMC: Community Coordinated Modeling Center. https://ccmc.gsfc.nasa.gov/index.php (2022).
 35. Hannan, A., Huang, Y. D. & Wang, S. Y. reichlab/covid19-forecast-hub-validations: Release for Zenodo, 20210816. Zenodo https://doi.org/10.5281/zenodo.5207934 (2021).
 36. Dong, E., Du, H. & Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 20, 533–534 (2020).
 37. Reinhart, A. et al. An open repository of real-time COVID-19 indicators. Proc. Natl. Acad. Sci. U. S. A. 118 (2021).
 38. Bracher, J., Ray, E. L., Gneiting, T. & Reich, N. G. Evaluating epidemic forecasts in an interval format. PLoS Comput. Biol. 17, e1008618 (2021).

Acknowledgements
This work has been supported in part by the US Centers for Disease Control and Prevention (1U01IP001122) 
and the National Institutes of General Medical Sciences (R35GM119582). The content is solely the responsibility 
of the authors and does not necessarily represent the official views of the CDC, FDA, NIGMS or the National 
Institutes of Health. Johannes Bracher was supported by the Helmholtz Foundation via the SIMCARD 
Information & Data Science Pilot Project. For teams that reported receiving funding for their work, we report 
the sources and disclosures below. AIpert-pwllnod: Natural Sciences and Engineering Research Council of 
Canada. Caltech-CS156: Gary Clinard Innovation Fund. CEID-Walk: University of Georgia. CMU-TimeSeries: 
CDC Center of Excellence, gifts from Google and Facebook. Covid19Sim: National Science Foundation awards 
2035360 and 2035361, Gordon and Betty Moore Foundation, and Rockefeller Foundation to support the work 
of the Society for Medical Decision Making COVID-19 Decision Modeling Initiative. COVIDhub: This work 
has been supported by the US Centers for Disease Control and Prevention (1U01IP001122) and the National 
Institutes of General Medical Sciences (R35GM119582). The content is solely the responsibility of the authors 
and does not necessarily represent the official views of the CDC, NIGMS or the National Institutes of Health. 
Johannes Bracher was supported by the Helmholtz Foundation via the SIMCARD Information & Data Science 
Pilot Project. Tilmann Gneiting gratefully acknowledges support by the Klaus Tschira Foundation. CUBoulder, 
CUB-PopCouncil: The Population Council, and the University of Colorado Population Center (CUPC) funded 
by Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes 
of Health (P2CHD066613). CU-select: NSF DMS-2027369 and a gift from the Morris-Singer Foundation. 
DDS-NBDS: NSF III-1812699. epiforecasts-ensemble1: Wellcome Trust (210758/Z/18/Z). FDANIHASU: 
supported by the Intramural Research Program of the NIH/NIDDK. GT_CHHS-COVID19: William W. 
George Endowment, Virginia C. and Joseph C. Mello Endowment, NSF DGE-1650044, NSF MRI 1828187, 
research cyberinfrastructure resources and services provided by the Partnership for an Advanced Computing 
Environment (PACE) at Georgia Tech, and the following benefactors at Georgia Tech: Andrea Laliberte, Joseph C. 
Mello, Richard “Rick” E. & Charlene Zalesky, and Claudia & Paul Raines, CDC MInD-Healthcare U01CK000531-
Supplement. GT-DeepCOVID: This work was supported in part by the NSF (Expeditions CCF-1918770, CAREER 
IIS-2028586, RAPID IIS-2027862, Medium IIS-1955883, Medium IIS-2106961, CCF-2115126), CDC MInD 
program, ORNL, faculty research award from Facebook and funds/computing resources from Georgia Tech. BA 
was supported by CDC-MIND U01CK000594 and start-up funds from University of Iowa. IHME: This work 
was supported by the Bill & Melinda Gates Foundation, as well as funding from the state of Washington and 
the National Science Foundation (award nocoviddata. FAIN: 2031096). Imperial-ensemble1: SB acknowledges 
funding from the Wellcome Trust (219415). Institute of Business Forecasting: IBF. IowaStateLW-STEM: NSF DMS-
1916204, Iowa State University Plant Sciences Institute Scholars Program, NSF CCF-1934884, Laurence H. Baker 
Center for Bioinformatics and Biological Statistics. IUPUI CIS: NSF. JHU_CSSE-DECOM: JHU CSSE: National 
Science Foundation (NSF) RAPID “Real-time Forecasting of COVID-19 risk in the USA”. 2021-2022. Award ID: 
2108526. National Science Foundation (NSF) RAPID “Development of an interactive web-based dashboard to 
track COVID-19 in real-time”. 2020. Award ID: 2028604. JHU_IDD-CovidSP: State of California, US Dept of 
Health and Human Services, US Dept of Homeland Security, Johns Hopkins Health System, Office of the Dean at 
Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University Modeling and Policy Hub, Centers 
for Disease Control and Prevention. (5U01CK000538-03), University of Utah Immunology, Inflammation, & 
Infectious Disease Initiative (26798 Seed Grant). JHU_UNC_GAS-StatMechPool: NIH NIGMS: R01GM140564. 
JHUAPL-Bucky: US Dept of Health and Human Services. KITmetricslab-select_ensemble: Daniel Wolffram was 
supported by the Klaus Tschira Foundation as well as the Helmholtz Association under the joint research school 
“HIDSS4Health – Helmholtz Information and Data Science School for Health”. Moreover, his work was funded 
by the German Federal Ministry of Education and Research (BMBF) and the Baden-Württemberg Ministry of 
Science as part of the Excellence Strategy of the German Federal and State Governments. LANL-GrowthRate: 
LANL LDRD 20200700ER. LosAlamos_NAU-CModel_SDVaxVar: NIH/NIGMS grant R01GM111510; LANL-
Directed Research and Development Program, Defense Threat Reduction Agency; Laboratory-Directed 
Research and Development Program project 20220268ER. LU-compUncertLab: UMass Amherst Center of 
Excellence for Influenza, Institute for Data Intelligent Systems and Computation. MIT-Cassandra: MIT Quest for 
Intelligence. MOBS-GLEAM_COVID: COVID Supplement CDC-HHS-6U01IP001137-01; CA NU38OT000297 
from the Council of State and Territorial Epidemiologists (CSTE). NCSU-COVSIM: Cooperative Agreement 
NU38OT000297 from the CSTE and the CDC. NotreDame-FRED: NSF RAPID DEB 2027718. NotreDame-
mobility: NSF RAPID DEB 2027718. PSI-DRAFT: NSF RAPID Grant # 2031536. QJHong-Encounter: NSF DMR-
2001411 and DMR-1835939. SDSC_ISG-TrendModel: The development of the dashboard was partly funded by 
the Fondation Privée des Hôpitaux Universitaires de Genève. UA-EpiCovDA: NSF RAPID Grant # 2028401. 
UChicagoCHATTOPADHYAY-UnIT: Defense Advanced Research Projects Agency (DARPA) #HR00111890043/
P00004 (I. Chattopadhyay, University of Chicago). UCSB-ACTS: NSF RAPID IIS 2029626. UCSD_NEU-
DeepGLEAM: Google Faculty Award, W31P4Q-21-C-0014. UMass-MechBayes: NIGMS #R35GM119582, NSF 
#1749854, NIGMS #R35GM119582. UMich-RidgeTfReg: This project is funded by the University of Michigan 
Physics Department and the University of Michigan Office of Research. USC-SikJalpha: This material is based 
upon work supported by the National Science Foundation RAPID under Grant No. 2135784 with support 
from Centers for Disease Control and Prevention (CDC). UVA-Ensemble: National Institutes of Health (NIH) 
Grant 1R01GM109718, NSF BIG DATA Grant IIS-1633028, NSF Grant No.: OAC-1916805, NSF Expeditions 
in Computing Grant CCF-1918656, CCF-1917819, NSF RAPID CNS-2028004, NSF RAPID OAC-2027541, 
US Centers for Disease Control and Prevention 75D30119C05935, a grant from Google, University of Virginia 
Strategic Investment Fund award number SIF160, Defense Threat Reduction Agency (DTRA) under Contract 
No. HDTRA1-19-D-0007, and Virginia Dept of Health Grant VDH-21-501-0141. Wadhwani_AI-BayesOpt: This 
study is made possible by the generous support of the American People through the United States Agency for 
International Development (USAID). The work described in this article was implemented under the TRACETB 
Project, managed by WIAI under the terms of Cooperative Agreement Number 72038620CA00006. The contents 
of this manuscript are the sole responsibility of the authors and do not necessarily reflect the views of USAID or 
the United States Government. WalmartLabsML-LogForecasting: The team acknowledges Walmart for supporting this study.

Competing interests
AV, MC, and APP report grants from Metabiota Inc outside the submitted work.

Additional information
Correspondence and requests for materials should be addressed to N.G.R.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and 
institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International 
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or 
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the 
Creative Commons license, and indicate if changes were made. The images or other third party material in this 
article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the article’s Creative Commons license and your intended use is not 
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the 
copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
 
© The Author(s) 2022

US COVID-19 Forecast Hub Consortium
Tilmann Gneiting3,6, Anja Mühlemann7, Youyang Gu8, Yixian Chen9, Krishna Chintanippu9, 
Viresh Jivane9, Ankita Khurana9, Ajay Kumar9, Anshul Lakhani9, Prakhar Mehrotra9, Sujitha 
Pasumarty9, Monika Shrivastav9, Jialu You9, Nayana Bannur10, Ayush Deva10, Sansiddh Jain10, 
Mihir Kulkarni10, Srujana Merugu10, Alpan Raval10, Siddhant Shingi10, Avtansh Tiwari10, 
Jerome White10, Aniruddha Adiga11, Benjamin Hurt11, Bryan Lewis11, Madhav Marathe11,12, 
Akhil Sai Peddireddy13, Przemyslaw Porebski11, Srinivasan Venkatramanan11, Lijing 
Wang14,15, Maytal Dahan16, Spencer Fox17, Kelly Gaither16, Michael Lachmann18, Lauren Ancel 
Meyers17, James G. Scott19, Mauricio Tec20, Spencer Woody17, Ajitesh Srivastava21, Tianjian 
Xu22, Jeffrey C. Cegan23, Ian D. Dettwiller24, William P. England24, Matthew W. Farthing24, 
Glover E. George24, Robert H. Hunter24, Brandon Lafferty24, Igor Linkov23, Michael L. Mayo24, 
Matthew D. Parno25, Michael A. Rowland24, Benjamin D. Trump23, Samuel Chen26, Stephen V. 
Faraone27, Jonathan Hess27, Christopher P. Morley28, Asif Salekin29, Dongliang Wang28, Yanli 
Zhang-James27, Thomas M. Baer30, Sabrina M. Corsetti31, Marisa C. Eisenberg32, Karl Falb31, 
Yitao Huang31, Emily T. Martin33, Ella McCauley31, Robert L. Myers31, Tom Schwarz31, Graham 
Casey Gibson34, Daniel Sheldon35, Liyao Gao36, Yian Ma37, Dongxia Wu38, Rose Yu39,40, 
Xiaoyong Jin41, Yu-Xiang Wang41, Xifeng Yan41, YangQuan Chen38, Lihong Guo42, Yanting 
Zhao43, Jinghui Chen44, Quanquan Gu44, Lingxiao Wang44, Pan Xu44, Weitong Zhang44, Difan 
Zou44, Ishanu Chattopadhyay45, Yi Huang45, Guoqing Lu46, Ruth Pfeiffer47, Timothy Sumner48, 
Dongdong Wang49, Liqiang Wang49, Shunpu Zhang48, Zihang Zou49, Hannah Biegel50, Joceline 
Lega50, Fazle Hussain51, Zeina Khan51, Frank Van Bussel51, Steve McConnell52,53, Stephanie L 
Guertin54, Christopher Hulme-Lowe55, V. P. Nagraj54, Stephen D. Turner54, Benjamín Bejar56, 
Christine Choirat56, Antoine Flahault57, Ekaterina Krymova56, Gavin Lee56, Elisa Manetti57, 
Kristen Namigai57, Guillaume Obozinski56, Tao Sun56, Dorina Thanou58, Xuegang Ban59, 
Yunfeng Shi60, Robert Walraven61, Qi-Jun Hong62,63, Axel van de Walle64, Michal Ben-Nun65, 
Steven Riley66, Pete Riley65, James Turtle65, Duy Cao67, Joseph Galasso67, Jae H. Cho68, Areum 
Jo68, David DesRoches69, Pedro Forli70, Bruce Hamory71, Ugur Koyluoglu72, Christina 
Kyriakides73, Helen Leis74, John Milliken72, Michael Moloney72, James Morgan72, Ninad 
Nirgudkar75, Gokce Ozcan72, Noah Piwonka74, Matt Ravi75, Chris Schrader74, Elizabeth 
Shakhnovich74, Daniel Siegel72, Ryan Spatz75, Chris Stiefeling76, Barrie Wilkinson77, Alexander 
Wong73, Sean Cavany78, Guido España78, Sean Moore78, Rachel Oidtman78, Alex Perkins78, 
Julie S. Ivy79, Maria E. Mayorga79, Jessica Mele79, Erik T. Rosenstrom79, Julie L. Swann79, 
Andrea Kraus80, David Kraus80, Jiang Bian81, Wei Cao81, Zhifeng Gao81, Juan Lavista Ferres81, 
Chaozhuo Li81, Tie-Yan Liu81, Xing Xie81, Shun Zhang81, Shun Zheng81, Matteo Chinazzi82, 
Alessandro Vespignani82,83, Xinyue Xiong82, Jessica T. Davis82, Kunpeng Mu82, Ana Pastore y 
Piontti82, Jackie Baek84, Vivek Farias85, Andreea Georgescu84, Retsef Levi85, Deeksha Sinha84, 
Joshua Wilde84, Andrew Zheng84, Omar Skali Lami84, Amine Bennouna84, David Nze Ndong85, 
Georgia Perakis84,85, Divya Singhvi86, Ioannis Spantidakis84, Leann Thayaparan84, Asterios 
Tsiourvas84, Shane Weisberg84, Ali Jadbabaie87, Arnab Sarker87, Devavrat Shah87, Leo A. Celi88, 
Nicolas D. Penna88, Saketh Sundar89, Abraham Berlin90, Parth D. Gandhi91, Thomas 
McAndrew92, Matthew Piriya90, Ye Chen93, William Hlavacek94, Yen Ting Lin95, Abhishek 
Mallela96, Ely Miller97, Jacob Neumann98, Richard Posner97, Russ Wolfinger99, Lauren 
Castro100, Geoffrey Fairchild100, Isaac Michaud101, Dave Osthus101, Daniel Wolffram3,102, Dean 
Karlen103,104, Mark J. Panaggio105, Matt Kinsey105, Luke C. Mullany105, Kaitlin Rainwater-
Lovett105, Lauren Shin105, Katharine Tallaksen105, Shelby Wilson105, Michael Brenner106,107, 
Marc Coram106, Jessie K. Edwards108, Keya Joshi109, Ellen Klein106, Juan Dent Hulse110, Kyra H. 
Grantz110, Alison L. Hill111, Kathryn Kaminsky112, Joshua Kaminsky110, Lindsay T. Keegan113, 
Stephen A. Lauer110, Elizabeth C. Lee110, Joseph C. Lemaitre114, Justin Lessler110,115,116, 
Hannah R. Meredith110, Javier Perez-Saez110, Sam Shah117, Claire P. Smith110, Shaun A. 
Truelove118, Josh Wills119, Lauren Gardner120, Maximilian Marshall120, Kristen Nixon120, John C. 
Burant121, Jozef Budzinski122, Wen-Hao Chiang123, George Mohler123, Junyi Gao124, Lucas 
Glass125, Cheng Qian126, Justin Romberg127, Rakshith Sharma126, Jeffrey Spaeder128, Jimeng 
Sun129, Cao Xiao130, Lei Gao131, Zhiling Gu132, Myungjin Kim132, Xinyi Li133, Yueying Wang134, 
Guannan Wang135, Lily Wang132, Shan Yu136, Chaman Jain137, Sangeeta Bhatia138, Pierre 
Nouvellet139,140, Ryan Barber141, Emmanuela Gaikedu141, Simon Hay141, Steve Lim141, Chris 
Murray141, David Pigott141, Robert C. Reiner141, Prasith Baccam142, Heidi L. Gurung142, Steven 
A. Stage143, Bradley T. Suchoski142, Chung-Yan Fong144, Dit-Yan Yeung144, Bijaya Adhikari145, 
Jiaming Cui146, B. Aditya Prakash146, Alexander Rodríguez146, Anika Tabassum147,148, Jiajia 
Xie146, John Asplund149, Arden Baxter150, Pinar Keskinocak150, Buse Eylul Oruc150, Nicoleta 
Serban150, Sercan O. Arik151, Mike Dusenberry151, Arkady Epshteyn151, Elli Kanal151, Long T. 
Le151, Chun-Liang Li151, Tomas Pfister151, Rajarishi Sinha151, Thomas Tsai152, Nate Yoder151, 
Jinsung Yoon151, Leyou Zhang151, Daniel Wilson153, Artur A. Belov154, Carson C. Chow155, 
Richard C. Gerkin156, Osman N. Yogurtcu154, Mark Ibrahim157, Timothee Lacroix158, Matthew 
Le157, Jason Liao159, Maximilian Nickel157, Levent Sagun158, Sam Abbott160, Nikos I. Bosse160, 
Sebastian Funk160, Joel Hellewell161, Sophie R. Meakin160, Katharine Sherratt160, Rahi 
Kalantari162, Mingyuan Zhou163, Morteza Karimzadeh164, Benjamin Lucas165, Thoai Ngo166, 
Hamidreza Zoraghein166, Behzad Vahedi165, Zhongying Wang165, Sen Pei167, Jeffrey 
Shaman167, Teresa K. Yamana167, Dimitris Bertsimas85, Michael L. Li84, Saksham Soni84, 
Hamza Tazi Bouardi84, Madeline Adee168, Turgay Ayer169,170, Jagpreet Chhatwal168,171, Ozden 
O. Dalgic172, Mary A. Ladd168, Benjamin P. Linas173, Peter Mueller168, Jade Xiao170, Jurgen 
Bosch174,175, Austin Wilson175, Peter Zimmerman175, Qinxia Wang176, Yuanjia Wang176, 
Shanghong Xie176, Donglin Zeng177, Jacob Bien178, Logan Brooks179, Alden Green179, Addison 
J. Hu179, Maria Jahja179, Daniel McDonald180, Balasubramanian Narasimhan181, Collin 
Politsch182, Samyak Rajanala183, Aaron Rumack182, Noah Simon184, Ryan J. Tibshirani179, Rob 
Tibshirani183, Valerie Ventura179, Larry Wasserman179, John M. Drake185, Eamon B. O’Dea185, 
Yaser Abu-Mostafa186, Rahil Bathwal186, Nicholas A. Chang186, Pavan Chitta187, Anne 
Erickson186, Sumit Goel186, Jethin Gowda188, Qixuan Jin186, HyeongChan Jo186, Juhyun Kim186, 
Pranav Kulkarni186, Samuel M. Lushtak186, Ethan Mann186, Max Popken186, Connor Soohoo189, 
Kushal Tirumala186, Albert Tseng186, Vignesh Varadarajan186, Jagath Vytheeswaran186, 
Christopher Wang186, Akshay Yeluri190, Dominic Yurk186, Michael Zhang186, Alexander 
Zlokapa191, Robert Pagano192, Chandini Jain193, Vishal Tomar194, Lam Ho195, Huong 
Huynh196,197, Quoc Tran196,198, Velma K. Lopez199, Jo W. Walker199, Rachel B. Slayton199, 
Michael A. Johansson199, Matthew Biggerstaff199 & Nicholas G. Reich1

6Institute of Stochastics, Karlsruhe Institute of Technology, Karlsruhe, Germany. 7Institute of Mathematical Statistics 
and Actuarial Science, University of Bern, 3012, Bern, Switzerland. 8Unaffiliated, New York, NY, 10016, USA. 
9Walmart, Sunnyvale, CA, 94086, USA. 10Wadhwani Institute of Artificial Intelligence, Mumbai, Maharashtra, 
400093, India. 11Biocomplexity Institute, University of Virginia, Charlottesville, Virginia, 22904-4298, USA. 
12Department of Computer Science, University of Virginia, Charlottesville, Virginia, 22904-4298, USA. 13Discreet 
Labs, Raleigh, North Carolina, USA. 14Boston Children’s Hospital, Boston, Massachusetts, 02115, USA. 15Harvard 
Medical School, Boston, Massachusetts, USA. 16Texas Advanced Computing Center, Austin, Texas, 78758, USA. 
17Department of Integrative Biology, University of Texas at Austin, Austin, TX, 78712, USA. 18Santa Fe Institute, 
Santa Fe, NM, 87501, USA. 19Department of Information, Risk, and Operations Management, University of Texas at 
Austin, Austin, TX, 78712, USA. 20Department of Statistics and Data Sciences, University of Texas at Austin, Austin, 
TX, 78712, USA. 21Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, 
Los Angeles, California, 90089, USA. 22Department of Computer Science, University of Southern California, Los 
Angeles, California, 90089, USA. 23US Army Engineer Research and Development Center, Concord, MA, 01742, USA. 
24US Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA. 25US Army Engineer Research 
and Development Center, Hanover, NH, 03755, USA. 26School of Medicine, State University of New York Upstate 
Medical University, Syracuse, NY, 13210, USA. 27Department of Psychiatry and Behavioral Sciences, State University 
of New York Upstate Medical University, Syracuse, NY, 13210, USA. 28Department of Public Health & Preventive 
Medicine, State University of New York Upstate Medical University, Syracuse, NY, 13210, USA. 29Department of 
Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY, 13210, USA. 30Department of 
Physics, Trinity University, San Antonio, TX, 78212, USA. 31Department of Physics, University of Michigan - Ann 
Arbor, Ann Arbor, MI, 48109, USA. 32Departments of Epidemiology, Complex Systems, and Mathematics, University 
of Michigan - Ann Arbor, Ann Arbor, MI, 48109, USA. 33School of Public Health, University of Michigan - Ann Arbor, 
Ann Arbor, MI, 48109, USA. 34School of Public Health and Health Sciences, University of Massachusetts Amherst, 
Amherst, MA, 01003, USA. 35College of Information and Computer Sciences, University of Massachusetts Amherst, 
Amherst, MA, 01003, USA. 36Department of Statistics, University of Washington, Seattle, WA, 98195, USA. 
37Halıcıoğlu Data Science Institute, University of California, San Diego, San Diego, CA, 92093, USA. 38Mechatronics, 
Embedded Systems and Automation Lab, Department of Mechanical Engineering, University of California Merced, 
Merced, CA, 95301, USA. 39Northeastern University, Boston, MA, 02115, USA. 40Department of Computer Science 
and Engineering, University of California, San Diego, San Diego, CA, 93106, USA. 41Department of Computer 
Science, University of California at Santa Barbara, Santa Barbara, CA, 92093, USA. 42Jilin University, Changchun City, 
Jilin Province, PR China. 43University of Science and Technology of China, Hefei, Anhui, China. 44Department of 
Computer Science, University of California, Los Angeles, CA, USA. 45Department of Medicine, University of Chicago, 
Chicago, IL, 60637, USA. 46University of Nebraska Omaha, Omaha, NE, 68182, USA. 47National Cancer Institute 
(NCI), NIH, Rockville, MD, 20850, USA. 48Department of Statistics and Data Science, University of Central Florida, 
Orlando, FL, 32816, USA. 49Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA. 
50Department of Mathematics, University of Arizona, Tucson, AZ, 85721, USA. 51Department of Mechanical 
Engineering, Texas Tech University, Lubbock, Texas, 79409, USA. 52Construx Software, Bellevue, WA, 98004, USA. 
53Construx, Bellevue, WA, 98004, USA. 54Quality Assurance and Data Science, Signature Science, LLC, Charlottesville, 
Virginia, 22911, USA. 55Quality Assurance and Data Science, Signature Science, LLC, Austin, Texas, 78759, USA. 
56Swiss Data Science Center, EPFL & ETHZ, 1015, Lausanne, Switzerland. 57Institute of Global Health, Faculty of 
Medicine, University of Geneva, 1202, Geneva, Switzerland. 58Center for Intelligent Systems, EPFL, 1015, Lausanne, 
Switzerland. 59Department of Civil and Environmental Engineering, University of Washington, Seattle, WA, 98195, 
USA. 60Department of Materials Science and Engineering, Rensselaer Polytechnic Institute, Troy, NY, 12309, USA. 
61Unaffiliated, Davis, California, 95616, USA. 62Brown University, Providence, RI, 02912, USA. 63School for 
Engineering of Matter, Transport and Energy, Arizona State University, Tempe, Arizona, 85287, USA. 64School of 
Engineering, Brown University, Providence, RI, 02912, USA. 65Infectious Disease Group, Predictive Science, Inc, San 
Diego, California, 92116, USA. 66Department of Infectious Disease Epidemiology, Imperial College, London, 
Westminster, London, W2 1PG, UK. 67University of Dallas, Irving, TX, 75062, USA. 68Unaffiliated, Seattle, WA, USA. 
69Oliver Wyman Digital, Oliver Wyman, Boston, MA, 02110, USA. 70Oliver Wyman Digital, Oliver Wyman, São Paulo, 
04711-904, Brazil. 71Health & Life Sciences, Oliver Wyman, Boston, MA, 02110, USA. 72Financial Services, Oliver 
Wyman, New York, NY, 10036, USA. 73Oliver Wyman Digital, Oliver Wyman, New York, NY, 10036, USA. 74Health & 
Life Sciences, Oliver Wyman, New York, NY, 10036, USA. 75Core Consultant Group, Oliver Wyman, New York, NY, 
10036, USA. 76Financial Services, Oliver Wyman, Toronto, ON, M5J 0A1, Canada. 77Financial Services, Oliver Wyman, 
Marylebone, London, W1U 8EW, UK. 78Department of Biological Sciences, University of Notre Dame, Notre Dame, 
IN, 46556, USA. 79Department of Industrial and Systems Engineering, North Carolina State University, Raleigh, NC, 
27695, USA. 80Department of Mathematics and Statistics, Masaryk University, Brno, 61137, Czech Republic. 
81Microsoft, Redmond, WA, 98029, USA. 82Laboratory for the Modeling of Biological and Socio-technical Systems, 
Northeastern University, Boston, MA, USA. 83ISI Foundation, Turin, Italy. 84Operations Research Center, 
Massachusetts Institute of Technology, Cambridge, MA, 02139, USA. 85Sloan School of Management, Massachusetts 
Institute of Technology, Cambridge, MA, 02142, USA. 86Leonard N Stern School of Business, New York University, 
NY, USA. 87Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA, 02139, 
USA. 88Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, 02139, 
USA. 89River Hill High School, Clarksville, MD, USA. 90Department of Computer Science and Engineering, Lehigh 
University, Bethlehem, PA, 18015, USA. 91Department of Industrial and Systems Engineering, Lehigh University, 
Bethlehem, PA, 18015, USA. 92College of Health, Lehigh University, Bethlehem, PA, 18015, USA. 93Department of 
Mathematics and Statistics, Northern Arizona University, Flagstaff, AZ, 86011, USA. 94Theoretical Division, Los 
Alamos National Laboratory, Los Alamos, NM, 87545, USA. 95Information Sciences Group, Los Alamos National 
Laboratory, Los Alamos, NM, 87545, USA. 96Theoretical Biology and Biophysics Group (T-6), Theoretical Division, 
Los Alamos National Laboratory, Los Alamos, NM, 87545, USA. 97Department of Biological Sciences, Northern 
Arizona University, Flagstaff, AZ, 86011, USA. 98Department of Chemistry and Chemical Biology, Cornell University, 
Ithaca, NY, 14850, USA. 99Life Sciences, JMP, LLC, Cary, NC, 27513, USA. 100Information Systems and Modeling 
Group, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA. 101Statistical Sciences Group, Los Alamos 
National Laboratory, Los Alamos, NM, 87545, USA. 102Chair of Econometrics and Statistics, Karlsruhe Institute of 
Technology, Karlsruhe, Germany. 103TRIUMF, Vancouver, BC, V6T 2A3, Canada. 104Department of Physics and 
Astronomy, University of Victoria, Victoria, BC, V8W 2Y2, Canada. 105Johns Hopkins University Applied Physics Lab, 
Laurel, MD, 20723, USA. 106Google Research, Mountain View, CA, 94043, USA. 107School of Engineering and Applied 
Sciences, Harvard University, Cambridge, MA, 02134, USA. 108Department of Epidemiology, UNC Gillings School of 
Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA. 109Department of 
Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, 02115, USA. 110Department of Epidemiology, 
Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA. 111Institute for Computational 
Medicine, Johns Hopkins University, Baltimore, MD, 21218, USA. 112Unaffiliated, Baltimore, MD, 21205, USA. 
113Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, 84108, USA. 
114Laboratory of Ecohydrology, School of Architecture, Civil and Environmental Engineering, École Polytechnique 
Fédérale de Lausanne, Lausanne, 1015, Switzerland. 115Department of Epidemiology, Gillings School of Global Public 
Health and The Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA. 
116The Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA. 
117Unaffiliated, San Francisco, CA, 94107, USA. 118International Vaccine Access Center, Department of International 
Health, Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21231, 
USA. 119Unaffiliated, San Francisco, CA, 94122, USA. 120Department of Civil and Systems Engineering, Johns Hopkins 
University, Baltimore, MD, 21218-2682, USA. 121Unaffiliated, Amsterdam, Netherlands. 122Unaffiliated, Vienna, 
1010, Austria. 123Indiana University–Purdue University Indianapolis, Indianapolis, IN, 46202, USA. 124University of 
Illinois at Urbana-Champaign, Champaign, IL, USA. 125Analytics Center of Excellence, IQVIA, Plymouth Meeting, 
Pennsylvania, PA, USA. 126Analytics Center of Excellence, IQVIA, Cambridge, MA, USA. 127Georgia Institute of 
Technology, Atlanta, GA, USA. 128IQVIA, Evanston, IL, USA. 129University of Illinois at Urbana-Champaign, 
Champaign, IL, USA. 130Amplitude, San Francisco, CA, USA. 131Department of Finance, Iowa State University, Ames, 
IA, 50011-1090, USA. 132Department of Statistics, Iowa State University, Ames, IA, 50011-1090, USA. 133School of 
Mathematical and Statistical Sciences, Clemson University, Clemson, SC, 29634, USA. 134Iowa State University, Ames, 
IA, 50011-1091, USA. 135Department of Mathematics, College of William & Mary, Williamsburg, VA, 23187, USA. 
136Department of Statistics, University of Virginia, Charlottesville, VA, 22904, USA. 137Institute of Business 
Forecasting (IBF), Great Neck, NY, 11021, USA. 138Imperial College London, London, UK. 139Imperial College London, 
Brighton, UK. 140University of Sussex, Falmer, Brighton, BN1 9RH, UK. 141Institute for Health Metrics and Evaluation, 
University of Washington, Seattle, WA, 98121, USA. 142Emerging Technologies, IEM, Inc, Bel Air, MD, 21015, USA. 
143Emerging Technologies, IEM, Inc, Baton Rouge, LA, 70809, USA. 144The Hong Kong University of Science and 
Technology, Clear Water Bay, Hong Kong. 145Department of Computer Science, University of Iowa, Iowa City, IA, 
52242, USA. 146College of Computing, Georgia Institute of Technology, Atlanta, GA, 30308, USA. 147Georgia Institute 
of Technology, Atlanta, GA, 30308, USA. 148Department of Computer Science, Virginia Tech, Falls Church, VA, 22043, 
USA. 149Advanced Data Analytics, Metron, Inc, Reston, VA, 20190, USA. 150School of Industrial and Systems 
Engineering, Georgia Institute of Technology, Atlanta, GA, 30318, USA. 151Google Cloud, Sunnyvale, CA, 94089, USA. 
152Harvard University, Cambridge, MA, 02138, USA. 153Economic Research Department, Federal Reserve Bank of San 
Francisco, San Francisco, CA, 94105, USA. 154Office of Biostatistics and Epidemiology, Center for Biologics Evaluation 
and Research, Food and Drug Administration, Center for Biologics Evaluation and Research, Silver Spring, MD, 
20993, USA. 155Mathematical Biology Section, NIDDK/LBM, NIH, Bethesda, MD, 20892, USA. 156School of Life 
Sciences, Arizona State University, Tempe, AZ, 85287, USA. 157Meta AI, New York, NY, USA. 158Meta AI, Paris, France. 
159Meta, Menlo Park, CA, USA. 160Centre for Mathematical Modelling of Infectious Diseases, London School of 
Hygiene & Tropical Medicine, London, UK. 161London School of Hygiene & Tropical Medicine, London, UK. 
162Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, 78712, USA. 
163McCombs School of Business, The University of Texas at Austin, Austin, TX, 78712, USA. 164Department of 
Geography, Institute of Behavioral Science, University of Colorado Boulder, Boulder, CO, 80309, USA. 165Department 
of Geography, University of Colorado Boulder, Boulder, CO, 80309, USA. 166Social and Behavioral Science Research, 
Population Council, New York, NY, 10017, USA. 167Department of Environmental Health Sciences, Columbia 
University, New York, NY, 10032, USA. 168Radiology - Institute for Technology Assessment, Massachusetts General 
Hospital, Boston, MA, 02114, USA. 169Emory University Medical School, Atlanta, GA, 30322, USA. 170H. Milton 
Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA. 
171Harvard Medical School, Boston, MA, 02114, USA. 172Health Economic Modeling, Value Analytics Labs, Boston, 
MA, 02114, USA. 173Department of Medicine, Section of Infectious Diseases, Boston University School of Medicine, 
Boston, MA, 02118, USA. 174InterRayBio, LLC, Cleveland, Ohio, 44106, USA. 175Center for Global Health & Diseases, 
Case Western Reserve University, Cleveland, OH, 44106-4983, USA. 176Department of Biostatistics, Columbia 
University, New York, NY, 10032, USA. 177Department of Biostatistics, UNC Chapel Hill, Chapel Hill, NC, 27599, USA. 
178Marshall School of Business, Department of Data Sciences and Operations (DSO), University of Southern 
California, Los Angeles, CA, 90089, USA. 179Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, 
15213, USA. 180Department of Statistics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada. 
181Department of Biomedical Data Sciences and Department of Statistics, Stanford University, Stanford, CA, 94305-
4020, USA. 182Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA. 
183Department of Statistics, Stanford University, Stanford, CA, 94305-4020, USA. 184Department of Biostatistics, 
University of Washington, Seattle, WA, 98195, USA. 185Center for the Ecology of Infectious Diseases, University of 
Georgia, Athens, GA, 30602, USA. 186California Institute of Technology, Pasadena, CA, 91125, USA. 187California 
Institute of Technology, Mountain View, CA, 94043, USA. 188California Institute of Technology, Chicago, IL, 60606, 
USA. 189California Institute of Technology, Redwood City, CA, 94065, USA. 190California Institute of Technology, 
Edison, NJ, 08820, USA. 191Center for Theoretical Physics, California Institute of Technology, Cambridge, MA, 02139, 
USA. 192Unaffiliated, Tucson, AZ, 85710, USA. 193Auquan, London, EC2A 4DP, UK. 194Auquan, Bengaluru, KA, India. 
195Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada. 
196AIpert, San Carlos, CA, 94070, USA. 197Virtual Power System, Milpitas, CA, 95035, USA. 198Walmart Inc, Sunnyvale, 
CA, 94085, USA. 199Centers for Disease Control and Prevention, Atlanta, GA, USA. 
