Zoltar uses a number of formats for representing truth data, forecast data, configurations, etc. This page documents those.
- Project creation configuration (JSON)
- Truth data format (CSV)
- Score data format (CSV)
- Forecast data format (JSON)
- Forecast data format (CSV)
- Quantile forecast format (CSV)
Project creation configuration (JSON)
As documented in Projects, as an alternative to manually creating a project via the web interface, projects can be created from a JSON configuration file. Here's the configuration file from the "Docs Example Project" demo project: zoltar-project-config.json.
Project configuration files contain eight metadata keys (
"visualization_y_label"), plus three keys that are lists of objects ("units", "targets", and "timezeros"). The metadata values' meanings are self-evident except for these two:
time_interval_type: Used by the D3 component to label the X axis, is either
visualization_y_label: Used by the D3 component to label the Y axis; can be any text
Here are the three list objects' formats:
"units": a list of objects containing only one field:
name: The name of the unit.
"targets": a list of the project's targets. Please see the Targets.md file for a detailed description of target parameters and which are required. Here are all possible parameters that can be passed in a project configuration file:
type: string - must be one of the following:
step_ahead_increment: integer - negative, zero, or positive
range: an array (list) of two numbers
cats: an array (list) of one or more numbers or strings (which depends on the target's type's data type)
dates: an array (list) of one or more strings in the
"timezeros": a list of the project's time zeros. Each has these fields:
timezero_date: The timezero's date in
data_version_date: Optional data version date in the same format. Pass null if the timezero does not have one
true if this starts a season, and
season_name: Applicable when
true, names the season, e.g., "2010-2011"
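As a rough illustration of the structure described above, the following Python sketch checks that a configuration dictionary has the three list keys and the per-item fields named in this section. The helper name `check_project_config`, the sample values, and the yyyy-mm-dd date format are assumptions made for this example and are not part of Zoltar or its libraries.

```python
import json

# The three list-valued keys named in this section.
LIST_KEYS = ("units", "targets", "timezeros")


def check_project_config(config):
    """Return a list of problems found in a project configuration dict.

    Hypothetical helper for illustration; not part of any Zoltar library.
    """
    problems = []
    for key in LIST_KEYS:
        if not isinstance(config.get(key), list):
            problems.append(f"'{key}' must be a list of objects")

    units = config.get("units")
    if isinstance(units, list):
        for unit in units:
            if not isinstance(unit, dict) or "name" not in unit:
                problems.append("each unit needs a 'name'")

    timezeros = config.get("timezeros")
    if isinstance(timezeros, list):
        for timezero in timezeros:
            if not isinstance(timezero, dict) or "timezero_date" not in timezero:
                problems.append("each timezero needs a 'timezero_date'")
    return problems


# Sample values are made up; the date format is an assumption.
example_config = json.loads("""
{
  "visualization_y_label": "Deaths",
  "units": [{"name": "US"}],
  "targets": [],
  "timezeros": [
    {"timezero_date": "2020-05-11", "data_version_date": null}
  ]
}
""")

print(check_project_config(example_config))
```

A check like this only covers the overall shape. The per-target parameter rules are described in Targets.md and are not reproduced here.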
Truth data format (CSV)
Every project in Zoltar can have ground truth values associated with targets. This information is required for Zoltar to do scoring. Users can access them as CSV as described in Truth. An example truths file is zoltar-ground-truth-example.csv. The file has four columns:
timezero: date the truth applies to, formatted as
unit: the unit's name
target: target name
value: truth value, formatted according to the target's type. Date values are formatted yyyy-mm-dd and booleans as
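To make the four-column layout concrete, here is a minimal Python sketch that reads truth rows into a lookup table. It assumes the file has a header row with the four column names and that timezeros are written as yyyy-mm-dd. The sample rows reuse truth values from the score example on this page, and `read_truth` is a hypothetical helper, not a Zoltar function.

```python
import csv
import io

# Sample rows in the four-column layout; a header row is assumed.
TRUTH_CSV = """timezero,unit,target,value
2020-05-11,US,4 wk ahead cum death,112787
2020-05-12,US,4 wk ahead cum death,118093
"""


def read_truth(text):
    """Parse truth CSV text into a dict keyed by (timezero, unit, target)."""
    truth = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["timezero"], row["unit"], row["target"])
        # The value stays a string because its type depends on the target.
        truth[key] = row["value"]
    return truth


truth = read_truth(TRUTH_CSV)
print(truth[("2020-05-11", "US", "4 wk ahead cum death")])
```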
Score data format (CSV)
Zoltar calculates scores for all projects in the archive if they meet the requirements specified in Scoring. Users can download them as CSV through the web UI. The file has five fixed columns plus one column for each implemented score. Score names are in the header. Here is an example header and a few rows from the "COVID-19 Forecasts" project, starting with the zoltr query that returned them:
score_data <- zoltr::do_zoltar_query(zoltar_connection, project_url, FALSE, models = "YYG-ParamSearch", units=c("US", "01"), targets = "4 wk ahead cum death", timezeros = c("2020-05-11", "2020-05-12"), scores = "abs_error")
model,timezero,season,unit,target,truth,abs_error
YYG-ParamSearch,2020-05-11,2019-2020,US,4 wk ahead cum death,112787,2681.353647
YYG-ParamSearch,2020-05-11,2019-2020,1,4 wk ahead cum death,689,69.14696021
YYG-ParamSearch,2020-05-12,2019-2020,US,4 wk ahead cum death,118093,4783.580487
YYG-ParamSearch,2020-05-12,2019-2020,1,4 wk ahead cum death,773,94.82157045
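Once downloaded, score data can be summarized with standard CSV tooling. The sketch below averages one score column per unit using the example rows above, written as comma-separated text (the delimiter and header row are assumptions about the downloaded file). `mean_score_by_unit` is a hypothetical helper, not part of the Zoltar libraries.

```python
import csv
import io
from collections import defaultdict

# The example rows from this section, written as comma-separated text.
SCORE_CSV = """model,timezero,season,unit,target,truth,abs_error
YYG-ParamSearch,2020-05-11,2019-2020,US,4 wk ahead cum death,112787,2681.353647
YYG-ParamSearch,2020-05-11,2019-2020,1,4 wk ahead cum death,689,69.14696021
YYG-ParamSearch,2020-05-12,2019-2020,US,4 wk ahead cum death,118093,4783.580487
YYG-ParamSearch,2020-05-12,2019-2020,1,4 wk ahead cum death,773,94.82157045
"""


def mean_score_by_unit(text, score_name):
    """Average the named score column separately for each unit."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        values[row["unit"]].append(float(row[score_name]))
    return {unit: sum(vals) / len(vals) for unit, vals in values.items()}


means = mean_score_by_unit(SCORE_CSV, "abs_error")
for unit, mean in means.items():
    print(unit, round(mean, 3))
```

Because score names appear in the header, the same function works for any score column present in the file.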
Forecast data format (JSON)
For prediction input and output we use a JSON file format. This format is strongly inspired by https://github.com/cdcepi/predx/blob/master/predx_classes.md. See zoltar-predictions-examples.json for an example. The file contains a top-level object with two keys:
meta section is unused for uploads, and for downloads contains various information about the forecast in the repository in the
"forecast" field) plus lists of the project's
"predictions" list contains objects for each prediction, and each object contains the following four keys:
"location": name of the Location.
"target": name of the Target.
"class": the type of prediction this is. It is an abbreviation of the corresponding Prediction subclass - the names are :
"prediction": a class-specific dict containing the prediction data itself. The format varies according to class. Here is a summary (see Data model for details and examples):
"bin": Binned distribution with a category for each bin. It is a two-column table represented by two keys, one per column:
prob. They are paired, i.e., have the same number of rows.
"named": A named distribution with four fields:
family names must be one of:
"point": A numeric point prediction with a single
"sample": Numeric samples represented as a table with one column that is found in the
Forecast data format (CSV)
Because the native Zoltar JSON format can be inconvenient to work with, the Zoltar libraries provide functions to convert from JSON to a Zoltar-specific CSV format with the following columns. Each row represents a prediction of a particular type as described on the data model page. Note that because different prediction types have different contents, the frame is 'sparse': not every row uses all columns, and unused ones are empty (""). However, the first three columns (unit, target, and class) are always non-empty.
unit: the prediction's unit
target: the prediction's target
class: the prediction's type. One of
value: used for
quantile prediction types. Empty otherwise
cat: used for bin prediction types. Empty otherwise
sample: used for sample prediction types. Empty otherwise
quantile: used for quantile prediction types. Empty otherwise
family: family name for
NamedPrediction Elements for a list of them
param1: parameter ""
param2: parameter ""
param3: parameter ""
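Because the frame is sparse, it is often convenient to keep only the cells a row actually uses. The following Python sketch does that for two made-up rows. It assumes a header limited to the columns listed in this section and comma-separated text, and `used_fields` is a hypothetical helper rather than a function from the Zoltar libraries.

```python
import csv
import io

# Made-up rows; the header is assumed to match the columns listed above.
FORECAST_CSV = """unit,target,class,value,cat,sample,quantile,family,param1,param2,param3
US,example target,quantile,120.5,,,0.5,,,,
US,example target,sample,,,118,,,,,
"""


def used_fields(row):
    """Drop the empty ("") cells from one sparse CSV row."""
    return {name: cell for name, cell in row.items() if cell != ""}


rows = [used_fields(r) for r in csv.DictReader(io.StringIO(FORECAST_CSV))]
for row in rows:
    print(row)
```

In both rows the unit, target, and class cells survive the filter, matching the rule that the first three columns are always non-empty.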
Quantile forecast format (CSV)
Zoltar libraries support importing quantile data (see Validation.md for more information) via the COVID-19 CSV format documented at covid19-forecast-hub. While this format is not supported by Zoltar itself (i.e., you cannot upload one directly - you must always upload JSON files in the above format), the libraries allow you to translate between the two.
Columns: The Zoltar libraries ignore all but the following, which are allowed to be in any order:
"target": a unique id for the target
"location": a unique id for the location (we have standardized to FIPS codes). It is translated to Zoltar's "unit" concept.
"type": one of either "point" or "quantile"
"quantile": a value between 0 and 1 (inclusive), stating which quantile is displayed in this row. If type=="point" then NA.
"value": a numeric value representing the value of the quantile function evaluated at the probability specified in quantile
See quantile-predictions.csv for an example.
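The column rules above can be sketched as a small reader. This Python example keeps only the five listed columns, renames "location" to "unit", and enforces the quantile range and the NA rule for point rows. The sample rows, the extra `forecast_date` column (included only to show that other columns are ignored), and the helper name `read_quantile_rows` are assumptions for illustration. This is not how the Zoltar libraries implement the translation.

```python
import csv
import io

# The only columns that are read; everything else is ignored.
KEPT_COLUMNS = ("target", "location", "type", "quantile", "value")

# Made-up rows for illustration.
QUANTILE_CSV = """forecast_date,target,location,type,quantile,value
2020-05-11,4 wk ahead cum death,US,point,NA,110000
2020-05-11,4 wk ahead cum death,US,quantile,0.5,110500
"""


def read_quantile_rows(text):
    """Parse quantile-format CSV text into dicts keyed by Zoltar-style names."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {name: raw[name] for name in KEPT_COLUMNS}
        if row["type"] == "point":
            if row["quantile"] != "NA":
                raise ValueError("point rows must have quantile NA")
            quantile = None
        elif row["type"] == "quantile":
            quantile = float(row["quantile"])
            if not 0 <= quantile <= 1:
                raise ValueError("quantile must be between 0 and 1 inclusive")
        else:
            raise ValueError(f"unknown type: {row['type']}")
        rows.append({
            "unit": row["location"],  # "location" maps to Zoltar's "unit"
            "target": row["target"],
            "type": row["type"],
            "quantile": quantile,
            "value": float(row["value"]),
        })
    return rows


parsed = read_quantile_rows(QUANTILE_CSV)
for row in parsed:
    print(row)
```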