File Formats¶
Zoltar uses a number of formats for representing truth data, forecast data, configurations, etc. This page documents those.
- Project creation configuration (JSON)
- Truth data format (CSV)
- Forecast data format (JSON)
- Forecast data format (CSV)
Project creation configuration (JSON)¶
As documented in Projects, as an alternative to manually creating a project via the web interface, projects can be created from a JSON configuration file. Here's the configuration file from the "Docs Example Project" demo project: zoltar-project-config.json.
Project configuration files contain six metadata keys ("name", "is_public", "description", "home_url", "logo_url", "core_data"), plus three keys that are lists of objects ("units", "targets", and "timezeros").
Here are the three list objects' formats:
"units": a list of objects containing two fields:
- name: The name of the unit.
- abbreviation: The unit's abbreviation.
"targets": a list of the project's targets. Please see the Targets.md file for a detailed description of target parameters and which are required. Here are all possible parameters that can be passed in a project configuration file:
- name: string
- description: string
- type: string - must be one of the following: continuous, discrete, nominal, binary, or date
- outcome_variable: string
- is_step_ahead: boolean
- numeric_horizon: integer - negative, zero, or positive
- reference_date_type: one of the names listed in valid reference date types
- range: an array (list) of two numbers
- cats: an array (list) of one or more numbers or strings (which depends on the target's type's data type)
- dates: an array (list) of one or more strings in the YYYY-MM-DD format
"timezeros": a list of the project's time zeros. Each has these fields:
- timezero_date: The timezero's date in YYYY-MM-DD format
- data_version_date: Optional data version date in the same format. Pass null if the timezero does not have one
- is_season_start: true if this starts a season, and false otherwise
- season_name: Applicable when is_season_start is true, names the season, e.g., "2010-2011"
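As a minimal sketch, a configuration using the keys described above could be assembled in Python and serialized to JSON. Every value here (project name, URLs, unit and target names, dates) is a hypothetical placeholder, and the choice of parameters for the nominal target is an assumption; Targets.md is the authority on which parameters each target type requires.

```python
import json

# Hypothetical values throughout; only the key names come from the format
# described above. Which target parameters are required depends on the
# target type (see Targets.md), so this nominal target is an assumption.
project_config = {
    "name": "My Example Project",
    "is_public": True,
    "description": "A toy project used to illustrate the file layout.",
    "home_url": "https://example.com/",
    "logo_url": "https://example.com/logo.png",
    "core_data": "https://example.com/data/",
    "units": [
        {"name": "Location 1", "abbreviation": "loc1"},
        {"name": "Location 2", "abbreviation": "loc2"},
    ],
    "targets": [
        {
            "name": "season severity",
            "description": "Overall severity category for the season.",
            "type": "nominal",
            "outcome_variable": "season severity",
            "is_step_ahead": False,
            "cats": ["low", "moderate", "high"],
        },
    ],
    "timezeros": [
        {
            "timezero_date": "2011-10-02",
            "data_version_date": None,  # serialized as JSON null
            "is_season_start": True,
            "season_name": "2011-2012",
        },
        {
            "timezero_date": "2011-10-09",
            "data_version_date": None,
            "is_season_start": False,
        },
    ],
}

config_text = json.dumps(project_config, indent=2)
print(config_text)
```

Building the configuration as a dictionary and letting the json module serialize it avoids hand-written quoting mistakes and turns Python's None into the null that data_version_date expects.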
Truth data format (CSV)¶
Every project in Zoltar can have ground truth values associated with targets. Users can access them as CSV as described in Truth. An example truths file is zoltar-ground-truth-example.csv. The file has four columns: timezero, unit, target, value:
- timezero: date the truth applies to, formatted as yyyy-mm-dd
- unit: the unit's abbreviation
- target: target name
- value: truth value, formatted according to the target's type. Date values are formatted yyyy-mm-dd and booleans as true or false
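The value formatting rules above can be sketched with Python's standard csv module. The rows, target names, and the helper format_truth_value are hypothetical, and writing a header row is an assumption; compare against zoltar-ground-truth-example.csv before relying on it.

```python
import csv
import datetime
import io


def format_truth_value(value):
    """Render a truth value as text: booleans as true/false, dates as yyyy-mm-dd."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, datetime.date):
        return value.isoformat()
    return str(value)


# Hypothetical rows: (timezero, unit abbreviation, target name, value).
truth_rows = [
    (datetime.date(2011, 10, 2), "loc1", "pct next week", 2.1),
    (datetime.date(2011, 10, 2), "loc1", "above baseline", True),
    (datetime.date(2011, 10, 2), "loc1", "season peak week", datetime.date(2011, 12, 18)),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["timezero", "unit", "target", "value"])  # header row is an assumption
for timezero, unit, target, value in truth_rows:
    writer.writerow([timezero.isoformat(), unit, target, format_truth_value(value)])

truth_csv = buffer.getvalue()
print(truth_csv)
```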
Forecast data format (JSON)¶
For prediction input and output we use a JSON file format. This format is strongly inspired by https://github.com/cdcepi/predx/blob/master/predx_classes.md . See zoltar-predictions-examples.json for an example. The file contains a top-level object with two keys: "meta" and "predictions". The "meta" section is unused for uploads, and for downloads contains various information about the forecast in the repository in the "forecast" field, plus lists of the project's "units" and "targets".
The "predictions" list contains objects for each prediction, and each object contains the following four keys:
- "unit": abbreviation of the Unit.
- "target": name of the Target.
- "class": the type of prediction this is. It is an abbreviation of the corresponding Prediction subclass. The names are: bin, named, point, sample, quantile, mean, median, and mode.
- "prediction": a class-specific object containing the prediction data itself. The format varies according to class. Here is a summary (see Data model for details and examples):
    - "bin": Binned distribution with a category for each bin. It is a two-column table represented by two keys, one per column: cat and prob. They are paired, i.e., have the same number of rows.
    - "named": A named distribution with four fields: family and param1 through param3. The family name must be one of: norm, lnorm, gamma, beta, bern, binom, pois, nbinom, and nbinom2.
    - "point": A numeric point prediction with a single value key.
    - "sample": Numeric samples represented as a table with one column that is found in the sample key.
    - "quantile": A quantile distribution with two paired columns: quantile and value.
    - "mean", "median", and "mode": A numeric prediction with a single value key, representing the summary statistic indicated by the name, e.g. the mean.
Note: Regarding using the point prediction type vs. mean, median, and mode, we strongly recommend adopting one of the latter types if possible. Doing so could help avoid future data analysis inconsistencies. For example, soliciting a specific type of point prediction can ensure that any scoring rule used to evaluate predictions is well-matched to that prediction type.
To indicate a Retracted prediction in JSON files, use null for the "prediction" value. For example:
{
"unit": "loc1",
"target": "pct next week",
"class": "point",
"prediction": null
}
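To tie the pieces of this format together, here is a minimal Python sketch that builds a forecast object with a few prediction classes and one retraction, then checks that paired columns have equal lengths. The helper names make_element and paired_columns_ok, and all units, targets, and numbers, are hypothetical; the sketch does not reproduce any validation Zoltar itself performs.

```python
import json


def make_element(unit, target, pred_class, prediction):
    """Build one entry of the "predictions" list (hypothetical helper)."""
    return {"unit": unit, "target": target, "class": pred_class, "prediction": prediction}


# Classes whose prediction object holds two paired columns.
PAIRED_KEYS = {"bin": ("cat", "prob"), "quantile": ("quantile", "value")}


def paired_columns_ok(element):
    """Return True if the element's paired columns have the same number of rows."""
    prediction = element["prediction"]
    if prediction is None:  # a retracted prediction carries no data
        return True
    keys = PAIRED_KEYS.get(element["class"])
    if keys is None:  # class has no paired columns
        return True
    first, second = keys
    return len(prediction[first]) == len(prediction[second])


forecast = {
    "meta": {},  # unused for uploads
    "predictions": [
        make_element("loc1", "pct next week", "mean", {"value": 2.1}),
        make_element("loc1", "pct next week", "sample", {"sample": [1.9, 2.0, 2.4]}),
        make_element("loc1", "pct next week", "quantile",
                     {"quantile": [0.25, 0.5, 0.75], "value": [1.5, 2.0, 2.6]}),
        make_element("loc2", "season severity", "bin",
                     {"cat": ["low", "moderate", "high"], "prob": [0.2, 0.5, 0.3]}),
        make_element("loc2", "pct next week", "point", None),  # retraction
    ],
}

assert all(paired_columns_ok(element) for element in forecast["predictions"])
forecast_json = json.dumps(forecast, indent=2)
print(forecast_json)
```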
Forecast data format (CSV)¶
Zoltar supports uploading and downloading forecast data in a CSV format with the following columns. It helps to think of this format as an "exploded" version of the prediction elements in the JSON format, where each element expands into one or more rows. named, point, mean, median, and mode types expand into single rows, and bin, sample, and quantile types expand into one or more rows depending on the particular data. You can read more about prediction types on the data model page.
Note that because different prediction types have different contents, the CSV rows are "sparse" in that not every row uses all columns (the unused ones are empty, i.e., ""). However, the unit, target, and class columns are always non-empty. For example, a point row only uses the value column whereas a quantile row uses only the value and quantile columns. To learn more you can examine the example file zoltar-predictions-examples.csv, which contains the same data as zoltar-predictions-examples.json, but in CSV format.
Here are the columns used in the format, in order. Note that there are three additional columns present when downloading forecast data: model, timezero, and season. They are positioned before the unit column. These three columns are not present when uploading forecast data.
- unit: the prediction's unit
- target: the prediction's target
- class: the prediction's type. One of bin, named, point, sample, quantile, mean, median, and mode
- value: used for point, quantile, mean, median, and mode prediction types. Empty otherwise
- cat: used for bin prediction types. Empty otherwise
- prob: used for bin prediction types. Empty otherwise
- sample: used for sample prediction types. Empty otherwise
- quantile: used for quantile prediction types. Empty otherwise
- family: family name for named predictions. See Named Prediction Elements for a list of them
- param1: parameter for named predictions
- param2: parameter for named predictions
- param3: parameter for named predictions
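The "exploding" of JSON prediction elements into sparse CSV rows can be sketched as follows. This is an illustration of the upload column layout only, not Zoltar's own conversion code: the function explode is hypothetical, retracted elements are deliberately not handled, and leaving absent named parameters empty is an assumption.

```python
import csv
import io

COLUMNS = ["unit", "target", "class", "value", "cat", "prob",
           "sample", "quantile", "family", "param1", "param2", "param3"]


def explode(element):
    """Expand one JSON prediction element into a list of sparse CSV row dicts."""
    prediction = element["prediction"]
    if prediction is None:
        raise ValueError("retracted predictions are not handled in this sketch")
    base = dict.fromkeys(COLUMNS, "")  # unused columns stay empty
    base.update({"unit": element["unit"], "target": element["target"],
                 "class": element["class"]})
    pred_class = element["class"]
    if pred_class in ("point", "mean", "median", "mode"):
        return [{**base, "value": prediction["value"]}]
    if pred_class == "named":
        row = {**base, "family": prediction["family"]}
        for key in ("param1", "param2", "param3"):
            row[key] = prediction.get(key, "")  # assumption: absent params left empty
        return [row]
    if pred_class == "bin":
        return [{**base, "cat": c, "prob": p}
                for c, p in zip(prediction["cat"], prediction["prob"])]
    if pred_class == "sample":
        return [{**base, "sample": s} for s in prediction["sample"]]
    if pred_class == "quantile":
        return [{**base, "quantile": q, "value": v}
                for q, v in zip(prediction["quantile"], prediction["value"])]
    raise ValueError(f"unknown prediction class: {pred_class}")


# Hypothetical elements: one single-row type and one multi-row type.
elements = [
    {"unit": "loc1", "target": "pct next week", "class": "point",
     "prediction": {"value": 2.1}},
    {"unit": "loc1", "target": "pct next week", "class": "quantile",
     "prediction": {"quantile": [0.25, 0.75], "value": [1.0, 2.0]}},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
writer.writeheader()
for element in elements:
    writer.writerows(explode(element))
forecast_csv = buffer.getvalue()
print(forecast_csv)
```

The point element yields one row and the quantile element yields one row per quantile, each filling only the columns its class uses.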
To indicate a Retracted prediction in CSV files, put NULL (no quote marks) in the non-sparse cells, i.e., the cells that the prediction type would otherwise fill. Taking the above example, a retracted point row would have NULL for its value, and a retracted quantile row would have NULL for both value and quantile. Note that only one NULL row of multi-row prediction types (bin, sample, and quantile) needs to be present to retract a prediction element. For example, here are three retracted rows:
unit,target,class,value,cat,prob,sample,quantile,family,param1,param2,param3
loc2,pct next week,point,"NULL",,,,,,,,
loc2,pct next week,bin,,"NULL","NULL",,,,,,
loc2,pct next week,quantile,"NULL",,,,"NULL",,,,
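Retraction rows like these could be generated with a short Python sketch. The USED_COLUMNS mapping and the retraction_row helper are hypothetical names; the mapping follows the column descriptions above and omits the named class, since this page does not spell out which of its cells take NULL.

```python
import csv
import io

COLUMNS = ["unit", "target", "class", "value", "cat", "prob",
           "sample", "quantile", "family", "param1", "param2", "param3"]

# Columns each class fills, per the column descriptions above (named omitted).
USED_COLUMNS = {
    "point": ["value"],
    "mean": ["value"],
    "median": ["value"],
    "mode": ["value"],
    "bin": ["cat", "prob"],
    "sample": ["sample"],
    "quantile": ["value", "quantile"],
}


def retraction_row(unit, target, pred_class):
    """Build the single row that retracts one prediction element."""
    row = dict.fromkeys(COLUMNS, "")
    row.update({"unit": unit, "target": target, "class": pred_class})
    for column in USED_COLUMNS[pred_class]:
        row[column] = "NULL"
    return row


buffer = io.StringIO()
# Default minimal quoting leaves NULL unquoted, since it has no special characters.
writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
writer.writeheader()
for pred_class in ("point", "bin", "quantile"):
    writer.writerow(retraction_row("loc2", "pct next week", pred_class))
retraction_csv = buffer.getvalue()
print(retraction_csv)
```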