File Formats

Zoltar uses a number of formats for representing truth data, forecast data, configurations, etc. This page documents those.

Project creation configuration (JSON)

As documented in Projects, projects can be created from a JSON configuration file as an alternative to creating them manually via the web interface. Here's the configuration file from the "Docs Example Project" demo project: zoltar-project-config.json.

Project configuration files contain eight metadata keys ("name", "is_public", "description", "home_url", "logo_url", "core_data", "time_interval_type", "visualization_y_label"), plus three keys whose values are lists of objects ("units", "targets", and "timezeros"). The metadata values' meanings are self-evident except for these two:

  • time_interval_type: Used by the D3 component to label the X axis. Must be one of Week, Biweek, or Month
  • visualization_y_label: Used by the D3 component to label the Y axis. Can be any text

Here are the three list objects' formats:

"units": a list of objects containing only one field:

  • name: The name of the unit.

"targets": a list of the project's targets. Please see the Targets.md file for a detailed description of target parameters and which are required. Here are all possible parameters that can be passed in a project configuration file:

  • name: string
  • description: string
  • type: string - must be one of the following: continuous, discrete, nominal, binary, or date
  • is_step_ahead: boolean
  • step_ahead_increment: integer - negative, zero, or positive
  • unit: string
  • range: an array (list) of two numbers
  • cats: an array (list) of one or more numbers or strings (which of the two depends on the data type of the target's type)
  • dates: an array (list) of one or more strings in the YYYY-MM-DD format

"timezeros": a list of the project's time zeros. Each has these fields:

  • timezero_date: The timezero's date in yyyymmdd format
  • data_version_date: Optional data version date in the same format. Pass null if the timezero does not have one
  • is_season_start: true if this starts a season, and false otherwise
  • season_name: Applicable when is_season_start is true, names the season, e.g., "2010-2011"
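
The key lists above can be turned into a quick structural check before a configuration file is submitted. The following is a minimal sketch, not Zoltar's own validation: the function name check_project_config and every value in example_config are hypothetical, and treating all eleven keys as required is an assumption (see Targets.md for which target parameters are actually required).

```python
import json

METADATA_KEYS = {"name", "is_public", "description", "home_url", "logo_url",
                 "core_data", "time_interval_type", "visualization_y_label"}
LIST_KEYS = {"units", "targets", "timezeros"}
TIME_INTERVAL_TYPES = {"Week", "Biweek", "Month"}
TARGET_TYPES = {"continuous", "discrete", "nominal", "binary", "date"}


def check_project_config(config):
    """Return a list of problems found in a project configuration dict."""
    problems = []
    # Assumption: every documented top-level key must be present.
    for key in sorted((METADATA_KEYS | LIST_KEYS) - config.keys()):
        problems.append(f"missing key: {key}")
    if config.get("time_interval_type") not in TIME_INTERVAL_TYPES:
        problems.append("time_interval_type must be Week, Biweek, or Month")
    # Units are documented as objects with a single "name" field.
    for unit in config.get("units", []):
        if set(unit) != {"name"}:
            problems.append(f"unit must contain only 'name': {unit}")
    for target in config.get("targets", []):
        if target.get("type") not in TARGET_TYPES:
            problems.append(f"bad target type: {target.get('type')}")
    return problems


# Hypothetical configuration; none of these values come from a real project.
example_config = {
    "name": "Hypothetical project",
    "is_public": True,
    "description": "An illustrative project",
    "home_url": "https://example.com",
    "logo_url": "https://example.com/logo.png",
    "core_data": "https://example.com/data",
    "time_interval_type": "Week",
    "visualization_y_label": "cases",
    "units": [{"name": "loc1"}],
    "targets": [{"name": "hypothetical target", "description": "demo",
                 "type": "continuous", "is_step_ahead": False,
                 "unit": "cases"}],
    "timezeros": [],
}

print(check_project_config(example_config))
config_json = json.dumps(example_config, indent=2)  # text for a .json file
```

The check returns a list rather than raising on the first problem so that all issues in a file can be reported at once.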

Truth data format (CSV)

Every project in Zoltar can have ground truth values associated with targets. This information is required for Zoltar to do scoring. Users can access them as CSV as described in Truth. An example truths file is zoltar-ground-truth-example.csv. The file has four columns: timezero, unit, target, value:

  • timezero: date the truth applies to, formatted as yyyy-mm-dd
  • unit: the unit's name
  • target: target name
  • value: truth value, formatted according to the target's type. Date values are formatted yyyy-mm-dd and booleans as true or false
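
As an illustration of the four-column layout, here is a minimal Python sketch that reads truth rows with the standard csv module. It assumes the file starts with a header row naming the four columns in the order listed above; the function name read_truth_rows and the sample rows are hypothetical. The value column is left as text because its interpretation depends on the target's type.

```python
import csv
import io
from datetime import date

TRUTH_COLUMNS = ["timezero", "unit", "target", "value"]


def read_truth_rows(csv_text):
    """Parse truth CSV text into a list of dicts, one per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: a header row with exactly these four column names.
    if reader.fieldnames != TRUTH_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        # timezero is documented as yyyy-mm-dd, which fromisoformat parses.
        row["timezero"] = date.fromisoformat(row["timezero"])
        rows.append(row)
    return rows


# Hypothetical rows: a numeric, a boolean, and a date truth value.
sample_text = """timezero,unit,target,value
2011-10-02,loc1,target A,12.5
2011-10-02,loc1,target B,true
2011-10-09,loc2,target C,2011-10-16
"""

truth_rows = read_truth_rows(sample_text)
for row in truth_rows:
    print(row["timezero"], row["unit"], row["target"], row["value"])
```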

Score data format (CSV)

Zoltar calculates scores for all projects in the archive if they meet the requirements specified in Scoring. Users can download them as CSV through the web UI. The file has five fixed columns plus one column for each implemented score. Score names are in the header. Here is an example header and a few rows from the "COVID-19 Forecasts" project, starting with the zoltr query that returned them:

score_data <- zoltr::do_zoltar_query(zoltar_connection, project_url, FALSE, models = "YYG-ParamSearch",
                                     units = c("US", "01"), targets = "4 wk ahead cum death",
                                     timezeros = c("2020-05-11", "2020-05-12"), scores = "abs_error")
model            timezero    season     unit  target                truth   abs_error
YYG-ParamSearch  2020-05-11  2019-2020  US    4 wk ahead cum death  112787  2681.353647
YYG-ParamSearch  2020-05-11  2019-2020  1     4 wk ahead cum death  689     69.14696021
YYG-ParamSearch  2020-05-12  2019-2020  US    4 wk ahead cum death  118093  4783.580487
YYG-ParamSearch  2020-05-12  2019-2020  1     4 wk ahead cum death  773     94.82157045

Forecast data format (JSON)

For prediction input and output we use a JSON file format. This format is strongly inspired by https://github.com/cdcepi/predx/blob/master/predx_classes.md . See zoltar-predictions-examples.json for an example. The file contains a top-level object with two keys: "meta" and "predictions". The "meta" section is unused for uploads; for downloads it contains information about the forecast in the repository (in the "forecast" field) plus lists of the project's "units" and "targets".

The "predictions" list contains objects for each prediction, and each object contains the following four keys:

  • "location": name of the Location.
  • "target": name of the Target.
  • "class": the type of prediction this is. It is an abbreviation of the corresponding Prediction subclass. The names are: bin, named, point, and sample.
  • "prediction": a class-specific dict containing the prediction data itself. The format varies according to class. Here is a summary (see Data model for details and examples):

    • "bin": Binned distribution with a category for each bin. It is a two-column table represented by two keys, one per column: cat and prob. They are paired, i.e., have the same number of rows.
    • "named": A named distribution with four fields: family and param1 through param3. family must be one of: norm, lnorm, gamma, beta, bern, binom, pois, nbinom, and nbinom2.
    • "point": A numeric point prediction with a single value key.
    • "sample": Numeric samples represented as a table with one column that is found in the sample key.
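
The class-specific rules summarized above can be checked mechanically. The sketch below is illustrative only and covers just the constraints stated on this page (paired cat and prob lists, the allowed family names, a value key, a sample list); the function name check_prediction is hypothetical, and Zoltar's actual validation is stricter (see Data model).

```python
NAMED_FAMILIES = {"norm", "lnorm", "gamma", "beta", "bern",
                  "binom", "pois", "nbinom", "nbinom2"}


def check_prediction(pred_class, prediction):
    """Return a list of problems with one class-specific "prediction" dict."""
    if pred_class == "bin":
        cats, probs = prediction.get("cat"), prediction.get("prob")
        if not isinstance(cats, list) or not isinstance(probs, list):
            return ["bin needs 'cat' and 'prob' lists"]
        # cat and prob are paired columns, so they need equal lengths.
        if len(cats) != len(probs):
            return ["bin 'cat' and 'prob' must have the same length"]
        return []
    if pred_class == "named":
        if prediction.get("family") not in NAMED_FAMILIES:
            return [f"unknown family: {prediction.get('family')}"]
        return []
    if pred_class == "point":
        return [] if "value" in prediction else ["point needs a 'value' key"]
    if pred_class == "sample":
        if not isinstance(prediction.get("sample"), list):
            return ["sample needs a 'sample' list"]
        return []
    return [f"unknown class: {pred_class}"]


# Hypothetical prediction dicts, one per class.
print(check_prediction("bin", {"cat": [0, 1, 2], "prob": [0.1, 0.6, 0.3]}))
print(check_prediction("named", {"family": "norm", "param1": 1.1, "param2": 2.2}))
print(check_prediction("point", {"value": 5}))
print(check_prediction("sample", {"sample": [0, 2, 5]}))
```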

Forecast data format (CSV)

Because the native Zoltar JSON format can be inconvenient to work with, the Zoltar libraries provide functions to convert from JSON to a Zoltar-specific CSV format with the following columns. Each row represents a prediction of a particular type as described on the data model page. Note that because different prediction types have different contents, the frame is 'sparse': not every row uses all columns, and unused ones are empty (""). However, the first three columns (unit, target, and class) are always non-empty.

  • unit: the prediction's unit
  • target: the prediction's target
  • class: the prediction's type. One of bin, named, point, sample, or quantile
  • value: used for point and quantile prediction types. empty otherwise
  • cat: used for bin prediction types. empty otherwise
  • prob: used for bin prediction types. empty otherwise
  • sample: used for sample prediction types. empty otherwise
  • quantile: used for quantile prediction types. empty otherwise
  • family: family name for named predictions. see Named Prediction Elements for a list of them
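  • param1: first parameter for named predictions. empty otherwise
  • param2: second parameter for named predictions. empty otherwise
  • param3: third parameter for named predictions. empty otherwise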
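
To make the sparse layout concrete, here is a rough Python sketch that flattens JSON prediction objects into rows with these twelve columns. It is not the Zoltar libraries' converter: the function names are hypothetical, the example values are made up, and emitting one row per bin category and one row per sample is an assumption about the layout. The unit is read from the "location" key as listed in the JSON section above.

```python
import csv
import io

CSV_COLUMNS = ["unit", "target", "class", "value", "cat", "prob",
               "sample", "quantile", "family", "param1", "param2", "param3"]


def prediction_to_rows(pred):
    """Flatten one JSON prediction object into sparse CSV row dicts."""
    base = dict.fromkeys(CSV_COLUMNS, "")  # unused columns stay empty ("")
    base["unit"] = pred["location"]  # key name as listed in the JSON section
    base["target"] = pred["target"]
    base["class"] = pred["class"]
    data = pred["prediction"]
    if pred["class"] == "bin":
        # Assumption: one row per (cat, prob) pair.
        return [dict(base, cat=c, prob=p)
                for c, p in zip(data["cat"], data["prob"])]
    if pred["class"] == "sample":
        # Assumption: one row per sample.
        return [dict(base, sample=s) for s in data["sample"]]
    if pred["class"] == "point":
        return [dict(base, value=data["value"])]
    if pred["class"] == "named":
        named = {k: data[k]
                 for k in ("family", "param1", "param2", "param3") if k in data}
        return [dict(base, **named)]
    raise ValueError(f"unsupported class: {pred['class']}")


def predictions_to_csv(predictions):
    """Write all predictions as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for pred in predictions:
        writer.writerows(prediction_to_rows(pred))
    return out.getvalue()


# Hypothetical predictions.
example_predictions = [
    {"location": "loc1", "target": "target A", "class": "point",
     "prediction": {"value": 2.1}},
    {"location": "loc1", "target": "target A", "class": "bin",
     "prediction": {"cat": [0, 1], "prob": [0.25, 0.75]}},
]
csv_text = predictions_to_csv(example_predictions)
print(csv_text)
```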

Quantile forecast format (CSV)

Zoltar libraries support importing quantile data (see Validation.md for more information) via the COVID-19 CSV format documented at covid19-forecast-hub. While this format is not supported by Zoltar itself (i.e., you cannot upload one directly - you must always upload JSON files in the above format), the libraries allow you to translate between the two.

Columns: The Zoltar libraries ignore all but the following, which are allowed to be in any order:

  • "target": a unique id for the target
  • "location": a unique id for the location (we have standardized to FIPS codes). It is translated to Zoltar's "unit" concept.
  • "type": one of either "point" or "quantile"
  • "quantile": a value between 0 and 1 (inclusive), stating which quantile is displayed in this row. If type is "point", this is NA.
  • "value": a numeric value representing the value of the quantile function evaluated at the probability specified in quantile

See quantile-predictions.csv for an example.
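
The column rules above can be sketched in a few lines of Python. This is only an illustration, not the Zoltar libraries' importer: the function name read_quantile_csv is hypothetical, the sample rows are made-up values, and forecast_date stands in for any column that gets ignored. Each kept row is renamed toward the Zoltar CSV columns ("location" becomes unit, "type" becomes class).

```python
import csv
import io


def read_quantile_csv(csv_text):
    """Keep the five documented columns and validate type and quantile."""
    rows = []
    # DictReader looks columns up by name, so their order does not matter.
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row_type = raw["type"]
        if row_type not in ("point", "quantile"):
            raise ValueError(f"type must be point or quantile: {row_type!r}")
        quantile = None  # point rows carry NA in the quantile column
        if row_type == "quantile":
            quantile = float(raw["quantile"])
            if not 0.0 <= quantile <= 1.0:
                raise ValueError(f"quantile out of range: {quantile}")
        rows.append({
            "unit": raw["location"],   # translated to Zoltar's unit concept
            "target": raw["target"],
            "class": row_type,
            "quantile": quantile,
            "value": float(raw["value"]),
        })
    return rows


# Hypothetical rows; forecast_date is an example of an ignored column.
example_csv = """forecast_date,location,target,type,quantile,value
2020-05-11,US,4 wk ahead cum death,point,NA,110000
2020-05-11,US,4 wk ahead cum death,quantile,0.025,105000
2020-05-11,US,4 wk ahead cum death,quantile,0.975,116000
"""

quantile_rows = read_quantile_csv(example_csv)
for row in quantile_rows:
    print(row)
```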