Zoltar data model

This page documents how Zoltar represents forecast data and the types of predictions it supports. Each forecast is made for a particular time zero in the project and contains some number of predictions, one or more for each combination of the project's units and targets. Zoltar supports the following eight prediction types, which we've found cover many forecasting applications. All predictions must pass a number of validity checks as documented in Validation. Each section includes an example in Zoltar's JSON format, taken from the "Docs Example Project" demo project.

Overview

(Please refer to the diagram below. Note that it does not show the quantile, mean, median, and mode prediction types.) A forecast is made up of metadata (the model that made the forecast, the time zero for the forecast, and the forecast's date of submission) plus the forecast's data as a set of predictions. Each prediction is a collection of one or more prediction elements specific to one target and unit. Each prediction element has a particular type, or "class": point, named, bin, sample, quantile, mean, median, or mode, as described below.

Predictions Diagram
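
As a rough illustration of the structure described above, the following Python sketch groups prediction elements by their unit and target. The element dictionaries are copied from the examples on this page; the function name `group_by_unit_and_target` is hypothetical and is not part of Zoltar.

```python
from collections import defaultdict

# Prediction elements copied from the examples on this page.
prediction_elements = [
    {"unit": "loc1", "target": "pct next week", "class": "point",
     "prediction": {"value": 2.1}},
    {"unit": "loc1", "target": "pct next week", "class": "named",
     "prediction": {"family": "norm", "param1": 1.1, "param2": 2.2}},
    {"unit": "loc2", "target": "pct next week", "class": "bin",
     "prediction": {"cat": [1.1, 2.2, 3.3], "prob": [0.3, 0.2, 0.5]}},
]


def group_by_unit_and_target(elements):
    """Map each (unit, target) pair to the classes of its prediction elements.

    Hypothetical helper for illustration only; not a Zoltar function.
    """
    grouped = defaultdict(list)
    for element in elements:
        key = (element["unit"], element["target"])
        grouped[key].append(element["class"])
    return dict(grouped)


grouped = group_by_unit_and_target(prediction_elements)
```

Here the pair ("loc1", "pct next week") ends up with two prediction elements (point and named), while ("loc2", "pct next week") has one (bin).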

Point predictions

Point predictions represent a single value. See here for details. In this example we see a prediction of 2.1 for the unit "loc1" and target "pct next week". Note that, as with all predictions, the data type of the value (in this case the float 2.1) must match the target's data type. (In this case it does; "pct next week" is a continuous target.)

{"unit": "loc1",
  "target": "pct next week",
  "class": "point",
  "prediction": {
    "value": 2.1
  }
}
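
The type-matching rule can be sketched in Python as follows. This is a simplified illustration, not Zoltar's validation code: the `ACCEPTED_TYPES` mapping and the function name are assumptions, and the authoritative rules are those in the Validation documentation.

```python
# Assumed, simplified mapping from target type to accepted Python types.
# The real rules are defined by Zoltar's validation documentation.
ACCEPTED_TYPES = {
    "continuous": (float,),
    "discrete": (int,),
}


def value_matches_target_type(value, target_type):
    """Return True if `value` has a type accepted for `target_type`.

    Hypothetical helper for illustration only.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        return False
    return isinstance(value, ACCEPTED_TYPES[target_type])


point_element = {
    "unit": "loc1",
    "target": "pct next week",
    "class": "point",
    "prediction": {"value": 2.1},
}
value = point_element["prediction"]["value"]
matches_continuous = value_matches_target_type(value, "continuous")
matches_discrete = value_matches_target_type(value, "discrete")
```

Under these assumptions the float 2.1 is accepted for a continuous target and rejected for a discrete one.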

Named distributions

Named distributions represent distributions like normal, log normal, gamma, etc. See here for details. Here is an example in which a normal distribution ("norm" is the abbreviation) is specified for unit "loc1" and target "pct next week". Named predictions support up to three parameters (named "param1", "param2", and "param3") depending on the family. Here the normal distribution requires two parameters: mean ("param1") and sd ("param2"). "param3" is unused and not listed.

{"unit": "loc1",
  "target": "pct next week",
  "class": "named",
  "prediction": {
    "family": "norm",
    "param1": 1.1,
    "param2": 2.2
  }
}
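
To make the parameter meaning concrete, here is a minimal Python sketch that reads the "norm" example into a distribution object from the standard library's `statistics` module. The helper name `to_normal_dist` is hypothetical, and only the "norm" family is handled because it is the only one whose parameters are described on this page.

```python
from statistics import NormalDist

named_element = {
    "unit": "loc1",
    "target": "pct next week",
    "class": "named",
    "prediction": {"family": "norm", "param1": 1.1, "param2": 2.2},
}


def to_normal_dist(element):
    """Build a NormalDist from a "norm" named prediction element.

    Hypothetical helper: param1 is read as the mean and param2 as the
    standard deviation, as described in the text above.
    """
    prediction = element["prediction"]
    if prediction["family"] != "norm":
        raise ValueError("this sketch only handles the 'norm' family")
    return NormalDist(mu=prediction["param1"], sigma=prediction["param2"])


dist = to_normal_dist(named_element)
# Probability mass below the mean is one half for a normal distribution.
prob_below_mean = dist.cdf(dist.mean)
```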

Bin distributions

Bin distributions represent binned distributions defined by inclusive lower bounds for each bin. See here for details. Example:

{"unit": "loc2",
  "target": "pct next week",
  "class": "bin",
  "prediction": {
    "cat": [1.1, 2.2, 3.3],
    "prob": [0.3, 0.2, 0.5]
  }
}
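
The "cat" and "prob" lists are parallel: each probability belongs to the bin at the same position. A minimal Python sketch of some plausible structural checks follows. The specific checks (equal lengths, probabilities within [0, 1], probabilities summing to 1) are assumptions made for illustration; the Validation documentation is the authority on what Zoltar actually enforces.

```python
import math


def check_bin_prediction(prediction):
    """Return a list of problems found in a bin prediction dict.

    Hypothetical helper; the checks are assumed, not Zoltar's own.
    """
    cats = prediction["cat"]
    probs = prediction["prob"]
    errors = []
    if len(cats) != len(probs):
        errors.append("'cat' and 'prob' must have the same length")
    if any(p < 0 or p > 1 for p in probs):
        errors.append("each probability must lie in [0, 1]")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        errors.append("probabilities must sum to 1")
    return errors


good = {"cat": [1.1, 2.2, 3.3], "prob": [0.3, 0.2, 0.5]}
bad = {"cat": [1.1, 2.2, 3.3], "prob": [0.3, 0.2]}
good_errors = check_bin_prediction(good)
bad_errors = check_bin_prediction(bad)
```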

Sample distributions

Sample distributions represent samples taken for the unit and target. Here's an example:

{"unit": "loc3",
  "target": "pct next week",
  "class": "sample",
  "prediction": {
    "sample": [2.3, 6.5, 0.0, 10.0234, 0.0001]
  }
}
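
A consumer of a sample prediction will typically summarize the samples itself. As a small sketch using only Python's standard library (the variable names are illustrative, not part of Zoltar):

```python
import statistics

sample_element = {
    "unit": "loc3",
    "target": "pct next week",
    "class": "sample",
    "prediction": {"sample": [2.3, 6.5, 0.0, 10.0234, 0.0001]},
}

samples = sample_element["prediction"]["sample"]

# Simple empirical summaries of the samples.
sample_mean = statistics.mean(samples)
sample_median = statistics.median(samples)
sample_min, sample_max = min(samples), max(samples)
```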

Quantile distributions

Quantile distributions represent quantiles of the predictive distribution. For example:

{"unit": "loc2",
  "target": "pct next week",
  "class": "quantile",
  "prediction": {
    "quantile": [0.025, 0.25, 0.5, 0.75, 0.975],
    "value": [1.0, 2.2, 2.2, 5.0, 50.0]
  }
}
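
As with bins, "quantile" and "value" are parallel lists: in the example, the 0.025 quantile is 1.0 and the 0.975 quantile is 50.0. The Python sketch below shows some plausible consistency checks. They are assumptions for illustration (equal lengths, quantile levels in [0, 1] and strictly increasing, values non-decreasing); consult the Validation documentation for the rules Zoltar actually applies.

```python
def check_quantile_prediction(prediction):
    """Return a list of problems found in a quantile prediction dict.

    Hypothetical helper; the checks are assumed, not Zoltar's own.
    """
    quantiles = prediction["quantile"]
    values = prediction["value"]
    errors = []
    if len(quantiles) != len(values):
        errors.append("'quantile' and 'value' must have the same length")
    if any(q < 0 or q > 1 for q in quantiles):
        errors.append("each quantile level must lie in [0, 1]")
    if any(a >= b for a, b in zip(quantiles, quantiles[1:])):
        errors.append("quantile levels must be strictly increasing")
    if any(a > b for a, b in zip(values, values[1:])):
        errors.append("values must be non-decreasing")
    return errors


example = {
    "quantile": [0.025, 0.25, 0.5, 0.75, 0.975],
    "value": [1.0, 2.2, 2.2, 5.0, 50.0],
}
example_errors = check_quantile_prediction(example)
# Pair each quantile level with its value for easy lookup.
by_level = dict(zip(example["quantile"], example["value"]))
```

Note that the example repeats the value 2.2 at the 0.25 and 0.5 levels, which is why the sketch only requires values to be non-decreasing.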

Mean, median, and mode predictions

These three types are expressed similarly to point predictions above, except that the class name varies. All validations and constraints that apply to point also apply to these three. Here's the point example from above, but with the mean class:

{"unit": "loc1",
  "target": "pct next week",
  "class": "mean",
  "prediction": {
    "value": 2.1
  }
}
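
Because only the class name differs, a point element can be turned into a mean, median, or mode element by copying it and changing "class". A minimal Python sketch, with a hypothetical helper name:

```python
import copy

# The four classes that share the single-"value" layout shown above.
POINT_LIKE_CLASSES = ("point", "mean", "median", "mode")


def with_class(element, new_class):
    """Return a copy of a point-like element with a different class.

    Hypothetical helper for illustration; the input is left unmodified.
    """
    if new_class not in POINT_LIKE_CLASSES:
        raise ValueError(f"unsupported class: {new_class!r}")
    converted = copy.deepcopy(element)
    converted["class"] = new_class
    return converted


point_element = {
    "unit": "loc1",
    "target": "pct next week",
    "class": "point",
    "prediction": {"value": 2.1},
}
mean_element = with_class(point_element, "mean")
```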