# Forecast Targets in Zoltar

Targets are the fundamental data structure of a forecast. In Zoltar, a single forecast made by a model may give predictions for multiple targets. For example, a single forecast might include a forecast of 1- and 2-week-ahead values and a prediction of when the time series will reach its maximum in a given period of time. When a project is created, the project owner specifies which targets should be part of any submitted forecast. As we will see below, targets have specific properties, and there are several different types of targets that determine which properties and features pertain to a particular target.

## Target types

*continuous*: A quantitative target whose range encapsulates a section of the real number line.

Examples: percentage of all doctors' office visits due to influenza-like illness, or disease incidence per 100,000 population.

*discrete*: A quantitative target whose range is a set of integer values.

Example: the number of incident cases in a time period.

*nominal*: A nominal, unordered categorical target.

Example: severity level in categories of "low", "moderate", and "high".

*binary*: A binary target, with a defined outcome that can be seen as true or false.

Example: whether the maximum value of a variable exceeds some threshold C in a given period of time.

*date*: A target with a discrete set of calendar dates as possible outcomes.

Example: the calendar week in which peak incidence occurs (represented by the Sunday of that week).

## Target parameters

All targets share a set of parameters that must be defined when the target is created. Each target type then has additional parameters, some of which are optional. These are all defined below.

### Summary of allowed, optional, and required parameters, by target type

The table below summarizes which parameters are allowed, optional, and required for each target type. Legend: 'x' = required, '(x)' = required if `is_step_ahead` is `true`, '-' = disallowed, '~' = optional.

| target type | type | name | description | is_step_ahead | step_ahead_increment | unit | range | cats |
|---|---|---|---|---|---|---|---|---|
| continuous | x | x | x | x | (x) | x | ~ | ~ |
| discrete | x | x | x | x | (x) | x | ~ | ~ |
| nominal | x | x | x | x | (x) | - | - | x |
| binary | x | x | x | x | (x) | - | - | - |
| date | x | x | x | x | (x) | x | - | x |
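
The summary table can be read as a small rule set. The sketch below encodes it as a Python dictionary and checks a target definition against it. The names `PARAM_RULES`, `COMMON_REQUIRED`, and `check_target` are hypothetical and are not part of Zoltar's API; the target is assumed to be a plain dictionary keyed by the parameter names used in this document.

```python
# Rule codes mirror the table legend: "x" required, "~" optional, "-" disallowed.
# Hypothetical names throughout; this is not Zoltar's own validation code.
PARAM_RULES = {
    "continuous": {"unit": "x", "range": "~", "cats": "~"},
    "discrete":   {"unit": "x", "range": "~", "cats": "~"},
    "nominal":    {"unit": "-", "range": "-", "cats": "x"},
    "binary":     {"unit": "-", "range": "-", "cats": "-"},
    "date":       {"unit": "x", "range": "-", "cats": "x"},
}
COMMON_REQUIRED = ("type", "name", "description", "is_step_ahead")


def check_target(target):
    """Return a list of problems found in a target definition (a plain dict)."""
    problems = []
    for key in COMMON_REQUIRED:
        if key not in target:
            problems.append(f"missing required parameter: {key}")
    # "(x)" in the table: step_ahead_increment is required only for step-ahead targets.
    if target.get("is_step_ahead") and "step_ahead_increment" not in target:
        problems.append("missing required parameter: step_ahead_increment")
    rules = PARAM_RULES.get(target.get("type"))
    if rules is None:
        problems.append(f"unknown target type: {target.get('type')!r}")
        return problems
    for key, rule in rules.items():
        if rule == "x" and key not in target:
            problems.append(f"missing required parameter: {key}")
        elif rule == "-" and key in target:
            problems.append(f"parameter not allowed for {target['type']}: {key}")
    return problems
```

For instance, a nominal target that sets `unit` but omits `cats` would be reported with two problems, one for each rule it breaks.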

### Required parameters for all targets

- *name*: A brief name for the target. (The number of characters is not limited, but brevity is helpful.)
- *description*: A verbose description of what the target is. (The number of characters is not limited.)
- *type*: One of the five target types named above, e.g., `continuous`.
- *is_step_ahead*: `true` if the target is one of a sequence of targets that predict values at different points in the future.
- *step_ahead_increment*: An integer indicating the forecast horizon represented by this target. It is required if `is_step_ahead` is `true`.

### Parameters specific to continuous targets

- *unit*: (Required) E.g., "percent" or "week".
- *range*: (Optional) A numeric vector of length 2 specifying the lower and upper bounds of the continuous target. The range is assumed to be inclusive on the lower bound and open on the upper bound, e.g., [a, b). If `range` is not specified, then it is assumed to be (-∞, ∞).
- *cats*: (Optional, but uploaded `Bin` prediction types will be rejected unless these are specified) An ordered set of numeric values indicating the inclusive lower bounds for the bins of binned distributions. E.g., if `cats` is specified as [0, 1.1, 2.2], then the implied set of valid intervals would be [0, 1.1), [1.1, 2.2), and [2.2, ∞). Additionally, if `range` had been specified as [0, 100] in addition to the above `cats`, then the final bin would be [2.2, 100].

If both `range` and `cats` are specified, then min(`cats`) must equal the lower bound and max(`cats`) must be less than the upper bound of `range`.
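
The way `cats` and `range` combine into bins can be sketched in a few lines of Python. The function name `continuous_bins` is hypothetical, and the requirement that `cats` be strictly increasing is an assumption drawn from the phrase "ordered set". Each returned pair records only the endpoints of a bin, not whether the upper endpoint is open or closed.

```python
import math


def continuous_bins(cats, value_range=None):
    """Return (lower, upper) pairs implied by `cats`, the bin lower bounds.

    Hypothetical helper for illustration; not Zoltar's implementation.
    """
    cats = list(cats)
    # Assumption: an "ordered set" means strictly increasing values.
    if cats != sorted(cats) or len(set(cats)) != len(cats):
        raise ValueError("cats must be strictly increasing")
    upper = math.inf
    if value_range is not None:
        lower, upper = value_range
        if min(cats) != lower:
            raise ValueError("min(cats) must equal the lower bound of range")
        if max(cats) >= upper:
            raise ValueError("max(cats) must be less than the upper bound of range")
    # Each bin runs from one lower bound to the next; the last bin ends at the
    # upper bound of range, or at infinity when no range is given.
    return list(zip(cats, cats[1:] + [upper]))
```

Calling `continuous_bins([0, 1.1, 2.2])` yields three bins with the last one unbounded above, while passing `value_range=[0, 100]` caps the last bin at 100, matching the example in the text.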

### Parameters specific to discrete targets

- *unit*: (Required) E.g., "cases".
- *range*: (Optional, but uploaded `Bin` prediction types will be rejected unless `range` is specified) An integer vector of length 2 specifying the lower and upper bounds of the discrete target. The range is assumed to be inclusive on both the lower and upper bounds, e.g., [a, b]. If `range` is not specified, then it is assumed to be (-∞, ∞).
- *cats*: (Optional, and can only be specified if `range` is also specified) An ordered set of integer values indicating the inclusive lower bounds for the bins of binned distributions. E.g., if `cats` is specified as [0, 10, 20, 30, 40, 50] and `range` is specified as [0, 100], then the implied set of valid categories would be [0, 10), [10, 20), [20, 30), [30, 40), [40, 50), and [50, 100].

If both `range` and `cats` are specified, then min(`cats`) must equal the lower bound and max(`cats`) must be less than the upper bound of `range`.
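
To see how an observed integer maps onto these bins, the following sketch looks up the bin for a value with Python's standard `bisect` module. The function name `discrete_bin_for` is hypothetical, and the sketch assumes `cats` and `range` have already passed the consistency check described above.

```python
from bisect import bisect_right


def discrete_bin_for(value, cats, value_range):
    """Return the (lower, upper, upper_inclusive) bin that contains `value`.

    Hypothetical helper; assumes min(cats) equals the lower bound of the range.
    """
    lower, upper = value_range
    if not lower <= value <= upper:
        raise ValueError(f"{value} is outside range [{lower}, {upper}]")
    # bisect_right counts how many lower bounds are <= value, so the bin that
    # holds the value starts at index i - 1.
    i = bisect_right(cats, value)
    if i == len(cats):
        # The final bin is closed at the upper bound of the range.
        return (cats[-1], upper, True)
    return (cats[i - 1], cats[i], False)
```

With `cats` of [0, 10, 20, 30, 40, 50] and a range of [0, 100], the value 10 falls in [10, 20) because lower bounds are inclusive, and both 50 and 100 fall in the final bin [50, 100].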

### Parameters specific to nominal targets

- *cats*: (Required) A list of strings that name the categories for this target. Categories must not include the following strings: `""`, `"NA"`, or `"NULL"` (case does not matter).
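
The reserved-string rule is easy to check before submitting a project definition. The sketch below is one way to do it; `RESERVED` and `invalid_nominal_cats` are hypothetical names, and comparing upper-cased strings is just one way to honor "case does not matter".

```python
# Strings that nominal categories must not use, per the rule above.
RESERVED = {"", "NA", "NULL"}


def invalid_nominal_cats(cats):
    """Return the categories that collide with a reserved string, ignoring case."""
    return [cat for cat in cats if cat.upper() in RESERVED]
```

An empty result means every category name is acceptable under this rule.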

### Parameters specific to binary targets

None.

### Parameters specific to date targets

- *unit*: (Required) The `unit` parameter has a special meaning and use for date targets. It must be one of "month", "week", "biweek", or "day". It specifies the units of the date target and how certain calculations are performed for dates, including the units in which scores are calculated. For example, for the residual error of a forecast whose point prediction is `forecasted_date` and whose unit is "week", the score would be calculated heuristically as `week(truth_date) - week(forecasted_date)`. All inputs for date targets must be in the standard ISO `YYYY-MM-DD` date format. Note: to map dates to biweeks, we use the definitions presented in Reich et al. (2017).
- *cats*: (Required) A list of dates in `YYYY-MM-DD` format. These are the only dates that will be considered valid input for the target.
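
The unit-based error calculation can be illustrated with a short sketch. This is not Zoltar's implementation: the names `week_start` and `date_error` are hypothetical, weeks are assumed to start on Sunday, and the "biweek" unit is left out because its mapping depends on the Reich et al. (2017) definitions, which are not reproduced here.

```python
from datetime import date, timedelta


def week_start(d):
    """Return the Sunday on or before `d` (assumption: weeks start on Sunday)."""
    # date.weekday() is 0 for Monday through 6 for Sunday.
    return d - timedelta(days=(d.weekday() + 1) % 7)


def date_error(truth, forecast, unit):
    """Return truth minus forecast, counted in whole `unit`s.

    Both dates are ISO YYYY-MM-DD strings. Hypothetical helper for illustration.
    """
    t, f = date.fromisoformat(truth), date.fromisoformat(forecast)
    if unit == "day":
        return (t - f).days
    if unit == "week":
        # Week starts are always a multiple of 7 days apart.
        return (week_start(t) - week_start(f)).days // 7
    if unit == "month":
        return (t.year - f.year) * 12 + (t.month - f.month)
    # "biweek" depends on the Reich et al. (2017) definition and is omitted.
    raise ValueError(f"unsupported unit: {unit!r}")
```

Under these assumptions, a truth date of Saturday 2020-01-04 and a forecast of Sunday 2020-01-05 differ by -1 day and also by -1 week, because the two dates fall in adjacent weeks.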

## Valid prediction types by target type

| target type | data_type | point | bin | sample | named | quantile |
|---|---|---|---|---|---|---|
| continuous | float | x | x | x | (1) | x |
| discrete | int | x | x | x | (2) | x |
| nominal | text | x | x | x | - | - |
| binary | boolean | x | x | x | - | - |
| date | date | x | x | x | - | x |

Legend:

- (1) = valid named distributions are `norm`, `lnorm`, `gamma`, `beta`
- (2) = valid named distributions are `pois`, `nbinom`, `nbinom2`
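
The table and its legend translate directly into a lookup. The sketch below is a hypothetical helper (the names `VALID_PREDICTION_TYPES`, `NAMED_FAMILIES`, and `is_valid_prediction` are not part of Zoltar) that answers whether a prediction type, and for named distributions a family, is valid for a target type.

```python
# Transcribed from the table above; hypothetical names, not Zoltar's API.
VALID_PREDICTION_TYPES = {
    "continuous": {"point", "bin", "sample", "named", "quantile"},
    "discrete":   {"point", "bin", "sample", "named", "quantile"},
    "nominal":    {"point", "bin", "sample"},
    "binary":     {"point", "bin", "sample"},
    "date":       {"point", "bin", "sample", "quantile"},
}
# Legend entries (1) and (2): families allowed for named distributions.
NAMED_FAMILIES = {
    "continuous": {"norm", "lnorm", "gamma", "beta"},
    "discrete":   {"pois", "nbinom", "nbinom2"},
}


def is_valid_prediction(target_type, prediction_type, family=None):
    """Return True if the prediction type (and family, if named) is allowed."""
    if prediction_type not in VALID_PREDICTION_TYPES[target_type]:
        return False
    if prediction_type == "named":
        return family in NAMED_FAMILIES[target_type]
    return True
```

For example, a `pois` named distribution is accepted for a discrete target but rejected for a continuous one, and quantile predictions are rejected for nominal and binary targets.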

## Available scores by target type and prediction element

(Note that CRPS and Brier scores have not yet been implemented.)

| target type | prediction element | error | abs error | log score | CRPS | Brier | PIT |
|---|---|---|---|---|---|---|---|
| continuous | point | x | x | - | x(a) | - | - |
| continuous | bin | - | - | x | x | x | x |
| continuous | sample | - | - | x(b) | x | - | x |
| continuous | named | - | - | x | x | - | x |
| continuous | quantile | ? | ? | ? | ? | ? | ? |
| discrete | point | x | x | - | x(a) | - | - |
| discrete | bin | - | - | x | x | x | x |
| discrete | sample | - | - | x(b) | x | - | x |
| discrete | named | - | - | x | x | - | x |
| discrete | quantile | ? | ? | ? | ? | ? | ? |
| nominal | point | - | - | - | - | - | - |
| nominal | bin | - | - | x | - | x | - |
| nominal | sample | - | - | x | - | x | - |
| binary | point | x | x | - | x(a) | - | - |
| binary | bin | - | - | x | x | x | - |
| binary | sample | - | - | x(b) | x | x | - |
| date | point | x | x | - | x(a) | - | - |
| date | bin | - | - | x | x | x | x |
| date | sample | - | - | x(b) | x | - | x |
| date | quantile | ? | ? | ? | ? | ? | ? |

- x(a) = CRPS is equivalent to absolute error for point forecasts.
- x(b) = the log score must be computed by approximation.
- PIT = probability integral transform; CRPS = continuous ranked probability score.
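
Note (a) can be checked numerically. The sketch below uses a common sample-based CRPS estimator, E|X - y| - 0.5 E|X - X'|, together with a log score for binned predictions. It is an illustration under stated assumptions and not Zoltar's scoring code: all function names are hypothetical, and the use of the natural logarithm is an assumption.

```python
import math


def abs_error(truth, point):
    """Absolute error of a point prediction."""
    return abs(truth - point)


def crps_from_samples(truth, samples):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    n = len(samples)
    term1 = sum(abs(x - truth) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2


def bin_log_score(truth_bin, bin_probs):
    """Natural log of the probability assigned to the bin that held the truth."""
    p = bin_probs.get(truth_bin, 0.0)
    return math.log(p) if p > 0 else -math.inf
```

A point forecast behaves like a single sample, so the second term of the estimator vanishes and `crps_from_samples(3.0, [5.0])` equals `abs_error(3.0, 5.0)`.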