Forecast Targets in Zoltar¶
Targets are the fundamental data structure of a forecast. In Zoltar, a single forecast made by a model may give predictions for multiple targets. For example, a single forecast might include a forecast of 1- and 2-week-ahead values and a prediction of when the time series will reach its maximum in a given period of time. When a project is created, the project owner specifies which targets should be part of any submitted forecast. As we will see below, targets have specific properties, and there are several types of targets that determine which properties and features pertain to a particular target.
Target types¶
continuous: A quantitative target whose range encapsulates a section of the real number line.
Examples: percentage of all doctors' office visits due to influenza-like illness, or disease incidence per 100,000 population.
discrete: A quantitative target whose range is a set of integer values.
Example: the number of incident cases in a time period.
nominal: A nominal, unordered categorical target.
Example: severity level in categories of "low", "moderate", and "high".
binary: A binary target, with a defined outcome that can be seen as a true/false.
Example: does the maximum value of a variable exceed some threshold C in a given period of time.
date: A target with a discrete set of calendar dates as possible outcomes.
Example: the calendar week in which peak incidence occurs (represented by the Sunday of that week).
Target parameters¶
When created, all targets have a set of parameters that must be defined. Each type of target then has a set of additional, sometimes optional, parameters. These are all defined below.
Summary of allowed, optional, and required parameters, by target type¶
Here is a table that summarizes which parameters are allowed, optional, and required, by target type. Legend: 'x' = required, '(x)' = required if `is_step_ahead` is `true`, '-' = disallowed, '~' = optional.
target type | type | name | description | outcome_variable | is_step_ahead | numeric_horizon | RDT | range | cats |
---|---|---|---|---|---|---|---|---|---|
continuous | x | x | x | x | x | (x) | (x) | ~ | ~ |
discrete | x | x | x | x | x | (x) | (x) | ~ | ~ |
nominal | x | x | x | x | x | (x) | (x) | - | x |
binary | x | x | x | x | x | (x) | (x) | - | - |
date | x | x | x | x | x | (x) | (x) | - | x |
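The table can also be read as a small rule set. The following is a minimal sketch, not Zoltar's own validation code: it copies the table into a dictionary and reports missing or disallowed parameters for a target given as a plain Python dict. The function name `check_target_params` and the use of the table's column headers (including `RDT`) as dictionary keys are assumptions made for illustration; the key names in a real project configuration file may differ.

```python
# Rules copied from the summary table above.
# 'x' = required, '(x)' = required if is_step_ahead is true,
# '-' = disallowed, '~' = optional.
COLUMNS = ["type", "name", "description", "outcome_variable",
           "is_step_ahead", "numeric_horizon", "RDT", "range", "cats"]

RULES = {
    "continuous": ["x", "x", "x", "x", "x", "(x)", "(x)", "~", "~"],
    "discrete":   ["x", "x", "x", "x", "x", "(x)", "(x)", "~", "~"],
    "nominal":    ["x", "x", "x", "x", "x", "(x)", "(x)", "-", "x"],
    "binary":     ["x", "x", "x", "x", "x", "(x)", "(x)", "-", "-"],
    "date":       ["x", "x", "x", "x", "x", "(x)", "(x)", "-", "x"],
}


def check_target_params(target):
    """Return a list of problems found in a target definition (a dict).

    Hypothetical helper: the keys are the table's column headers, which
    may not match the keys used by real configuration files.
    """
    target_type = target.get("type")
    if target_type not in RULES:
        return [f"unknown target type: {target_type!r}"]

    problems = []
    step_ahead = target.get("is_step_ahead") is True
    for column, rule in zip(COLUMNS, RULES[target_type]):
        present = column in target
        required = rule == "x" or (rule == "(x)" and step_ahead)
        if required and not present:
            problems.append(f"missing required parameter: {column}")
        elif rule == "-" and present:
            problems.append(f"parameter not allowed for {target_type}: {column}")
    return problems
```

For example, a binary target that supplies a `range` is flagged, as is a step-ahead target that omits `numeric_horizon`. The table does not say whether `numeric_horizon` or `RDT` may be present when `is_step_ahead` is false, so the sketch does not report that case either way.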
Required parameters for all targets¶
- name: A brief name for the target. (The number of characters is not limited, but brevity is helpful.)
- description: A verbose description of what the target is. (The number of characters is not limited.)
- type: One of the five target types named above, e.g., `continuous`.
- is_step_ahead: `true` if the target is one of a sequence of targets that predict values at different points in the future.
- numeric_horizon: An integer indicating the forecast horizon represented by this target. It is required if `is_step_ahead` is `true`.
- reference date type (RDT): An integer that indicates how this target calculates `reference_date` and `target_end_date` from a timezero. It is required if `is_step_ahead` is `true`. The allowed values are hard-coded (see the valid reference date types table below) and will be used for an upcoming visualization feature (more documentation to come then).
valid reference date types¶
Following are the allowed reference date types. `id` is the integer value that's actually stored in the database, `name` is the "official" unique name used by project configuration files, and `abbreviation` is used to calculate target group names.
id | name | abbreviation |
---|---|---|
0 | DAY | day |
1 | MMWR_WEEK_LAST_TIMEZERO_MONDAY | week |
2 | MMWR_WEEK_LAST_TIMEZERO_TUESDAY | week |
3 | BIWEEK | biweek |
4 | MMWR_WEEK_LAST_TIMEZERO_SATURDAY | week |
Parameters specific to continuous targets¶
- range: (Optional) a numeric vector of length 2 specifying the lower and upper bounds of the continuous target. The range is assumed to be inclusive on the lower bound and open on the upper bound, e.g. [a, b). If `range` is not specified, then the range is assumed to be (-∞, ∞).
- cats: (Optional, but uploaded `Bin` prediction types will be rejected unless these are specified) an ordered set of numeric values indicating the inclusive lower bounds for the bins of binned distributions. E.g. if `cats` is specified as [0, 1.1, 2.2] then the implied set of valid intervals would be [0, 1.1), [1.1, 2.2) and [2.2, ∞). Additionally, if `range` had been specified as [0, 100] in addition to the above `cats`, then the final bin would be [2.2, 100].
If both `range` and `cats` are specified, then min(`cats`) must equal the lower bound and max(`cats`) must be less than the upper bound of `range`.
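To make the relationship between `cats` and `range` concrete, here is a minimal sketch that derives the implied bins for a continuous target and enforces the two consistency rules stated above. The function name `continuous_bins` and the tuple layout are hypothetical and not part of Zoltar. The closed upper end of the final bin follows the worked example in the text ([2.2, 100]).

```python
import math


def continuous_bins(cats, value_range=None):
    """Return (lower, upper, upper_inclusive) tuples implied by cats and range.

    Hypothetical helper for illustration; not Zoltar's implementation.
    """
    cats = list(cats)
    if not cats:
        raise ValueError("cats must not be empty")
    if cats != sorted(set(cats)):
        raise ValueError("cats must be strictly increasing")

    if value_range is None:
        upper = math.inf
    else:
        lower, upper = value_range
        if min(cats) != lower:
            raise ValueError("min(cats) must equal the lower bound of range")
        if max(cats) >= upper:
            raise ValueError("max(cats) must be less than the upper bound of range")

    # Each cat is the inclusive lower bound of a bin; the next cat closes it.
    bins = [(lo, hi, False) for lo, hi in zip(cats, cats[1:])]
    # Final bin: open-ended without a range, closed at the range's upper
    # bound when a range is given (as in the [2.2, 100] example).
    bins.append((cats[-1], upper, value_range is not None))
    return bins
```

Calling `continuous_bins([0, 1.1, 2.2])` yields three bins ending in an unbounded one, while `continuous_bins([0, 1.1, 2.2], [0, 100])` ends with the bin from 2.2 to 100.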
Parameters specific to discrete targets¶
- range: (Optional, but uploaded `Bin` prediction types will be rejected unless `range` is specified) an integer vector of length 2 specifying the lower and upper bounds of the discrete target. The range is assumed to be inclusive on both the lower and upper bounds, e.g. [a, b]. If `range` is not specified, then the range is assumed to be (-∞, ∞).
- cats: (Optional, and can only be specified if `range` is also specified) an ordered set of integer values indicating the inclusive lower bounds for the bins of binned distributions. E.g. if `cats` is specified as [0, 10, 20, 30, 40, 50] and `range` is specified as [0, 100] then the implied set of valid categories would be [0, 10), [10, 20), [20, 30), [30, 40), [40, 50) and [50, 100].
If both `range` and `cats` are specified, then min(`cats`) must equal the lower bound and max(`cats`) must be less than the upper bound of `range`.
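As an illustration of how the discrete categories partition the range, the sketch below finds the bin that contains an observed integer. It assumes `cats` is already sorted in increasing order; the function name `discrete_bin_for` is hypothetical and not part of Zoltar.

```python
from bisect import bisect_right


def discrete_bin_for(value, cats, value_range):
    """Return the (lower, upper) bin containing value.

    The last bin's upper bound is inclusive; all others are exclusive.
    Hypothetical helper; assumes cats is sorted in increasing order.
    """
    lower, upper = value_range
    if min(cats) != lower or max(cats) >= upper:
        raise ValueError("cats is inconsistent with range")
    if not lower <= value <= upper:
        raise ValueError(f"{value} is outside range [{lower}, {upper}]")

    # bisect_right counts the cats that are <= value; the last of those
    # is the inclusive lower bound of the bin that holds value.
    index = bisect_right(cats, value) - 1
    bin_upper = cats[index + 1] if index + 1 < len(cats) else upper
    return cats[index], bin_upper
```

With `cats` of [0, 10, 20, 30, 40, 50] and `range` of [0, 100], the value 10 falls in the bin starting at 10 (not the one ending at 10), and the value 100 falls in the final bin because the range is inclusive on both ends.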
Parameters specific to nominal targets¶
- cats: (Required) a list of strings that name the categories for this target. Categories must not include the following strings: `""`, `"NA"`, or `"NULL"` (case does not matter).
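The reserved-string rule can be checked with a case-insensitive comparison. This is a minimal sketch with a hypothetical function name, not Zoltar's own check; it compares each category exactly as given, without trimming whitespace, since the text does not say whether padded strings are also rejected.

```python
# Reserved strings from the rule above, lowercased for comparison.
RESERVED = {"", "na", "null"}


def invalid_nominal_cats(cats):
    """Return the categories that match a reserved string, ignoring case."""
    return [cat for cat in cats if cat.lower() in RESERVED]
```

An empty result means every category name is acceptable under this rule.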
Parameters specific to binary targets¶
None.
Parameters specific to date targets¶
- cats: (Required) a list of dates in `YYYY-MM-DD` format. These are the only dates that will be considered as valid input for the target.
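A sketch of how date `cats` might be checked before use, assuming they arrive as strings: each entry must match the `YYYY-MM-DD` pattern and name a real calendar date, and a predicted date is valid only if it appears in the list. The function names are hypothetical and not part of Zoltar.

```python
import re
from datetime import datetime

# Exactly four digits, two digits, two digits, separated by hyphens.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_date_cats(cats):
    """Parse each cat as a YYYY-MM-DD date, raising ValueError otherwise."""
    parsed = []
    for cat in cats:
        if not ISO_DATE.fullmatch(cat):
            raise ValueError(f"not in YYYY-MM-DD format: {cat!r}")
        # strptime also rejects impossible dates such as 2020-02-30.
        parsed.append(datetime.strptime(cat, "%Y-%m-%d").date())
    return parsed


def is_valid_date_outcome(value, cats):
    """True if the date string is one of the target's listed cats."""
    return value in set(cats)
```

The regular expression is there because `strptime` on its own accepts unpadded values such as `2019-12-1`.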
Valid prediction types by target type¶
target type | data_type | point | bin | sample | named | quantile | mean | median | mode |
---|---|---|---|---|---|---|---|---|---|
continuous | float | x | x | x | (1) | x | x | x | x |
discrete | int | x | x | x | (2) | x | x | x | x |
nominal | text | x | x | x | - | - | x | | |
binary | boolean | x | x | x | - | - | x | x | |
date | date | x | x | x | - | x | x | x | x |
Legend:
(1) = valid named distributions are `norm`, `lnorm`, `gamma`, `beta`
(2) = valid named distributions are `pois`, `nbinom`, `nbinom2`
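The `named` column and its two legend entries can be captured as a lookup from target type to allowed distribution families. This is a minimal sketch with hypothetical names, covering only the `named` column; the other columns of the table are not encoded here.

```python
# Allowed named-distribution families per target type, from the table's
# "named" column and legend entries (1) and (2).
NAMED_FAMILIES = {
    "continuous": {"norm", "lnorm", "gamma", "beta"},  # legend (1)
    "discrete": {"pois", "nbinom", "nbinom2"},         # legend (2)
    "nominal": set(),  # '-' in the named column
    "binary": set(),   # '-' in the named column
    "date": set(),     # '-' in the named column
}


def is_valid_named_family(target_type, family):
    """True if a named prediction with this family is valid for the type."""
    return family in NAMED_FAMILIES[target_type]
```

For instance, `pois` is accepted for a discrete target but not for a continuous one, and no family is accepted for nominal, binary, or date targets.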