Examples
NoneSelector
The NoneSelector simply assigns all days to the validation set and none to the holdout set.
using DateSelectors
using Dates
date_range = Date(2019, 1, 1):Day(1):Date(2019, 3, 31)
selector = NoneSelector()
validation, holdout = partition(date_range, selector)
validation
90-element Vector{Date}:
2019-01-01
2019-01-02
2019-01-03
2019-01-04
2019-01-05
2019-01-06
2019-01-07
2019-01-08
2019-01-09
2019-01-10
⋮
2019-03-23
2019-03-24
2019-03-25
2019-03-26
2019-03-27
2019-03-28
2019-03-29
2019-03-30
2019-03-31
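A quick check of the claim above, as a minimal sketch. It uses only the calls already shown (partition and NoneSelector) and assumes that the returned holdout is an ordinary collection supporting isempty and length.

```julia
using DateSelectors
using Dates

date_range = Date(2019, 1, 1):Day(1):Date(2019, 3, 31)
validation, holdout = partition(date_range, NoneSelector())

# With NoneSelector nothing is held out, so every day of the
# range should end up in the validation set.
holdout_is_empty = isempty(holdout)
all_days_in_validation = length(validation) == length(date_range)
```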
RandomSelector
The RandomSelector uniformly subsamples the collection of dates and assigns them to the holdout set.
Here we use a seed of 42 to uniformly sample from the date range, in 3-day blocks, into the holdout set with probability 10%; some of the selected blocks may be contiguous. Note that for a given seed and date range, the portion in the holdout set may not be exactly 10%, as it is a random sample.
The selection, while random, is fully determined by the RandomSelector object and does not depend on the date range. That is to say, if one has two distinct but overlapping date ranges and uses the same RandomSelector object, then the overlapping days will consistently be placed into either holdout or validation in both.
selector = RandomSelector(42, 0.10, Day(3))
validation, holdout = partition(date_range, selector)
validation
90-element Vector{Date}:
2019-01-01
2019-01-02
2019-01-03
2019-01-04
2019-01-05
2019-01-06
2019-01-07
2019-01-08
2019-01-09
2019-01-10
⋮
2019-03-23
2019-03-24
2019-03-25
2019-03-26
2019-03-27
2019-03-28
2019-03-29
2019-03-30
2019-03-31
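The invariance described above can be checked with two overlapping ranges, as a minimal sketch. The names range_a, range_b and consistent are illustrative, and the second range is an arbitrary choice; only partition and the RandomSelector constructor from the example above are used from the package.

```julia
using DateSelectors
using Dates

selector = RandomSelector(42, 0.10, Day(3))

# Two distinct ranges that overlap in February and March 2019.
range_a = Date(2019, 1, 1):Day(1):Date(2019, 3, 31)
range_b = Date(2019, 2, 1):Day(1):Date(2019, 4, 30)

_, holdout_a = partition(range_a, selector)
_, holdout_b = partition(range_b, selector)

# Restrict both holdout sets to the days the two ranges share.
overlap = intersect(range_a, range_b)
consistent = Set(intersect(holdout_a, overlap)) == Set(intersect(holdout_b, overlap))
```

If the selection really is independent of the date range, consistent should be true: any shared day that is held out for one range is held out for the other.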
PeriodicSelector
The PeriodicSelector assigns holdout dates by taking a stride once per period. Where in the period the holdout stride is taken from is determined by the offset. The offset is relative to Monday 1st Jan 1900.
As the stride start location is relative to a fixed point rather than to the date range, the selection is fully determined by the PeriodicSelector object and does not depend on the date range. That is to say, if one has two distinct but overlapping date ranges and uses the same PeriodicSelector object, then the overlapping days will consistently be placed into either holdout or validation in both.
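The arithmetic behind a fixed reference point can be sketched with the Dates standard library alone. This is an illustration of the idea, not the package's implementation: in_holdout_stride and REFERENCE_MONDAY are hypothetical names, and the period, stride and offset are plain day counts chosen for illustration (a weekly period, a 2-day stride, a 5-day offset).

```julia
using Dates

# Reference point quoted in the text: Monday 1 January 1900.
const REFERENCE_MONDAY = Date(1900, 1, 1)

# Hypothetical helper, not part of DateSelectors: true when `date` falls
# inside the holdout stride. All three keyword arguments are whole days.
function in_holdout_stride(date::Date; period::Int=7, stride::Int=2, offset::Int=5)
    days_since_reference = Dates.value(date - REFERENCE_MONDAY)
    # Shift by the offset, then see where the day lands within its period.
    return mod(days_since_reference - offset, period) < stride
end

week = Date(2019, 1, 7):Day(1):Date(2019, 1, 13)  # Monday through Sunday
holdout_days = filter(in_holdout_stride, week)
holdout_names = dayname.(holdout_days)
```

With these values holdout_names should contain Saturday and Sunday, because 5 days after a Monday is a Saturday. The result for any given day depends only on its distance from the reference Monday, which is why the choice of date range does not change it.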
In this example, for whatever reason, we want to assign weekdays as validation days and weekends as holdout days. Therefore, our period is Week(1) and our stride is Day(2), because out of every week we want to keep 2 days in the holdout. Since we need to start selecting on the Saturday, we must offset by Day(5), because zero offset corresponds to a Monday.
selector = PeriodicSelector(Week(1), Day(2), Day(5))
validation, holdout = partition(date_range, selector)
validation
64-element Vector{Date}:
2019-01-01
2019-01-02
2019-01-03
2019-01-04
2019-01-07
2019-01-08
2019-01-09
2019-01-10
2019-01-11
2019-01-14
⋮
2019-03-19
2019-03-20
2019-03-21
2019-03-22
2019-03-25
2019-03-26
2019-03-27
2019-03-28
2019-03-29
We can verify that it returned what we expected:
unique(dayname.(validation))
5-element Vector{String}:
"Tuesday"
"Wednesday"
"Thursday"
"Friday"
"Monday"
unique(dayname.(holdout))
2-element Vector{String}:
"Saturday"
"Sunday"
Using AbstractIntervals
You can also specify the date range as an Interval:
using Intervals
selector = PeriodicSelector(Week(1), Day(2), Day(4))
date_range = Date(2018, 1, 1)..Date(2019, 3, 31)
validation, holdout = partition(date_range, selector)
validation
325-element Vector{Date}:
2018-01-01
2018-01-02
2018-01-03
2018-01-04
2018-01-07
2018-01-08
2018-01-09
2018-01-10
2018-01-11
2018-01-14
⋮
2019-03-19
2019-03-20
2019-03-21
2019-03-24
2019-03-25
2019-03-26
2019-03-27
2019-03-28
2019-03-31
The date range can also be any other AbstractInterval, such as an AnchoredInterval:
selector = PeriodicSelector(Week(1), Day(2), Day(4))
date_range = AnchoredInterval{Day(90), Date}(Date(2019, 1, 1))
validation, holdout = partition(date_range, selector)
validation
64-element Vector{Date}:
2019-01-01
2019-01-02
2019-01-03
2019-01-06
2019-01-07
2019-01-08
2019-01-09
2019-01-10
2019-01-13
2019-01-14
⋮
2019-03-19
2019-03-20
2019-03-21
2019-03-24
2019-03-25
2019-03-26
2019-03-27
2019-03-28
2019-03-31