Validation
Impute.Validator
— Type
Validator
A Validator stores settings for checking the validity of an AbstractArray or Tables.table containing missing values. New validators are expected to subtype Impute.Validator and, at minimum, implement the _validate(data::AbstractArray{Union{T, Missing}}, ::<MyValidator>) method.
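The subtyping requirement described above can be sketched as follows. This is a minimal illustration, not part of the library: NoMissing is a hypothetical validator name, and the sketch assumes that validate forwards to _validate without extra keyword arguments when dims is left at its default.

```julia
using Impute
using Impute: Validator, validate

# Hypothetical validator (not provided by Impute): fails if any value is missing.
struct NoMissing <: Validator end

# The minimum method named in the Validator docstring.
function Impute._validate(data::AbstractArray{Union{T, Missing}}, ::NoMissing) where T
    any(ismissing, data) && throw(ArgumentError("data contains missing values"))
    return data
end

# The element type allows missing values, but none are present, so this passes.
x = Union{Float64, Missing}[1.0, 2.0, 3.0]
validate(x, NoMissing())
```

Throwing on failure and returning the input otherwise mirrors the contract that validate documents for the built-in validators.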
Impute.validate
— Method
validate(data::AbstractArray, v::Validator; dims=:)
If the validator v fails, an error is thrown; otherwise the data provided is returned without mutation. See Validator for the minimum internal _validate call requirements.
Arguments
- data::AbstractArray: the data to be validated along dimension dims
- v::Validator: the validator to apply
Keywords
- dims: the dimension to apply _validate along (default is :)
Returns
- the input data if no error is thrown.
Throws
- An error when the test fails
julia> using Test; using Impute: Threshold, ThresholdError, validate
julia> M = [1.0 2.0 missing missing 5.0; 1.1 2.2 3.3 missing 5.5]
2×5 Matrix{Union{Missing, Float64}}:
 1.0  2.0   missing  missing  5.0
 1.1  2.2  3.3       missing  5.5
julia> @test_throws ThresholdError validate(M, Threshold())
Test Passed
Thrown: ThresholdError
Impute.validate
— Method
validate(table, v::Validator; cols=nothing)
Applies the validator v to the table one column at a time; if this is not the desired behaviour, custom validators should overload this method. See Validator for the minimum internal _validate call requirements.
Arguments
- table: the data to validate
- v: the validator to apply
Keyword Arguments
- cols: the columns to validate (default is to validate all columns)
Returns
- the input data if no error is thrown.
Throws
- An error when any column doesn't pass the test
Example
julia> using DataFrames, Test; using Impute: Threshold, ThresholdError, validate
julia> df = DataFrame(:a => [1.0, 2.0, missing, missing, 5.0], :b => [1.1, 2.2, 3.3, missing, 5.5])
5×2 DataFrame
 Row │ a          b
     │ Float64?   Float64?
─────┼──────────────────────
   1 │       1.0        1.1
   2 │       2.0        2.2
   3 │ missing          3.3
   4 │ missing    missing
   5 │       5.0        5.5
julia> @test_throws ThresholdError validate(df, Threshold())
Test Passed
Thrown: ThresholdError
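The cols keyword documented above restricts validation to a subset of columns. The following is a minimal sketch; it assumes cols accepts a vector of column names as Symbols, which the text above does not state explicitly.

```julia
using DataFrames
using Impute: Threshold, ThresholdError, validate

df = DataFrame(:a => [1.0, 2.0, missing, missing, 5.0], :b => [1.1, 2.2, 3.3, missing, 5.5])

# Column :b has 1 of 5 values missing (0.2), which is within a limit of 0.3.
# Assumption: cols accepts a vector of Symbols.
validate(df, Threshold(; limit=0.3); cols=[:b])

# Column :a has 2 of 5 values missing (0.4), so validating every column throws.
try
    validate(df, Threshold(; limit=0.3))
catch err
    err isa ThresholdError || rethrow()
    println("column check failed")
end
```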
Thresholds
Impute.threshold
— Function
Impute.threshold(data; limit=0.1, kwargs...)
Assert that the proportion of missing values in the data does not exceed the limit.
Examples
julia> using DataFrames, Impute
julia> df = DataFrame(:a => [1.0, 2.0, missing, missing, 5.0], :b => [1.1, 2.2, 3.3, missing, 5.5])
5×2 DataFrames.DataFrame
│ Row │ a │ b │
│ │ Float64 │ Float64 │
├─────┼──────────┼──────────┤
│ 1 │ 1.0 │ 1.1 │
│ 2 │ 2.0 │ 2.2 │
│ 3 │ missing │ 3.3 │
│ 4 │ missing │ missing │
│ 5 │ 5.0 │ 5.5 │
julia> Impute.threshold(df)
ERROR: ThresholdError: Missing data limit exceeded 0.1 (0.4)
Stacktrace:
...
julia> Impute.threshold(df; limit=0.8)
5×2 DataFrames.DataFrame
│ Row │ a │ b │
│ │ Float64 │ Float64 │
├─────┼──────────┼──────────┤
│ 1 │ 1.0 │ 1.1 │
│ 2 │ 2.0 │ 2.2 │
│ 3 │ missing │ 3.3 │
│ 4 │ missing │ missing │
│ 5 │ 5.0 │ 5.5 │
Impute.Threshold
— Type
Threshold(; limit=0.1)
Assert that the ratio of missing values in the provided dataset does not exceed the specified limit.
Keyword Arguments
- limit::Real: allowed proportion of missing values (should be between 0.0 and 1.0).
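The proportion being checked can be reproduced by hand in plain Julia. This sketch does not call Impute; missing_ratio is a hypothetical helper name used only for illustration, and the two vectors are the columns from the threshold example above.

```julia
# Hypothetical helper: fraction of entries that are missing.
missing_ratio(x) = count(ismissing, x) / length(x)

a = [1.0, 2.0, missing, missing, 5.0]
b = [1.1, 2.2, 3.3, missing, 5.5]

ratio_a = missing_ratio(a)  # 2 of 5 values are missing
ratio_b = missing_ratio(b)  # 1 of 5 values is missing

# With the default limit of 0.1 both columns exceed the limit;
# with limit=0.8 neither does.
println((ratio_a, ratio_b))
```

The value for column a, 0.4, is the figure reported in parentheses by the ThresholdError in the example above.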
Weighted Thresholds
Impute.wthreshold
— Function
Impute.wthreshold(data; limit, weights, kwargs...)
Assert that the weighted proportion of missing values in the data does not exceed the limit.
Examples
julia> using DataFrames, Impute
julia> df = DataFrame(:a => [1.0, 2.0, missing, missing, 5.0], :b => [1.1, 2.2, 3.3, missing, 5.5])
5×2 DataFrames.DataFrame
│ Row │ a │ b │
│ │ Float64 │ Float64 │
├─────┼──────────┼──────────┤
│ 1 │ 1.0 │ 1.1 │
│ 2 │ 2.0 │ 2.2 │
│ 3 │ missing │ 3.3 │
│ 4 │ missing │ missing │
│ 5 │ 5.0 │ 5.5 │
julia> Impute.wthreshold(df; limit=0.4, weights=0.1:0.1:0.5)
ERROR: ThresholdError: Missing data limit exceeded 0.4 (0.4666666666666666)
Stacktrace:
...
julia> Impute.wthreshold(df; limit=0.4, weights=0.5:-0.1:0.1)
5×2 DataFrames.DataFrame
│ Row │ a │ b │
│ │ Float64 │ Float64 │
├─────┼──────────┼──────────┤
│ 1 │ 1.0 │ 1.1 │
│ 2 │ 2.0 │ 2.2 │
│ 3 │ missing │ 3.3 │
│ 4 │ missing │ missing │
│ 5 │ 5.0 │ 5.5 │
Impute.WeightedThreshold
— Type
WeightedThreshold(; limit, weights)
Assert that the weighted proportion of missing values in the provided dataset does not exceed the specified limit. The weighted proportion is calculated as sum(weights[ismissing.(data)]) / sum(weights).
Keyword Arguments
- limit::Real: allowed proportion of missing values (should be between 0.0 and 1.0).
- weights::AbstractWeights: a set of statistical weights to use when evaluating the importance of each observation.
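The weighted proportion formula can be checked by hand against the wthreshold example above. This sketch uses plain Julia vectors rather than AbstractWeights and does not call Impute; weighted_missing_ratio is a hypothetical helper name.

```julia
# Hypothetical helper implementing sum(weights[ismissing.(data)]) / sum(weights).
weighted_missing_ratio(x, w) = sum(w[ismissing.(x)]) / sum(w)

a = [1.0, 2.0, missing, missing, 5.0]

# Increasing weights put more importance on the later observations,
# where the missing values sit: (0.3 + 0.4) / 1.5, roughly 0.467.
w_up = collect(0.1:0.1:0.5)
r_up = weighted_missing_ratio(a, w_up)

# Decreasing weights down-weight the same observations:
# (0.3 + 0.2) / 1.5, roughly 0.333.
w_down = collect(0.5:-0.1:0.1)
r_down = weighted_missing_ratio(a, w_down)

println((r_up, r_down))
```

With limit=0.4, the first weighting exceeds the limit and the second does not, which matches the error and the successful call shown in the example.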