API
AxisSets.KeyAlignmentError
AxisSets.KeyedDataset
AxisSets.KeyedDataset
AxisSets.Pattern
AxisKeys.axiskeys
AxisSets.constraintmap
AxisSets.dimpaths
AxisSets.flatten
AxisSets.rekey
AxisSets.validate
Base.getindex
Base.getproperty
Base.map
Base.mapslices
Base.merge
Base.setindex!
FeatureTransforms.apply
Impute.apply
Impute.apply
Impute.impute
Impute.validate
NamedDims.dimnames
AxisSets.KeyAlignmentError
— Type
KeyAlignmentError
Thrown when the constrained dimensions of components in a KeyedDataset have misaligned key values.
Fields
- constraint::Pattern - Constraint pattern describing all dimensions that must align
- groups - An iterator of paths and keys for each non-matching group
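As a rough illustration of how this error surfaces, the sketch below assigns a component whose time keys differ from the existing ones (mirroring the setindex! example later on this page) and then inspects the documented constraint field. The variable names and the try/catch wrapper are illustrative choices, not part of the package.

```julia
using AxisKeys
using AxisSets: KeyedDataset, KeyAlignmentError

# A dataset with a single component keyed on `time`.
ds = KeyedDataset(:a => KeyedArray(zeros(3); time=1:3))

# Assigning a component with shifted `time` keys violates the shared constraint.
caught = try
    ds[:c] = KeyedArray(ones(3); time=2:4)
    nothing
catch err
    # Only handle alignment failures; anything else is unexpected here.
    err isa KeyAlignmentError || rethrow()
    err
end

# `constraint` is the Pattern whose dimensions failed to align.
caught === nothing || println(caught.constraint)
```

Catching the specific error type, rather than every exception, keeps unrelated failures visible.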
AxisSets.KeyedDataset
— Type
KeyedDataset
A KeyedDataset describes an associative collection of component KeyedArrays with constraints on their shared dimensions.
Fields
- constraints::OrderedSet{Pattern} - Constraint Patterns on shared dimensions.
- data::LittleDict{Tuple, KeyedArray} - Flattened key paths (as tuples) mapped to component keyed arrays.
AxisSets.KeyedDataset
— Method
(ds::KeyedDataset)(key) -> KeyedDataset
A callable syntax for selecting or filtering a subset of a KeyedDataset.
Example
julia> using AxisKeys; using AxisSets: KeyedDataset, flatten;
julia> ds = KeyedDataset(
flatten([
:g1 => [
:a => KeyedArray(zeros(3); time=1:3),
:b => KeyedArray(ones(3, 2); time=1:3, loc=[:x, :y]),
],
:g2 => [
:a => KeyedArray(ones(3); time=1:3),
:b => KeyedArray(zeros(3, 2); time=1:3, loc=[:x, :y]),
]
])...
);
julia> collect(keys(ds(:__, :a).data))
2-element Vector{Tuple}:
(:g1, :a)
(:g2, :a)
julia> collect(keys(ds(:g1, :__).data))
2-element Vector{Tuple}:
(:g1, :a)
(:g1, :b)
AxisSets.Pattern
— Type
Pattern
A Pattern is a wrapper around a Tuple that enables searching and filtering for matching components and dimension paths in a KeyedDataset. The special symbols :_ and :__ are used as wildcards, similar to * and ** in glob pattern matching: :_ stands in for a single path element, while :__ stands in for any number of elements.
Example
julia> using AxisSets: Pattern;
julia> items = [
(:train, :input, :load, :time),
(:train, :input, :load, :id),
(:train, :input, :temperature, :time),
(:train, :input, :temperature, :id),
(:train, :output, :load, :time),
(:train, :output, :load, :id),
];
julia> filter(in(Pattern(:__, :time)), items)
3-element Vector{NTuple{4, Symbol}}:
(:train, :input, :load, :time)
(:train, :input, :temperature, :time)
(:train, :output, :load, :time)
julia> filter(in(Pattern(:__, :load, :_)), items)
4-element Vector{NTuple{4, Symbol}}:
(:train, :input, :load, :time)
(:train, :input, :load, :id)
(:train, :output, :load, :time)
(:train, :output, :load, :id)
AxisKeys.axiskeys
— Method
axiskeys(ds)
axiskeys(ds, dimname)
axiskeys(ds, pattern)
axiskeys(ds, dimpath)
Returns a list of unique axis keys within the KeyedDataset. A Tuple will always be returned unless you explicitly specify the dimpath you want.
Example
julia> using AxisKeys; using AxisSets: KeyedDataset;
julia> ds = KeyedDataset(
:val1 => KeyedArray(rand(4, 3, 2); time=1:4, loc=-1:-1:-3, obj=[:a, :b]),
:val2 => KeyedArray(rand(4, 3, 2) .+ 1.0; time=1:4, loc=-1:-1:-3, obj=[:a, :b]),
);
julia> axiskeys(ds)
(1:4, -1:-1:-3, [:a, :b])
julia> axiskeys(ds, :time)
(1:4,)
julia> axiskeys(ds, (:val1, :time))
1:4
AxisSets.constraintmap
— Method
constraintmap(ds)
Returns a mapping of constraint patterns to specific dimension paths. The returned dictionary has keys of type Pattern, and the values are sets of Tuple.
Example
julia> using AxisKeys; using AxisSets: KeyedDataset, constraintmap;
julia> ds = KeyedDataset(
:val1 => KeyedArray(rand(4, 3, 2); time=1:4, loc=-1:-1:-3, obj=[:a, :b]),
:val2 => KeyedArray(rand(4, 3, 2) .+ 1.0; time=1:4, loc=-1:-1:-3, obj=[:a, :b]),
);
julia> collect(constraintmap(ds))
3-element Vector{Pair{AxisSets.Pattern, Set{Tuple}}}:
Pattern((:__, :time)) => Set([(:val2, :time), (:val1, :time)])
Pattern((:__, :loc)) => Set([(:val1, :loc), (:val2, :loc)])
Pattern((:__, :obj)) => Set([(:val2, :obj), (:val1, :obj)])
AxisSets.dimpaths
— Method
dimpaths(ds, [pattern]) -> Vector{<:Tuple}
Return a list of all dimension paths in the KeyedDataset. Optionally, you can filter the results using a Pattern.
Example
julia> using AxisKeys; using AxisSets: KeyedDataset, dimpaths;
julia> ds = KeyedDataset(
:val1 => KeyedArray(rand(4, 3, 2); time=1:4, loc=-1:-1:-3, obj=[:a, :b]),
:val2 => KeyedArray(rand(4, 3, 2) .+ 1.0; time=1:4, loc=-1:-1:-3, obj=[:a, :b]),
);
julia> dimpaths(ds)
6-element Vector{Tuple{Symbol, Symbol}}:
(:val1, :time)
(:val1, :loc)
(:val1, :obj)
(:val2, :time)
(:val2, :loc)
(:val2, :obj)
AxisSets.flatten
— Function
flatten(collection, [delim])
Flatten a collection of nested associative types into a flat collection of pairs.
Example
julia> using AxisSets: flatten
julia> data = (
val1 = (a1 = 1, a2 = 2),
val2 = (b1 = 11, b2 = 22),
val3 = [111, 222],
val4 = 4.3,
);
julia> flatten(data, :_)
(val1_a1 = 1, val1_a2 = 2, val2_b1 = 11, val2_b2 = 22, val3 = [111, 222], val4 = 4.3)
flatten(A, dims, [delim])
Flatten a KeyedArray along the specified consecutive dimensions. The dims argument can either be a Tuple of symbols or a Pair{Tuple, Symbol} if you'd like to specify the desired flattened dimension name.
Example
julia> using AxisKeys, Dates, NamedDims; using AxisSets: flatten
julia> A = KeyedArray(
reshape(1:24, (4, 3, 2));
time=DateTime(2021, 1, 1, 11):Hour(1):DateTime(2021, 1, 1, 14),
obj=[:a, :b, :c],
loc=[1, 2],
);
julia> dimnames(flatten(A, (:obj, :loc), :_))
(:time, :obj_loc)
AxisSets.rekey
— Method
rekey(f, ds, dim)
Apply function f to key values of each matching dim in the KeyedDataset. dim can either be a Symbol or a Pattern for the dimension paths.
Example
julia> using AxisKeys; using AxisSets: KeyedDataset, rekey;
julia> ds = KeyedDataset(
:a => KeyedArray(zeros(3); time=1:3),
:b => KeyedArray(ones(3, 2); time=1:3, loc=[:x, :y]),
);
julia> r = rekey(k -> k .+ 1, ds, :time);
julia> r.time
3-element ReadOnlyArrays.ReadOnlyArray{Int64, 1, UnitRange{Int64}}:
2
3
4
AxisSets.validate
— Method
validate(ds, [constraint])
Validate that all constrained dimension paths within a KeyedDataset have matching key values. Optionally, you can test an explicit constraint Pattern.
Returns
- true if an error isn't thrown
Throws
- ArgumentError: If the constraints are not respected
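A minimal sketch of both call forms described above, assuming a small dataset whose components share aligned time keys. The dataset contents and variable names are made up for illustration.

```julia
using AxisKeys
using AxisSets: KeyedDataset, Pattern, validate

# Two components that share an aligned `time` dimension.
ds = KeyedDataset(
    :a => KeyedArray(zeros(3); time=1:3),
    :b => KeyedArray(ones(3, 2); time=1:3, loc=[:x, :y]),
)

# Check every constraint stored on the dataset.
all_ok = validate(ds)

# Check only an explicit constraint on the `time` dimension paths.
time_ok = validate(ds, Pattern(:__, :time))
```

Because a failed check raises an error instead of returning false, the return value is mainly useful inside assertions or tests.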
Base.getindex
— Method
getindex(ds::KeyedDataset, key)
Lookup KeyedDataset component by its Tuple key, or Symbol for keys of depth 1. Shared axis keys for the returned KeyedArray are wrapped in a ReadOnlyArray for safety.
Example
julia> using AxisKeys; using AxisSets: KeyedDataset;
julia> ds = KeyedDataset(
:val1 => KeyedArray(zeros(3, 2); time=1:3, obj=[:a, :b]),
:val2 => KeyedArray(ones(3, 2) .+ 1.0; time=1:3, loc=[:x, :y]),
);
julia> ds[:val1]
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   time ∈ 3-element ReadOnlyArrays.ReadOnlyArray{Int64,...}
→   obj ∈ 2-element ReadOnlyArrays.ReadOnlyArray{Symbol,...}
And data, 3×2 Array{Float64,2}:
       (:a)  (:b)
 (1)   0.0   0.0
 (2)   0.0   0.0
 (3)   0.0   0.0
Base.getproperty
— Method
getproperty(ds::KeyedDataset, sym::Symbol)
Extract KeyedDataset fields, dimension keys or components in that order. Shared axis keys are wrapped in a ReadOnlyArray for safety.
Example
julia> using AxisKeys; using AxisSets: KeyedDataset;
julia> ds = KeyedDataset(
:val1 => KeyedArray(zeros(3, 2); time=1:3, obj=[:a, :b]),
:val2 => KeyedArray(ones(3, 2) .+ 1.0; time=1:3, loc=[:x, :y]),
);
julia> collect(keys(ds.data))
2-element Vector{Tuple}:
(:val1,)
(:val2,)
julia> ds.time
3-element ReadOnlyArrays.ReadOnlyArray{Int64, 1, UnitRange{Int64}}:
1
2
3
julia> dimnames(ds.val1)
(:time, :obj)
Base.map
— Method
map(f, ds, [key]) -> KeyedDataset
Apply function f to each component of the KeyedDataset. Returns a new dataset with the same constraints, but new components. The function can be applied to a subselection of components via a Pattern key.
Example
julia> using AxisKeys, Statistics; using AxisSets: KeyedDataset, flatten;
julia> ds = KeyedDataset(
flatten([
:g1 => [
:a => KeyedArray(zeros(3); time=1:3),
:b => KeyedArray(ones(3, 2); time=1:3, loc=[:x, :y]),
],
:g2 => [
:a => KeyedArray(ones(3); time=1:3),
:b => KeyedArray(zeros(3, 2); time=1:3, loc=[:x, :y]),
]
])...
);
julia> r = map(a -> a .+ 100, ds, (:__, :a, :_)); # The extra `:_` is to clarify that we don't care about the dimnames.
julia> [k => mean(v) for (k, v) in r.data] # KeyedArray printing isn't consistent in jldoctests
4-element Vector{Pair{Tuple{Symbol, Symbol}, Float64}}:
(:g1, :a) => 100.0
(:g1, :b) => 1.0
(:g2, :a) => 101.0
(:g2, :b) => 0.0
Base.mapslices
— Method
mapslices(f, ds, [key]; dims) -> KeyedDataset
Apply the mapslices call to each of the desired components and return a new KeyedDataset. Selection Patterns may be provided via key, but components are selected by the desired dims by default.
Example
julia> using AxisKeys, Statistics; using AxisSets: KeyedDataset;
julia> ds = KeyedDataset(
:val1 => KeyedArray(zeros(3, 2); time=1:3, obj=[:a, :b]),
:val2 => KeyedArray(ones(3, 2); time=1:3, loc=[:x, :y]),
);
julia> r = mapslices(sum, ds; dims=:time); # KeyedArray printing isn't consistent in jldoctests
julia> [k => parent(parent(v)) for (k, v) in r.data]
2-element Vector{Pair{Tuple{Symbol}, Matrix{Float64}}}:
(:val1,) => [0.0 0.0]
(:val2,) => [3.0 3.0]
Base.merge
— Method
merge(ds::KeyedDataset, others::KeyedDataset...)
Combine the constraints and data from multiple KeyedDatasets.
Example
julia> using AxisKeys; using AxisSets: KeyedDataset;
julia> ds1 = KeyedDataset(
:a => KeyedArray(zeros(3); time=1:3),
:b => KeyedArray(ones(3, 2); time=1:3, loc=[:x, :y]),
);
julia> ds2 = KeyedDataset(
:c => KeyedArray(ones(3); time=1:3),
:d => KeyedArray(zeros(3, 2); time=1:3, loc=[:x, :y]),
);
julia> collect(keys(merge(ds1, ds2).data))
4-element Vector{Tuple}:
(:a,)
(:b,)
(:c,)
(:d,)
Base.setindex!
— Method
setindex!(ds::KeyedDataset{T}, val, key) -> T
Store the new val in the KeyedDataset. If any new dimension names don't match any existing constraints then Pattern(:__, <dimname>) is used by default. If the axis values of the new val don't meet the existing constraints in the dataset then an error is thrown.
Example
julia> using AxisKeys; using AxisSets: KeyedDataset, constraintmap;
julia> ds = KeyedDataset(:a => KeyedArray(zeros(3); time=1:3));
julia> ds[:b] = KeyedArray(ones(3, 2); time=1:3, lag=[-1, -2]);
julia> collect(constraintmap(ds))
2-element Vector{Pair{AxisSets.Pattern, Set{Tuple}}}:
Pattern((:__, :time)) => Set([(:b, :time), (:a, :time)])
Pattern((:__, :lag)) => Set([(:b, :lag)])
julia> ds[:c] = KeyedArray(ones(3, 2); time=2:4, lag=[-1, -2])
ERROR: KeyAlignmentError: Misaligned dimension keys on constraint Pattern((:__, :time))
Tuple[(:b, :time), (:a, :time)] ∈ 3-element UnitRange{Int64}
Tuple[(:c, :time)] ∈ 3-element UnitRange{Int64}
FeatureTransforms.apply
— Function
FeatureTransforms.apply(ds::KeyedDataset, t::Transform, [key]; dims=:, kwargs...)
Apply the Transform to each component of the KeyedDataset. Returns a new dataset with the same constraints, but transformed components.
The transform can be applied to a subselection of components via a Pattern key. Otherwise, components are selected by the desired dims.
Keyword arguments including dims are passed to the appropriate FeatureTransforms method for a component.
Example
julia> using AxisKeys, FeatureTransforms; using AxisSets: KeyedDataset, Pattern, flatten;
julia> ds = KeyedDataset(
flatten([
:train => [
:load => KeyedArray([7.0 7.7; 8.0 8.2; 9.0 9.9]; time=1:3, loc=[:x, :y]),
:price => KeyedArray([-2.0 4.0; 3.0 2.0; -1.0 -1.0]; time=1:3, id=[:a, :b]),
],
:predict => [
:load => KeyedArray([7.0 7.7; 8.1 7.9; 9.0 9.9]; time=1:3, loc=[:x, :y]),
:price => KeyedArray([0.5 -1.0; -5.0 -2.0; 0.0 1.0]; time=1:3, id=[:a, :b]),
]
])...
);
julia> p = Power(2);
julia> r = FeatureTransforms.apply(ds, p, (:_, :price, :_));
julia> [k => parent(parent(v)) for (k, v) in r.data]
4-element Vector{Pair{Tuple{Symbol, Symbol}, Matrix{Float64}}}:
(:train, :load) => [7.0 7.7; 8.0 8.2; 9.0 9.9]
(:train, :price) => [4.0 16.0; 9.0 4.0; 1.0 1.0]
(:predict, :load) => [7.0 7.7; 8.1 7.9; 9.0 9.9]
(:predict, :price) => [0.25 1.0; 25.0 4.0; 0.0 1.0]
Impute.apply
— Method
Impute.apply(ds::KeyedDataset, imp::DeclareMissings)
Declare missing values across all components in the KeyedDataset.
Impute.apply
— Method
Impute.apply(ds, filter; dims)
Filter out missing data along the dims for each component in the KeyedDataset with that dimension.
Example
julia> using AxisKeys, Impute; using AxisSets: KeyedDataset, Pattern, flatten;
julia> ds = KeyedDataset(
flatten([
:train => [
:temp => KeyedArray([1.0 1.1; missing 2.2; 3.0 3.3]; time=1:3, id=[:a, :b]),
:load => KeyedArray([7.0 7.7; 8.0 missing; 9.0 9.9]; time=1:3, loc=[:x, :y]),
],
:predict => [
:temp => KeyedArray([1.0 missing; 2.0 2.2; 3.0 3.3]; time=1:3, id=[:a, :b]),
:load => KeyedArray([7.0 7.7; 8.1 missing; 9.0 9.9]; time=1:3, loc=[:x, :y]),
]
])...
);
julia> [k => parent(parent(v)) for (k, v) in Impute.filter(ds; dims=:time).data] # KeyedArray printing isn't consistent in jldoctests
4-element Vector{Pair{Tuple{Symbol, Symbol}, Matrix{Union{Missing, Float64}}}}:
(:train, :temp) => [3.0 3.3]
(:train, :load) => [9.0 9.9]
(:predict, :temp) => [3.0 3.3]
(:predict, :load) => [9.0 9.9]
julia> [k => parent(parent(v)) for (k, v) in Impute.filter(ds; dims=Pattern(:train, :__, :time)).data]
4-element Vector{Pair{Tuple{Symbol, Symbol}, Matrix{Union{Missing, Float64}}}}:
(:train, :temp) => [1.0 1.1; 3.0 3.3]
(:train, :load) => [7.0 7.7; 9.0 9.9]
(:predict, :temp) => [1.0 missing; 3.0 3.3]
(:predict, :load) => [7.0 7.7; 9.0 9.9]
julia> [k => parent(parent(v)) for (k, v) in Impute.filter(ds; dims=:loc).data]
4-element Vector{Pair{Tuple{Symbol, Symbol}, Matrix{Union{Missing, Float64}}}}:
(:train, :temp) => [1.0 1.1; missing 2.2; 3.0 3.3]
(:train, :load) => [7.0; 8.0; 9.0]
(:predict, :temp) => [1.0 missing; 2.0 2.2; 3.0 3.3]
(:predict, :load) => [7.0; 8.1; 9.0]
Impute.impute
— Method
Impute.impute(ds, imp; dims)
Apply the imputation algorithm imp along the dims for all components of the KeyedDataset with that dimension.
Example
julia> using AxisKeys, Impute; using AxisSets: KeyedDataset, flatten;
julia> ds = KeyedDataset(
flatten([
:train => [
:temp => KeyedArray([1.0 1.1; missing 2.2; 3.0 3.3]; time=1:3, id=[:a, :b]),
:load => KeyedArray([7.0 7.7; 8.0 missing; 9.0 9.9]; time=1:3, loc=[:x, :y]),
],
:predict => [
:temp => KeyedArray([1.0 missing; 2.0 2.2; 3.0 3.3]; time=1:3, id=[:a, :b]),
:load => KeyedArray([7.0 7.7; 8.1 missing; 9.0 9.9]; time=1:3, loc=[:x, :y]),
]
])...
);
julia> [k => parent(parent(v)) for (k, v) in Impute.substitute(ds; dims=:time).data] # KeyedArray printing isn't consistent in jldoctests
4-element Vector{Pair{Tuple{Symbol, Symbol}, Matrix{Union{Missing, Float64}}}}:
(:train, :temp) => [1.0 1.1; 2.2 2.2; 3.0 3.3]
(:train, :load) => [7.0 7.7; 8.0 8.0; 9.0 9.9]
(:predict, :temp) => [1.0 1.0; 2.0 2.2; 3.0 3.3]
(:predict, :load) => [7.0 7.7; 8.1 8.1; 9.0 9.9]
julia> [k => parent(parent(v)) for (k, v) in Impute.substitute(ds; dims=:loc).data]
4-element Vector{Pair{Tuple{Symbol, Symbol}, Matrix{Union{Missing, Float64}}}}:
(:train, :temp) => [1.0 1.1; missing 2.2; 3.0 3.3]
(:train, :load) => [7.0 7.7; 8.0 8.8; 9.0 9.9]
(:predict, :temp) => [1.0 missing; 2.0 2.2; 3.0 3.3]
(:predict, :load) => [7.0 7.7; 8.1 8.8; 9.0 9.9]
Impute.validate
— Method
Impute.validate(ds::KeyedDataset, validator::Validator; dims=:)
Apply the validator to components in the KeyedDataset with the specified dims.
NamedDims.dimnames
— Method
dimnames(ds)
Returns a list of the unique dimension names within the KeyedDataset.
Example
julia> using AxisKeys; using NamedDims; using AxisSets: KeyedDataset;
julia> ds = KeyedDataset(
:val1 => KeyedArray(rand(4, 3, 2); time=1:4, loc=-1:-1:-3, obj=[:a, :b]),
:val2 => KeyedArray(rand(4, 3, 2) .+ 1.0; time=1:4, loc=-1:-1:-3, obj=[:a, :b]),
);
julia> dimnames(ds)
3-element Vector{Symbol}:
:time
:loc
:obj