API
Transforms
Abstract Transform Types
FeatureTransforms.Transform
— Type
Transform
Abstract supertype for all feature Transforms.
FeatureTransforms.AbstractScaling
— Type
AbstractScaling <: Transform
Linearly scale the data according to some statistics.
Implemented Transforms
FeatureTransforms.HoD
— Type
HoD <: Transform
Get the hour of day corresponding to the data.
FeatureTransforms.Power
— Type
Power(exponent) <: Transform
Raise the data to the power of the given exponent.
FeatureTransforms.Periodic
— Type
Periodic{P, S}(f, period::P, [phase_shift::S]) <: Transform
Applies a periodic function f with the provided period and phase_shift to the data.
The period and phase_shift must share the same supertype, either Real or Period, depending on whether the data is Real or TimeType respectively.
For TimeType data, the result depends on the type of period given, even when the same amount of time is described. For example, compare Week(1) with Second(Week(1)): the former starts the period on the most recent Monday, while the latter starts the period on the most recent multiple of 604800 seconds since time 0.
Fields
f::Union{typeof(cos), typeof(sin)}: the periodic function.
period::Union{Real, Period}: the function period. Must be strictly positive.
phase_shift::Union{Real, Period} (optional): adjusts the phase of the periodic function, measured in the same units as the input. Increasing the value translates the function to the right, toward higher/later input values.
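The two remarks above (the rightward phase shift, and Week(1) versus Second(Week(1))) can be sketched using only the Dates standard library. This is an illustration under stated assumptions, not the package internals: it assumes a periodic feature has the form f(2π * (x - phase_shift) / period) and that the start of a period is found by flooring the timestamp to that period. The helper name periodic_sketch is hypothetical.

```julia
using Dates

# Hypothetical helper (not part of FeatureTransforms): a periodic feature
# for Real inputs, assuming the form f(2π * (x - phase_shift) / period).
periodic_sketch(f, x::Real, period::Real, phase_shift::Real=0) =
    f(2π * (x - phase_shift) / period)

# A positive phase_shift moves the curve to the right: the sine peak that
# sits at x = 6 for period 24 is found at x = 8 when phase_shift = 2.
peak_unshifted = periodic_sketch(sin, 6, 24)
peak_shifted = periodic_sketch(sin, 8, 24, 2)

# The same duration expressed as different Period types floors differently.
dt = DateTime(2021, 6, 16, 12)            # a Wednesday
start_week = floor(dt, Week(1))           # most recent Monday at midnight
start_secs = floor(dt, Second(Week(1)))   # most recent multiple of 604800 s
```

Because the two period starts differ, the same timestamp lands at a different position within its period, which is why the transformed values differ.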
FeatureTransforms.StandardScaling
— Type
StandardScaling <: AbstractScaling
Transforms the data according to
x -> (x - μ) / σ
where μ and σ are the mean and standard deviation of the training data.
fit!(scaling, data) needs to be called before the transform can be applied. By default, all of the data is used when fitting the mean and standard deviation.
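The fit-then-apply arithmetic can be sketched with the Statistics standard library alone. The snippet does not call the package. It assumes the standard deviation is the default (corrected) Statistics.std, which the text above does not specify, and the name standardise is hypothetical.

```julia
using Statistics

train = [1.0, 2.0, 3.0, 4.0, 5.0]

# "Fit" step: compute the statistics from the training data.
# Assumption: the default corrected sample standard deviation is used.
μ = mean(train)
σ = std(train)

# "Apply" step: standardise data with the fitted statistics.
standardise(x) = (x - μ) / σ
scaled = standardise.(train)
```

Scaling the training data itself gives values with mean 0 and standard deviation 1; other data scaled with the same μ and σ generally will not have exactly those statistics.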
FeatureTransforms.IdentityScaling
— Type
IdentityScaling <: AbstractScaling
Represents the no-op scaling, which simply returns the data it is applied to.
FeatureTransforms.InverseHyperbolicSine
— Type
InverseHyperbolicSine <: Transform
Logarithmically transform the data through: log(x + √(x² + 1)).
This is the inverse hyperbolic sine.
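As a quick check of the formula, the sketch below writes it out explicitly and compares it with Base Julia's asinh. The function name ihs is hypothetical and the snippet does not call the package.

```julia
xs = [-10.0, -1.0, 0.0, 0.5, 100.0]

# The formula from the docstring, written out explicitly.
ihs(x) = log(x + sqrt(x^2 + 1))

explicit = ihs.(xs)
builtin = asinh.(xs)   # Base Julia's inverse hyperbolic sine
```

The two agree up to floating-point error, and unlike a plain logarithm the function is defined for zero and negative inputs.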
FeatureTransforms.LinearCombination
— Type
LinearCombination(coefficients) <: Transform
Calculates the linear combination of a collection of terms weighted by some coefficients.
When applied to an N-dimensional array, LinearCombination reduces along the dims provided and returns an (N-1)-dimensional array.
If no inds are specified, then the transform is applied to all elements.
Note: the current default is dims=1, but this behaviour will be deprecated in a future release, after which the dims keyword argument will have to be specified explicitly. See https://github.com/invenia/FeatureTransforms.jl/issues/82
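The reduction described above can be sketched in plain Julia for a matrix, without calling the package. It assumes that reducing along dims=1 means each row is one term, weighted by the matching coefficient.

```julia
# Rows are the terms; columns are observations (reduction over dims=1).
A = [1.0 2.0 3.0;
     4.0 5.0 6.0]
coefficients = [1.0, -1.0]

# Weighted sum of the rows: one value per column, so the 2-dimensional
# input reduces to a 1-dimensional result.
combined = vec(sum(coefficients .* A; dims=1))
```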
FeatureTransforms.LogTransform
— Type
LogTransform <: Transform
Logarithmically transform the data through: sign(x) * log(|x| + 1).
This allows transformations of all real numbers, not just positive ones.
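A short sketch of the formula in plain Julia shows how it treats negative values and zero. The function name signed_log is hypothetical and the snippet does not call the package.

```julia
# The formula from the docstring: sign(x) * log(|x| + 1).
signed_log(x) = sign(x) * log(abs(x) + 1)

xs = [-9.0, -1.0, 0.0, 1.0, 9.0]
ys = signed_log.(xs)
```

The result is odd-symmetric about zero, maps 0 to 0, and preserves the ordering of the inputs.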
FeatureTransforms.OneHotEncoding
— Type
OneHotEncoding{R<:Real} <: Transform
One-hot encode the categorical value for each target element.
Constructs an n-by-p binary matrix, given a Vector of target data x (of length n) and a Vector of all unique possible values in x (of length p).
The element [i, j] is true if the i-th target in x corresponds to the j-th possible value, and false otherwise. R can be specified to determine the return type of the results, which defaults to a Matrix of Bools.
This Transform does not support specifying dims other than : (all dims) because it is a one-to-many transform (for example, a Vector input produces a Matrix output).
OneHotEncoding must first be given the expected categories before it can be used. This is because the data might be missing certain categories, which would lead to incomplete classification.
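The n-by-p layout can be sketched in plain Julia with a broadcast comparison. This illustrates the encoding itself and does not call the package. The variable names are hypothetical.

```julia
categories = ["a", "b", "c"]   # all possible values (length p = 3)
x = ["b", "a", "b", "c"]       # target data (length n = 4)

# Element [i, j] is true when x[i] equals categories[j].
onehot = Matrix{Bool}(x .== permutedims(categories))

# Data that happens to lack some categories still gets all p columns,
# because the columns come from `categories`, not from the data.
partial = Matrix{Bool}(["a", "a"] .== permutedims(categories))
```

Fixing the categories up front keeps the column layout identical across data sets, which is the reason given above for supplying them before use.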
Applying Transforms
FeatureTransforms.apply
— Function
apply(data::T, ::Transform; kwargs...)
Applies the Transform to the data. New transforms should usually only extend _apply, which this method delegates to.
Where necessary, this should be extended for new data types T.
FeatureTransforms.apply!
— Function
apply!(data::T, ::Transform; kwargs...) -> T
Applies the Transform, mutating the input data. This method delegates to apply under the hood, so it does not need to be defined separately.
If the Transform does not support mutation, this method will error.
FeatureTransforms.apply_append
— Function
apply_append(A::AbstractArray, ::Transform; append_dim, kwargs...)
Applies the Transform to A and returns the result in a new array where the output is appended to A along the append_dim dimension. The remaining kwargs correspond to the usual Transform being invoked.
apply_append(table, ::Transform; [header], kwargs...)
Applies the Transform to the table and appends the result in a new table with an optional header. If none is provided, the default in Tables.table is used. The remaining kwargs correspond to the Transform being invoked.
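For the array method, the shape of the result can be sketched with Base Julia's cat. The snippet does not call the package: squaring the elements stands in for the output of some Transform, and the assumption is that appending along append_dim behaves like concatenation along that dimension.

```julia
A = [1.0 2.0;
     3.0 4.0]

# Stand-in for the output of some Transform applied to A.
result = A .^ 2

# Analogous to append_dim=2: the original columns come first, followed by
# the transformed columns, in a new array.
appended = cat(A, result; dims=2)
```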
Transform Interface
FeatureTransforms.is_transformable
— Function
is_transformable(x)
Determine if x is both a valid input and output of any Transform, i.e. that it has an apply method defined and therefore follows the transform interface.
FeatureTransforms.transform!
— Function
transform!(::T, data)
Mutating version of transform.
FeatureTransforms.transform
— Function
transform(::T, data)
Defines the feature engineering pipeline for some type T, which comprises a collection of Transforms and other steps to be performed on the data.
The idea behind a "transform interface" is to make feature transformations composable, i.e. the output of any one Transform should be valid input to another.
Feature engineering pipelines should obey the same principle, and it should be trivial to add or remove the Transform steps that compose the pipeline without breaking it.
transform should be overloaded for custom types T that require feature engineering. The only requirement is that the return value of transform is itself "transformable", i.e. calling is_transformable on the output returns true.
Deprecated functionality
FeatureTransforms.MeanStdScaling
— Type
MeanStdScaling(μ, σ) <: AbstractScaling
Linearly scale the data by the statistical mean μ and standard deviation σ. This is also known as standardization, or the Z score transform.
Keyword arguments to apply
inverse=true: inverts the scaling (e.g. to reconstruct the unscaled data).
eps=1e-3: used in place of all 0 values in σ before scaling (if inverse=false).
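The roles of the two keyword arguments can be sketched in plain Julia. The functions scale and unscale are hypothetical stand-ins, not the package's implementation. They assume the forward scaling is (x - μ) / σ with eps substituted for a zero σ, and that the inverse is μ + σ * y.

```julia
# Hypothetical forward scaling: eps replaces a zero σ to avoid dividing by 0.
scale(x, μ, σ; eps=1e-3) = (x .- μ) ./ (σ == 0 ? eps : σ)

# Hypothetical inverse scaling: reconstructs the unscaled data.
unscale(y, μ, σ) = μ .+ σ .* y

x = [1.0, 2.0, 3.0]
roundtrip = unscale(scale(x, 2.0, 0.5), 2.0, 0.5)

# Constant data has σ == 0; with the eps guard the result stays finite.
constant = scale([4.0, 4.0], 4.0, 0.0)
```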