API
Transforms
Abstract Transform Types
FeatureTransforms.Transform — Type
Transform
Abstract supertype for all feature Transforms.
FeatureTransforms.AbstractScaling — Type
AbstractScaling <: Transform
Linearly scale the data according to some statistics.
Implemented Transforms
FeatureTransforms.HoD — Type
HoD <: Transform
Get the hour of day corresponding to the data.
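To illustrate what "hour of day" means for timestamp data, here is a minimal plain-Julia sketch using only the Dates standard library. It does not call FeatureTransforms and is not the package's implementation; the sample timestamps are made up for illustration.

```julia
using Dates

# Illustrative timestamps (hypothetical data)
timestamps = [DateTime(2020, 1, 1, 9, 30), DateTime(2020, 1, 1, 23, 0)]

# Broadcast Dates.hour over the vector to get the hour of day of each element
hours = hour.(timestamps)
```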
FeatureTransforms.Power — Type
Power(exponent) <: Transform
Raise the data to the given exponent.
FeatureTransforms.Periodic — Type
Periodic{P, S}(f, period::P, [phase_shift::S]) <: Transform
Applies a periodic function f with provided period and phase_shift to the data.
The period and phase_shift must share the same supertype, either Real or Period, depending on whether the data is Real or TimeType, respectively.
For TimeType data, the result will change depending on the type of period given, even if the same amount of time is described. Example: Week(1) vs Second(Week(1)); the former starts the period on the most recent Monday, while the latter starts the period on the most recent multiple of 604800 seconds since time 0.
Fields
f::Union{typeof(cos), typeof(sin)}: the periodic function
period::Union{Real, Period}: the function period. Must be strictly positive.
phase_shift::Union{Real, Period} (optional): adjusts the phase of the periodic function, measured in the same units as the input. Increasing the value translates the function to the right, toward higher/later input values.
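For Real-valued data, the effect of period and phase_shift can be sketched in plain Julia. The helper `periodic` below is hypothetical and only assumes the form f(2π (x - phase_shift) / period), which is consistent with the description above (a larger phase_shift moves the function to the right); it is not taken from the package source.

```julia
# Hypothetical helper for illustration only; not part of FeatureTransforms.
periodic(f, x, period, phase_shift=0) = f(2π * (x - phase_shift) / period)

hours = 0:6:24

# A 24-unit period: the sine peaks at x = 6
unshifted = periodic.(sin, hours, 24)

# A phase shift of 6 moves the peak to the right, to x = 12
shifted = periodic.(sin, hours, 24, 6)
```

The TimeType case is not covered by this sketch, since there the result also depends on how the Period type anchors the start of each cycle.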
FeatureTransforms.StandardScaling — Type
StandardScaling <: AbstractScaling
Transforms the data according to
x -> (x - μ) / σ
where μ and σ are the mean and standard deviation of the training data.
fit!(scaling, data) needs to be called before the transform can be applied. By default, all the data is considered when fitting the mean and standard deviation.
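The formula can be checked by hand with the Statistics standard library. This is a plain-Julia sketch of the arithmetic only; it does not use fit! or apply, and the function name `scale` and the training values are assumptions made for illustration.

```julia
using Statistics

train = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical training data

# The statistics that a fitting step would learn from the training data
μ, σ = mean(train), std(train)

# x -> (x - μ) / σ
scale(x) = (x - μ) / σ

scaled = scale.(train)   # has mean ≈ 0 and standard deviation ≈ 1
```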
FeatureTransforms.IdentityScaling — Type
IdentityScaling <: AbstractScaling
Represents the no-op scaling which simply returns the data it is applied to.
FeatureTransforms.InverseHyperbolicSine — Type
InverseHyperbolicSine <: Transform
Logarithmically transform the data through: log(x + √(x² + 1)).
This is the inverse hyperbolic sine.
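As a quick check of the identity, the logarithmic expression can be compared against Julia's built-in asinh. This sketch uses only Base functions and made-up sample values.

```julia
xs = [-10.0, -1.0, 0.0, 0.5, 100.0]

# The expression from the docstring, evaluated elementwise
manual = @. log(xs + sqrt(xs^2 + 1))

# Julia's built-in inverse hyperbolic sine
builtin = asinh.(xs)
```

Both vectors agree up to floating-point error, and negative inputs are handled without issue.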
FeatureTransforms.LinearCombination — Type
LinearCombination(coefficients) <: Transform
Calculates the linear combination of a collection of terms weighted by some coefficients.
When applied to an N-dimensional array, LinearCombination reduces along the dim provided and returns an (N-1)-dimensional array.
If no inds are specified, then the transform is applied to all elements.
Note: The current default is dims=1, but this behaviour will be deprecated in a future release and the dims keyword argument will have to be specified explicitly. https://github.com/invenia/FeatureTransforms.jl/issues/82
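The reduction from an N-dimensional to an (N-1)-dimensional array can be sketched with Base Julia alone. The example below assumes the terms are laid out along dimension 1 (one row per term) and does not call the package; the matrix and coefficients are illustrative.

```julia
A = [1 2 3; 4 5 6]       # 2×3 matrix: two terms (rows), three observations (columns)
coefficients = [1, -1]   # one coefficient per term

# Weight each row by its coefficient, then sum along dimension 1.
# vec drops the reduced dimension, leaving a 3-element vector.
combined = vec(sum(coefficients .* A; dims=1))
```

Here each output element is 1 * A[1, j] + (-1) * A[2, j], i.e. the difference between the two rows.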
FeatureTransforms.LogTransform — Type
LogTransform <: Transform
Logarithmically transform the data through: sign(x) * log(|x| + 1).
This allows transformations of all real numbers, not just positive ones.
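The formula is simple enough to write out directly. The names `signed_log` and `signed_log_inverse` below are hypothetical helpers for illustration, not functions exported by the package.

```julia
# sign(x) * log(|x| + 1): defined for every real x, and odd-symmetric about 0
signed_log(x) = sign(x) * log(abs(x) + 1)

# Algebraic inverse of the expression above
signed_log_inverse(y) = sign(y) * (exp(abs(y)) - 1)

xs = [-9.0, 0.0, 9.0]
ys = signed_log.(xs)
roundtrip = signed_log_inverse.(ys)
```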
FeatureTransforms.OneHotEncoding — Type
OneHotEncoding{R<:Real} <: Transform
One-hot encode the categorical value for each target element.
Construct an n-by-p binary matrix, given a Vector of target data x (of length n) and a Vector of all unique possible values in x (of length p).
The element [i, j] is true if the i-th target in x corresponds to the j-th possible value, and false otherwise. Note that R can be specified to determine the return type of results. It defaults to a Matrix of Bools.
Note that this Transform does not support specifying dims other than : (all dims) because it is a one-to-many transform (for example a Vector input produces a Matrix output).
Note that OneHotEncoding must first be set up with the expected categories before it can be used. This is because the data might be missing certain categories, which would lead to an incomplete encoding.
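The shape of the result can be sketched with a plain comprehension. This does not use the OneHotEncoding type itself; the category and target values are made up, and the conversion to Float64 only mimics the idea of choosing a different element type R.

```julia
categories = ["a", "b", "c"]   # all p possible values, fixed up front
x = ["b", "a", "b", "a"]       # n targets; note that "c" never occurs

# n-by-p Matrix{Bool}: element [i, j] is true when x[i] == categories[j]
onehot = [xi == c for xi in x, c in categories]

# A numeric element type instead of Bool
onehot_float = Float64.(onehot)
```

Because the categories are supplied separately from the data, the column for "c" is still present (all false) even though "c" does not appear in x.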
Applying Transforms
FeatureTransforms.apply — Function
apply(data::T, ::Transform; kwargs...)
Applies the Transform to the data. New transforms should usually only extend _apply, which this method delegates to.
Where necessary, this should be extended for new data types T.
FeatureTransforms.apply! — Function
apply!(data::T, ::Transform; kwargs...) -> T
Applies the Transform, mutating the input data. This method delegates to apply under the hood, so it does not need to be defined separately.
If Transform does not support mutation, this method will error.
FeatureTransforms.apply_append — Function
apply_append(A::AbstractArray, ::Transform; append_dim, kwargs...)
Applies the Transform to A and returns the result in a new array where the output is appended to A along the append_dim dimension. The remaining kwargs correspond to the usual Transform being invoked.
apply_append(table, ::Transform; [header], kwargs...)
Applies the Transform to the table and appends the result in a new table with an optional header. If none is provided, the default in Tables.table is used. The remaining kwargs correspond to the Transform being invoked.
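For the array method, the "append along a dimension" behaviour can be pictured with Base.cat. In this sketch, squaring the elements stands in for the output of some transform; nothing here calls apply_append itself.

```julia
A = [1.0 2.0; 3.0 4.0]

# Stand-in for the output of a transform applied to A
squared = A .^ 2

# Append the output to the original along dimension 2 (new columns)
appended = cat(A, squared; dims=2)   # 2×4 matrix
```

The original columns are preserved and the transformed columns follow them, so the input is left unmodified.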
Transform Interface
FeatureTransforms.is_transformable — Function
is_transformable(x)
Determine if x is both a valid input and output of any Transform, i.e. that it has an apply method defined and therefore follows the transform interface.
FeatureTransforms.transform! — Function
transform!(::T, data)
Mutating version of transform.
FeatureTransforms.transform — Function
transform(::T, data)
Defines the feature engineering pipeline for some type T, which comprises a collection of Transforms and other steps to be performed on the data.
The idea around a "transform interface" is to make feature transformations composable, i.e. the output of any one Transform should be valid input to another.
Feature engineering pipelines should obey the same principle and it should be trivial to add/remove Transform steps that compose the pipeline without it breaking.
transform should be overloaded for custom types T that require feature engineering. The only requirement is that the return of transform is itself "transformable", i.e. calling is_transformable on the output returns true.
Deprecated functionality
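The composability principle can be illustrated without the package: if every step accepts and returns the same kind of data, steps can be added or removed freely. In the sketch below, the steps are ordinary functions on vectors, and `signed_log`, `square` and `run_pipeline` are hypothetical names chosen for this example rather than part of the transform interface.

```julia
# Two illustrative steps, each mapping a vector to a vector
signed_log(x) = sign(x) * log(abs(x) + 1)
square(x) = x^2

pipeline = [x -> signed_log.(x), x -> square.(x)]

# Feed the output of each step into the next one
run_pipeline(steps, data) = foldl((d, step) -> step(d), steps; init=data)

data = [0.0, exp(2) - 1]
full = run_pipeline(pipeline, data)          # log step, then squaring
partial = run_pipeline(pipeline[1:1], data)  # dropping a step still works
```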
FeatureTransforms.MeanStdScaling — Type
MeanStdScaling(μ, σ) <: AbstractScaling
Linearly scale the data by the statistical mean μ and standard deviation σ. This is also known as standardization, or the Z score transform.
Keyword arguments to apply
inverse=true: inverts the scaling (e.g. to reconstruct the unscaled data).
eps=1e-3: used in place of all 0 values in σ before scaling (if inverse=false).
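The role of the two keyword arguments can be sketched with plain arithmetic. The helpers `safe_sigma`, `scale` and `unscale` are hypothetical and only mirror the behaviour described above; they are not the package's implementation.

```julia
# Replace a zero standard deviation with eps to avoid dividing by zero
safe_sigma(σ; eps=1e-3) = σ == 0 ? eps : σ

# Forward scaling: (x - μ) / σ, with the eps substitution applied to σ
scale(x, μ, σ; eps=1e-3) = (x - μ) / safe_sigma(σ; eps=eps)

# Inverse scaling: reconstruct the unscaled value
unscale(z, μ, σ) = z * σ + μ

z = scale(14.0, 10.0, 2.0)         # 2.0
x = unscale(z, 10.0, 2.0)          # back to 14.0
constant = scale(6.0, 5.0, 0.0)    # finite, thanks to the eps substitution
```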