The "transform interface" is a mechanism that allows sequences of `Transform`s to be combined (with other steps) into end-to-end feature engineering pipelines.
This is possible because applying a `Transform` returns data of the same type as its input. This type consistency makes `Transform`s composable: the output of one is always a valid input to another, so users can "stack" sequences of `Transform`s with minimal glue code.
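A minimal sketch of this composability, assuming the FeatureTransforms package is installed. The input values are made up for illustration.

```julia
using FeatureTransforms

x = [1.0, 4.0, 9.0]   # illustrative input, not from the original text

# Applying a Transform to a Vector{Float64} returns a Vector{Float64} ...
roots = FeatureTransforms.apply(x, Power(0.5))

# ... so the result can be fed directly into the next Transform.
cubes = FeatureTransforms.apply(roots, Power(3))

# The container type is preserved through both steps.
typeof(cubes) == typeof(x)
```

Because each step returns the same type it received, steps can be added, removed, or reordered without adapter code between them.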
Moreover, the end-to-end pipelines themselves should obey the same principle: you should be able to add or remove `Transform`s (or another pipeline) at the output without breaking your code. That is, the output should also be a valid "transformable" type: either an `AbstractArray`, a `Table`, or another type for which the user has extended `FeatureTransforms.apply`. Valid types can be checked by calling `is_transformable`, which is the first part of the transform interface. See `FeatureTransforms.TestUtils` for this and other testing utilities.
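As a small sketch of this check, assuming `is_transformable` is exported by `FeatureTransforms.TestUtils` as the text suggests. The sample data is made up.

```julia
using FeatureTransforms
using FeatureTransforms.TestUtils  # assumed to export is_transformable

# Illustrative inputs: a plain vector and a NamedTuple of columns (a table).
array_input = [1.0, 2.0, 3.0]
table_input = (a=[1.0, 2.0, 3.0], b=[4.0, 5.0, 6.0])

# Both should pass the check if arrays and tables are supported as described.
array_ok = is_transformable(array_input)
table_ok = is_transformable(table_input)
```

A check like this is most useful in a test suite, to confirm that a custom type or a pipeline's output still satisfies the interface.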
The second part is the `transform` method stub, which users should overload when they want to "encapsulate" an end-to-end pipeline. How to do so is an implementation detail left to the user, but the code below serves as an example. The only requirement of the transform API is that the return value of the implemented `transform` method is itself "transformable". That is, it should satisfy `is_transformable`.
```julia
using FeatureTransforms
using FeatureTransforms.TestUtils  # provides is_transformable

function FeatureTransforms.transform(data)
    # Define the Transforms we will apply
    p = Power(0.123)
    lc = LinearCombination([0.1, 0.9])
    ohe = OneHotEncoding(["type1", "type2", "type3"])

    # Work on a copy so the caller's data is not mutated
    features = deepcopy(data)
    FeatureTransforms.apply!(features, p; cols=[:a], header=[:a])
    features = FeatureTransforms.apply_append(features, lc; cols=[:a, :b], header=[:ab])
    features = FeatureTransforms.apply_append(features, ohe; cols=:types, header=[:type1, :type2, :type3])
    return features
end

# this could be any table-type, including a DataFrame
input = (a=rand(5), b=rand(5), types=["type1", "type2", "type1", "type1", "type1"]);

output = FeatureTransforms.transform(input);

# verify the output is transformable
is_transformable(output) && print("output is transformable")
# prints: output is transformable
```

This is a trivial example of a feature engineering pipeline. In practice, there may be other steps involved, such as checking for missing data or logging, which are omitted for clarity. An advantage of the transform API is that the output can be readily integrated into another transform pipeline downstream, for example if `MyModel` were being stacked with the result of a previous model.
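A hedged sketch of such downstream integration follows. Here `features` is a hypothetical stand-in for the output of an upstream pipeline, and the column name `ab_mean` is chosen only for illustration.

```julia
using FeatureTransforms

# Hypothetical stand-in for the output of an upstream transform pipeline.
features = (a=[1.0, 2.0, 3.0], b=[3.0, 4.0, 5.0])

# Downstream step: average columns a and b and append the result as a new column.
lc = LinearCombination([0.5, 0.5])
stacked = FeatureTransforms.apply_append(features, lc; cols=[:a, :b], header=[:ab_mean])

# The appended column sits alongside the original a and b columns.
stacked.ab_mean
```

Since the upstream output is itself a table, the downstream step needs no conversion before applying further `Transform`s.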