Transform Interface

The "transform interface" is a mechanism that allows sequences of Transforms to be combined (with other steps) into end-to-end feature engineering pipelines.

This is possible because applying a Transform returns data of the same type as its input. This type consistency makes Transforms composable: the output of one is always a valid input to another, so users can "stack" sequences of Transforms with minimal glue code.
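As a small illustration of this composability, the sketch below chains two Power transforms on a plain vector. The input values and variable names are arbitrary choices made for this example.

```julia
using FeatureTransforms

x = [1.0, 2.0, 3.0]

# Applying a Transform to a vector gives back a vector...
squared = FeatureTransforms.apply(x, Power(2))

# ...so the result can be passed straight into another Transform.
restored = FeatureTransforms.apply(squared, Power(0.5))

squared isa AbstractVector  # same kind of container as the input
restored ≈ x                # squaring, then taking the square root, recovers x
```

No conversion step is needed between the two calls, which is what keeps the glue code small.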

Moreover, the end-to-end pipelines themselves should obey the same principle: you should be able to append Transforms (or another pipeline) to the output, or remove them, without breaking your code. That is, the output should also be a valid "transformable" type: either an AbstractArray, a Table, or any other type for which the user has extended FeatureTransforms.apply. Valid types can be checked by calling is_transformable, which is the first part of the transform interface. See FeatureTransforms.TestUtils for this and other testing utilities.
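A quick way to see this check in action is sketched below. It assumes is_transformable is made available by loading FeatureTransforms.TestUtils, as described above; the sample data is made up for illustration.

```julia
using FeatureTransforms
using FeatureTransforms.TestUtils

array_data = rand(5)
table_data = (a=rand(5), b=rand(5))  # a NamedTuple of columns is a table

# Both built-in input kinds should pass the check.
array_ok = is_transformable(array_data)
table_ok = is_transformable(table_data)

# The output of a Transform should pass the same check.
output_ok = is_transformable(FeatureTransforms.apply(array_data, Power(2)))
```

A custom type would be expected to pass only once FeatureTransforms.apply has been extended for it.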

The second part is the transform method stub, which users should overload when they want to "encapsulate" an end-to-end pipeline. How this is done is left to the user as an implementation detail; the code below shows one example. The only requirement of the transform API is that the value returned by the implemented transform method is itself "transformable". That is, it should satisfy is_transformable, which means the required FeatureTransforms.apply method(s) are defined for its type.


The example below is a trivial feature engineering pipeline. In practice, there may be other steps involved, such as checking for missing data or logging, which are omitted for clarity. An advantage of the transform API is that the output can be readily integrated into another transform pipeline downstream, for example when a model such as MyModel is stacked on the result of a previous model.

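using FeatureTransforms
using FeatureTransforms.TestUtils  # provides is_transformable

function FeatureTransforms.transform(data)
    # Define the Transforms we will apply
    p = Power(0.123)
    lc = LinearCombination([0.1, 0.9])
    ohe = OneHotEncoding(["type1", "type2", "type3"])

    # Work on a copy so the caller's data is left untouched
    features = deepcopy(data)
    # Transform column :a in place
    FeatureTransforms.apply!(features, p; cols=[:a], header=[:a])
    # Append the linear combination of :a and :b as a new column :ab
    features = FeatureTransforms.apply_append(features, lc; cols=[:a, :b], header=[:ab])
    # Append one column per category of :types
    features = FeatureTransforms.apply_append(features, ohe; cols=:types, header=[:type1, :type2, :type3])

    return features
end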

# this could be any table-type, including a DataFrame
input = (a=rand(5), b=rand(5), types=["type1", "type2", "type1", "type1", "type1"]);

output = FeatureTransforms.transform(input);

# verify the output is transformable
is_transformable(output) && print("output is transformable")

# output

output is transformable