Several preprocessors are applied to the original data before the experiment runs. They convert your data into a machine-learning-friendly format and can improve the final model's performance.
You can find the preprocessor options in the Advanced tab of the Configuration panel.
This preprocessor handles missing values in the dataset. Depending on its parameters, it either drops the rows or columns that contain NaN values or fills the missing entries with the column mean. For categorical data, a missing value is encoded as an unknown category.
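The sketch below illustrates the missing-value strategies described above. It uses plain Python lists rather than the product's internal data structures. The `UNKNOWN` label and all function names are hypothetical. The handling of a column with no observed values is also an assumption.

```python
import math
from statistics import mean

UNKNOWN = "__unknown__"  # hypothetical label for the "unknown" category


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_numeric(column):
    """Fill missing entries with the mean of the observed values."""
    observed = [v for v in column if not is_missing(v)]
    if not observed:
        return list(column)  # assumption: nothing to average, leave the column as-is
    fill = mean(observed)
    return [fill if is_missing(v) else v for v in column]


def impute_categorical(column):
    """Encode missing entries as an explicit unknown category."""
    return [UNKNOWN if is_missing(v) else v for v in column]


def drop_incomplete_rows(rows):
    """Alternative strategy: drop every row that contains a missing value."""
    return [row for row in rows if not any(is_missing(v) for v in row)]


def drop_incomplete_columns(rows):
    """Alternative strategy: drop every column that contains a missing value."""
    keep = [j for j in range(len(rows[0])) if not any(is_missing(r[j]) for r in rows)]
    return [[r[j] for j in keep] for r in rows]


print(impute_numeric([1.0, float("nan"), 3.0]))   # [1.0, 2.0, 3.0]
print(impute_categorical(["red", None, "blue"]))  # ['red', '__unknown__', 'blue']
```

Mean imputation keeps every row but can shrink the spread of a column. Dropping rows or columns avoids invented values at the cost of discarding data.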
Date and time features
This preprocessor extracts features from columns whose Datatype is marked as Datetime: absolute time, day of year, weekday, day of month, month, hour, and minute.
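Here is a minimal sketch of the extracted features using Python's `datetime` module. It assumes that "absolute time" means seconds since the Unix epoch and that weekdays are numbered from Monday = 0. The product may use different conventions, and the feature names are hypothetical.

```python
from datetime import datetime, timezone


def datetime_features(ts: datetime) -> dict:
    """Derive the date/time features listed above from one timestamp."""
    return {
        "absolute_time": ts.timestamp(),        # assumed: seconds since the Unix epoch
        "day_of_year": ts.timetuple().tm_yday,  # 1..366
        "weekday": ts.weekday(),                # Monday = 0 ... Sunday = 6
        "month_day": ts.day,
        "month": ts.month,
        "hour": ts.hour,
        "minute": ts.minute,
    }


features = datetime_features(datetime(2024, 3, 15, 14, 30, tzinfo=timezone.utc))
print(features)
```

For 15 March 2024 at 14:30 UTC this gives day of year 75 and weekday 4 (Friday).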
This preprocessor applies one-hot encoding to columns whose Datatype is marked as Categorical. It creates at most Max Categorical Columns features for each Categorical column. If the number of unique categories exceeds that value, the most frequent categories are one-hot encoded and all remaining categories are encoded with integers.
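The following sketch shows one possible reading of this rule, and the source does not specify the exact column layout. It assumes the `max_columns - 1` most frequent categories each get a one-hot column. A single extra column then holds an integer code for every remaining category, with 0 meaning "one of the frequent categories". The function name and the `other_code` column name are hypothetical.

```python
from collections import Counter


def encode_categorical(column, max_columns):
    """Return (rows, feature_names) for one categorical column."""
    counts = Counter(column)
    if len(counts) <= max_columns:
        # Few enough categories: plain one-hot encoding, one feature per category.
        names = [c for c, _ in counts.most_common()]
        return [[1 if v == c else 0 for c in names] for v in column], names

    # Assumption: one-hot encode the (max_columns - 1) most frequent categories ...
    top = [c for c, _ in counts.most_common(max_columns - 1)]
    # ... and give every remaining category an integer code in one extra column.
    rare = sorted(set(counts) - set(top))
    codes = {c: i + 1 for i, c in enumerate(rare)}  # 0 = one of the frequent categories
    rows = [[1 if v == c else 0 for c in top] + [codes.get(v, 0)] for v in column]
    return rows, top + ["other_code"]


rows, names = encode_categorical(["a", "b", "a", "c", "d", "a", "b"], max_columns=3)
print(names)  # ['a', 'b', 'other_code']
print(rows)   # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 2], [1, 0, 0], [0, 1, 0]]
```

Capping the number of one-hot columns keeps high-cardinality columns from inflating the feature count.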
This preprocessor removes features with low variance, which reduces the data size with little loss of information. Raise the Variance Threshold value to make this preprocessor more aggressive (more features are removed) or lower it to make it less aggressive.
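Below is a minimal sketch of variance-based filtering. It assumes that a column is kept only if its population variance is strictly greater than the threshold. Whether the product uses population or sample variance, and `>` or `>=`, is not stated. The function and column names are hypothetical.

```python
from statistics import pvariance


def drop_low_variance(columns: dict, threshold: float) -> dict:
    """Keep only columns whose population variance exceeds the threshold."""
    return {name: values for name, values in columns.items() if pvariance(values) > threshold}


data = {
    "constant": [5, 5, 5, 5],  # variance 0.0
    "binary": [0, 1, 0, 1],    # variance 0.25
    "wide": [1, 10, 20, 30],   # variance much larger
}
print(sorted(drop_low_variance(data, threshold=0.0)))  # ['binary', 'wide']
print(sorted(drop_low_variance(data, threshold=0.3)))  # ['wide']
```

Raising the threshold from 0.0 to 0.3 also removes the binary column, which is what "more aggressive" means here.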
This preprocessor compresses sparse features without losing much information. It applies Principal Component Analysis (PCA) to columns that contain many identical values and replaces those sparse columns with (Fraction of Embedding for Sparse × number of sparse columns) new columns.
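The sketch below covers only the bookkeeping step: deciding which columns count as sparse and how many PCA components replace them. The 90% "most common value" rule (`min_share`) and the rounding to at least one component are hypothetical assumptions, because the source gives neither. The projection itself could be done with a standard PCA implementation, such as scikit-learn's `PCA(n_components=k)`.

```python
from collections import Counter


def is_sparse(values, min_share=0.9):
    """Hypothetical rule: sparse if one value fills at least min_share of the rows."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values) >= min_share


def n_sparse_components(columns: dict, fraction: float):
    """Return (sparse column names, number of PCA components that replace them)."""
    sparse = [name for name, values in columns.items() if is_sparse(values)]
    # Fraction of Embedding for Sparse * number of sparse columns,
    # rounded and kept at least 1 when any sparse column exists (assumption).
    k = max(1, round(fraction * len(sparse))) if sparse else 0
    return sparse, k


columns = {f"s{i}": [0] * 9 + [i] for i in range(20)}  # 20 mostly-zero columns
columns["dense"] = list(range(10))                     # all values distinct
sparse, k = n_sparse_components(columns, fraction=0.25)
print(len(sparse), k)  # 20 5
```

With a fraction of 0.25, the 20 sparse columns collapse into 0.25 × 20 = 5 components, while the dense column is left untouched.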
This preprocessor rescales every feature to the range from 0 to 1. This addresses features with very different magnitudes: after scaling, all features lie in the same range.
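Scaling to [0, 1] is usually done with min-max scaling, x' = (x - min) / (max - min), applied column by column. The sketch below assumes that formula, since the source does not name the method. Mapping a constant column to 0.0 is also an assumption, made to avoid division by zero.

```python
def min_max_scale(values):
    """Rescale a numeric column to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to scale; map everything to 0.0 (assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


income = [20000, 50000, 80000]
age = [20, 40, 60]
print(min_max_scale(income))  # [0.0, 0.5, 1.0]
print(min_max_scale(age))     # [0.0, 0.5, 1.0]
```

Income and age differ in magnitude by about three orders before scaling, yet both end up on the same 0 to 1 range.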