Several preprocessors are applied to the original data before the experiment runs. They convert your data to a machine-learning-friendly format and improve the final model's performance.
You can find the preprocessor options in the Preprocessors tab of the Configuration panel.
This preprocessor handles missing values in the dataset. Depending on the parameter options, it either drops columns with a lot of missing values or tries to fill them using iterative imputing regression. For very large datasets, it replaces missing values with the column mean for the sake of performance. For categorical data, a missing value is encoded as an unknown category.
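As a rough illustration of the two simplest behaviours described above (dropping columns with many missing values and mean-filling the rest), here is a minimal sketch in plain Python. The function name and the `max_missing_fraction` parameter are hypothetical and do not correspond to actual option names; the iterative imputing regression is not shown.

```python
from statistics import mean

def handle_missing(columns, max_missing_fraction=0.5):
    """Drop columns with too many missing values; mean-fill the rest.

    `columns` maps a column name to a list of numbers, with None marking
    a missing value. `max_missing_fraction` is a hypothetical parameter.
    """
    result = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if not present:
            continue  # nothing to impute from: drop the column
        missing_fraction = 1 - len(present) / len(values)
        if missing_fraction > max_missing_fraction:
            continue  # too many gaps: drop the column
        fill = mean(present)
        result[name] = [fill if v is None else v for v in values]
    return result

data = {
    "age": [20, None, 40, 30],             # 25% missing: filled with the mean
    "income": [None, None, None, 5000],    # 75% missing: dropped
}
cleaned = handle_missing(data)
```

Here `cleaned` keeps only the `age` column, with the gap filled by the mean of the observed values.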
Date and time features
This preprocessor extracts features from columns whose Datatype is marked as Datetime. The extracted features are: absolute time, day of year, weekday, day of month, month, hour, and minute.
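The extraction can be pictured with Python's standard `datetime` module. This is only a sketch: the feature names, the weekday numbering, and the use of Unix seconds for absolute time are assumptions, not the preprocessor's documented encoding.

```python
from datetime import datetime, timezone

def datetime_features(ts):
    """Expand one datetime into the numeric features listed above."""
    return {
        "absolute_time": ts.timestamp(),        # seconds since the Unix epoch (assumed unit)
        "day_of_year": ts.timetuple().tm_yday,  # 1..366
        "weekday": ts.weekday(),                # Monday == 0 (assumed numbering)
        "day_of_month": ts.day,
        "month": ts.month,
        "hour": ts.hour,
        "minute": ts.minute,
    }

features = datetime_features(
    datetime(2021, 3, 15, 14, 30, tzinfo=timezone.utc)
)
```

Splitting a timestamp this way lets a model pick up periodic patterns (time of day, day of week, season) that a single raw timestamp would hide.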
This preprocessor applies One-hot Encoding to columns whose Datatype is marked as Categorical. It creates at most Max Categorical Columns features for each Categorical column. If the number of unique categories exceeds that value, all but the most frequent categories are encoded using Label Encoder.
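One possible reading of this rule is sketched below. It assumes that, when a column has more unique categories than the limit, the `max_columns - 1` most frequent categories get their own one-hot columns and all remaining categories share a single label-encoded column. That split is an assumption made for illustration; the actual column layout may differ.

```python
from collections import Counter

def encode_categorical(values, max_columns):
    """One-hot encode frequent categories; label-encode the rare ones.

    Assumption: when there are more than `max_columns` unique categories,
    the `max_columns - 1` most frequent get one-hot columns and the rest
    share one label-encoded column (0 means "not a rare category").
    """
    counts = Counter(values)
    if len(counts) <= max_columns:
        frequent, rare = sorted(counts), []
    else:
        ranked = [category for category, _ in counts.most_common()]
        frequent = ranked[:max_columns - 1]
        rare = sorted(ranked[max_columns - 1:])
    labels = {category: i + 1 for i, category in enumerate(rare)}
    rows = []
    for v in values:
        row = [1 if v == category else 0 for category in frequent]
        if rare:
            row.append(labels.get(v, 0))
        rows.append(row)
    return frequent, rows

colors = ["red", "red", "red", "blue", "blue", "green", "pink"]
frequent, rows = encode_categorical(colors, max_columns=3)
```

With four unique colors and a limit of three, `red` and `blue` receive one-hot columns, while `green` and `pink` are distinguished only by the integer in the last column.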
This preprocessor eliminates features with low variance. This reduces the data size without much information loss, because a feature that barely changes carries little signal. You can change the Variance Threshold value to make this preprocessor more or less aggressive: a higher threshold removes more features.
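The filtering step can be sketched with the standard library. Whether the preprocessor uses population or sample variance, and whether the comparison is strict, are assumptions here.

```python
from statistics import pvariance

def drop_low_variance(columns, variance_threshold=0.0):
    """Keep only columns whose population variance exceeds the threshold."""
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) > variance_threshold
    }

data = {
    "constant": [1.0, 1.0, 1.0, 1.0],          # variance 0
    "nearly_constant": [0.0, 0.0, 0.0, 0.1],   # variance well below 0.01
    "informative": [1.0, 5.0, 2.0, 8.0],       # variance 7.5
}
kept = drop_low_variance(data, variance_threshold=0.01)
```

Only `informative` survives with this threshold; lowering the threshold to 0.001 would keep `nearly_constant` as well.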
This preprocessor tries to compress sparse features without losing much information. It applies Principal Component Analysis to columns with a lot of identical values and replaces them with (Fraction of Embedding for Sparse * number of sparse columns) columns.
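The replacement of sparse columns by a smaller set of principal components could look roughly like the NumPy sketch below. How the product of the fraction and the column count is rounded is not stated, so rounding up with a minimum of one column is an assumption, as are the function and parameter names.

```python
import numpy as np

def embed_sparse(X, fraction):
    """Project sparse columns onto their leading principal components.

    Assumption: the number of output columns is
    ceil(fraction * number of sparse columns), with a minimum of 1.
    """
    n_components = max(1, int(np.ceil(fraction * X.shape[1])))
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
# 100 rows, 10 columns in which roughly 90% of the entries are zero.
X = rng.random((100, 10)) * (rng.random((100, 10)) < 0.1)
embedded = embed_sparse(X, fraction=0.5)   # 10 sparse columns -> 5 columns
```

The output keeps the directions along which the sparse columns vary the most, so the discarded columns are the ones that contribute least to the overall variance.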
This preprocessor rescales all features to the range from 0 to 1. This solves the problem of features having different magnitudes: after scaling, all features are in the same range.
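A minimal sketch of this kind of rescaling (commonly called min-max scaling) follows. Mapping a constant column to 0 is an assumption made to avoid dividing by zero; the preprocessor's actual handling of that case is not documented here.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: the range is zero, so map everything to 0 (assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = min_max_scale([20, 30, 40])            # [0.0, 0.5, 1.0]
incomes = min_max_scale([1000, 3000, 5000])   # [0.0, 0.5, 1.0]
```

Although the raw incomes are two orders of magnitude larger than the ages, both columns end up on the same 0 to 1 scale.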