Getting Started API
Import Classes
First, I will import the auger classes I will be using.
from auger.api.project import Project
from auger.api.dataset import DataSet
from auger.api.experiment import Experiment
from auger.api.model import Model
from auger.api.utils.context import Context
Create Project
Next, I will create a new project.
ctx = Context()
project_name = 'irisproject'
project = Project(ctx, project_name).create()
Create Dataset
I can upload a dataset to the project I just created using a local path or a remote URL. Below, I upload the local file iris_data.csv, located in the root directory of my notebook.
ctx = Context()
project = Project(ctx, project_name)
dataset = DataSet(ctx, project).create('iris_data.csv')
print(dataset.name)
Setup an Experiment
To start an experiment, I need to create one through the API and edit my auger.yaml file with some basic experiment settings.
Edit Experiment Config
It is important to set your target and model_type.
model_type: classification
# Target column name
target: class
# List of columns to be excluded from the training data
# Experiment
experiment:
  # Experiment name (if not specified, will be set on the first run)
  name: MyFirstEx
  # Time series feature. If the data source contains more than one DATETIME feature,
  # you will have to explicitly specify the feature to use as the time series
  time_series:
  # List of columns which should be used as label encoded features
  label_encoded: []
  # Number of folds used for k-fold validation of an individual trial
  cross_validation_folds: 3
  # Maximum time to run the experiment, in minutes
  max_total_time: 60
  # Maximum time to run an individual trial, in minutes
  max_eval_time: 1
  # Maximum number of trials to run to complete the experiment
  max_n_trials: 50
  # Try to improve model performance by creating ensembles from the trial models
  use_ensemble: true
  ### Metric used to build Model
  # Score used to optimize ML model.
  metric: accuracy
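Because the experiment depends on the target setting matching a column in the uploaded file, a quick check before starting can help. The following is a minimal sketch using only the standard library; the helper name check_target_column is hypothetical and is not part of the auger API.

```python
import csv


def check_target_column(csv_path, target):
    """Return True if `target` is one of the header columns of the CSV file.

    Hypothetical helper for illustration; not part of the auger API.
    """
    with open(csv_path, newline='') as f:
        # The first row of the file is taken to be the header
        header = next(csv.reader(f), [])
    return target in [name.strip() for name in header]
```

For the settings above, this could be called as check_target_column('iris_data.csv', 'class'), assuming the file has a header row.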
Start Experiment
experiment_name, session_id = Experiment(ctx, dataset).start()
Once an experiment is running I can check for results periodically until it finishes.
import time

leaderboard, status = Experiment(ctx, dataset, experiment_name).leaderboard()
# Poll until the experiment reaches a final status
while status not in ('completed', 'error'):
    print(status)
    leaderboard, status = Experiment(ctx, dataset, experiment_name).leaderboard()
    print(leaderboard)
    time.sleep(3)
# Print the final status once the loop exits
print(status)
output:
started
[{'model id': '7D80802FB01B402', 'accuracy': '0.3000', 'algorithm': 'BaselineClassifier'}]
started
[{'model id': 'EB5ACBE1FDF145C', 'accuracy': '0.9600', 'algorithm': 'ExtraTreesClassifier'}, {'model id': '72EF9AC817E74B4', 'accuracy': '0.9600', 'algorithm': 'XGBClassifier'}, {'model id': '1DBD63E03369499', 'accuracy': '0.9600', 'algorithm': 'ExtraTreesClassifier'}, {'model id': 'BDC2B68E49AF446', 'accuracy': '0.9533', 'algorithm': 'CatBoostClassifier'}, {'model id': '030621B32F6748D', 'accuracy': '0.9533', 'algorithm': 'CatBoostClassifier'}, {'model id': '1D2D9708B6C94E9', 'accuracy': '0.9533', 'algorithm': 'LGBMClassifier'}, {'model id': 'B94D42F1EEBC42C', 'accuracy': '0.9533', 'algorithm': 'SVC'}, {'model id': '5568580C250A476', 'accuracy': '0.9467', 'algorithm': 'GradientBoostingClassifier'}, {'model id': '1052C92E3D7D4EE', 'accuracy': '0.9467', 'algorithm': 'LGBMClassifier'}, {'model id': '559B0883462A475', 'accuracy': '0.9467', 'algorithm': 'SVC'}, {'model id': '8470AC1D177D4E5', 'accuracy': '0.9467', 'algorithm': 'XGBClassifier'}, {'model id': 'D45F498E0E2642B', 'accuracy': '0.9467', 'algorithm': 'RandomForestClassifier'}, {'model id': '256469BADC0B4BB', 'accuracy': '0.9467', 'algorithm': 'GradientBoostingClassifier'}, {'model id': 'F665DA87A5584EB', 'accuracy': '0.9333', 'algorithm': 'DecisionTreeClassifier'}, {'model id': 'DCA09DC10ABA448', 'accuracy': '0.9333', 'algorithm': 'RandomForestClassifier'}, {'model id': 'A35EBBFA9B0543C', 'accuracy': '0.9333', 'algorithm': 'DecisionTreeClassifier'}, {'model id': 'D7EE5A9241D6492', 'accuracy': '0.9133', 'algorithm': 'AdaBoostClassifier'}, {'model id': 'C060512BD843460', 'accuracy': '0.9133', 'algorithm': 'AdaBoostClassifier'}, {'model id': '7D80802FB01B402', 'accuracy': '0.3000', 'algorithm': 'BaselineClassifier'}]
completed
Wow! An accuracy of 0.96 (96 percent) is great! Also notice that 'completed' is now displayed below my leaderboard.
I'm now ready to deploy my model to get inferences!
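The polling shown earlier keeps going until the experiment finishes, however long that takes. A minimal sketch of the same idea with an upper time limit follows. The helper name wait_for_experiment and its parameters are hypothetical, and the final statuses 'completed' and 'error' are assumptions taken from the output and loop condition in this walkthrough.

```python
import time


def wait_for_experiment(fetch, poll_seconds=3, timeout_seconds=3600,
                        terminal=('completed', 'error')):
    """Poll `fetch()` until it reports a final status or the timeout expires.

    `fetch` is any zero-argument callable returning (leaderboard, status).
    Hypothetical helper for illustration; not part of the auger API.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        leaderboard, status = fetch()
        if status in terminal:
            return leaderboard, status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                'experiment still %r after %s seconds' % (status, timeout_seconds))
        time.sleep(poll_seconds)


# Possible usage with the objects from this walkthrough (not run here):
# leaderboard, status = wait_for_experiment(
#     lambda: Experiment(ctx, dataset, experiment_name).leaderboard())
```

Passing the fetch step in as a callable keeps the waiting logic independent of the API objects, so it can be tried out with a stand-in function first.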
Deploy Locally
I can deploy to a hosted endpoint or download the model to get inferences locally. Notice the model IDs displayed in the printed leaderboard. I will deploy my most accurate model, EB5ACBE1FDF145C.
model_id = 'EB5ACBE1FDF145C'
ctx = Context()
project = Project(ctx, project_name)
# deploys model locally
Model(ctx, project).deploy(model_id, True)
If you haven't deployed a model before, it might take a little time to download the Auger Docker image used to get predictions. This only happens for local model deployment.
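Rather than copying a model ID by hand, the most accurate entry can also be picked from the leaderboard in code. This is a minimal sketch that assumes the leaderboard is a list of dictionaries shaped like the printed output, with scores stored as strings; the helper name best_model is hypothetical.

```python
def best_model(leaderboard, metric='accuracy'):
    """Return the leaderboard entry with the highest score for `metric`.

    Scores are converted from strings to floats before comparing.
    Hypothetical helper for illustration; not part of the auger API.
    """
    return max(leaderboard, key=lambda row: float(row[metric]))


# A few rows copied from the leaderboard printed earlier
leaderboard = [
    {'model id': '7D80802FB01B402', 'accuracy': '0.3000', 'algorithm': 'BaselineClassifier'},
    {'model id': 'EB5ACBE1FDF145C', 'accuracy': '0.9600', 'algorithm': 'ExtraTreesClassifier'},
    {'model id': 'BDC2B68E49AF446', 'accuracy': '0.9533', 'algorithm': 'CatBoostClassifier'},
]
model_id = best_model(leaderboard)['model id']
print(model_id)  # prints EB5ACBE1FDF145C
```

When several models tie on the metric, max returns the first one it encounters, so the order of the leaderboard decides the pick.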
Getting Inferences
Next, I will get inferences for a new file called get_predictions.csv that contains only feature values. The output is a newly created file called get_predictions_predicted.csv, stored in the same location.
ctx = Context()
project = Project(ctx, project_name)
# predict on the locally deployed model
Model(ctx, project).predict('get_predictions.csv', model_id, None, True)
# result will be stored in get_predictions_predicted.csv
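The input file for prediction contains only feature values. One way to produce such a file from a labeled CSV is to copy it without the target column. The following is a minimal sketch using the standard library; the helper name drop_target_column and the file names in the usage note are hypothetical.

```python
import csv


def drop_target_column(labeled_path, features_path, target):
    """Copy a CSV file with a header row, leaving out the `target` column.

    Hypothetical helper for illustration; not part of the auger API.
    """
    with open(labeled_path, newline='') as src, \
         open(features_path, 'w', newline='') as dst:
        reader = csv.DictReader(src)
        # Keep every header column except the target
        kept = [name for name in (reader.fieldnames or []) if name != target]
        # extrasaction='ignore' silently skips the target value in each row
        writer = csv.DictWriter(dst, fieldnames=kept, extrasaction='ignore')
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

For example, drop_target_column('new_labeled_data.csv', 'get_predictions.csv', 'class') would write a features-only file, assuming the labeled file uses the same column names as the training data.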
Checking the output file, I can see that values for my target column, class, have been added.
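The same check can be done in code by counting the predicted values. This is a minimal sketch that assumes the output file has a header row and that the predicted values appear in a column named after the target; the helper name count_predictions is hypothetical.

```python
import csv
from collections import Counter


def count_predictions(csv_path, target):
    """Count how often each value appears in the `target` column of a CSV file.

    Hypothetical helper for illustration; not part of the auger API.
    """
    with open(csv_path, newline='') as f:
        return Counter(row[target] for row in csv.DictReader(f))
```

For this walkthrough, the call would be count_predictions('get_predictions_predicted.csv', 'class'). A KeyError here would indicate that the output file names the predicted column differently.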