Usage¶
Command Line Interface¶
ML Launchpad’s command line interface is mainly used while developing and
preparing a machine learning application. When running the API in production,
a WSGI server (e.g. Gunicorn or Waitress) runs mllaunchpad.wsgi:application
instead (the config file is then provided via the LAUNCHPAD_CFG environment
variable).
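As a rough illustration of the production setup described above, the following sketch serves the WSGI application with Waitress from Python. The config file name tree_cfg.yml, the host and the port are assumptions made for this example. Setting the environment variable before the import is a precaution in case the module reads it at import time.

```python
import os

# Assumption: "tree_cfg.yml" is a hypothetical config file name.
# setdefault keeps any value that is already set in the environment.
os.environ.setdefault("LAUNCHPAD_CFG", "tree_cfg.yml")


def serve_production(host="127.0.0.1", port=5000):
    """Serve the ML Launchpad WSGI application with Waitress."""
    # Imported lazily so that LAUNCHPAD_CFG is set before
    # mllaunchpad.wsgi is loaded.
    from waitress import serve
    from mllaunchpad.wsgi import application

    serve(application, host=host, port=port)
```

Calling serve_production() blocks and serves requests until the process is stopped. It requires the waitress and mllaunchpad packages to be installed.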
All commands (train, retest, predict, api and generate-raml) can
be abbreviated, so you can use e.g. mllaunchpad t or mllaunchpad pred
to save some keystrokes.
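To make the abbreviation rule concrete, here is a minimal sketch of unique-prefix matching over the command names listed above. It is not ML Launchpad’s actual implementation, and the function name resolve_command is hypothetical.

```python
COMMANDS = ["train", "retest", "predict", "api", "generate-raml"]


def resolve_command(abbreviation, commands=COMMANDS):
    """Return the single command that starts with `abbreviation`."""
    if abbreviation in commands:
        return abbreviation  # exact names always win
    matches = [c for c in commands if c.startswith(abbreviation)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"no such command: {abbreviation!r}")
    raise ValueError(
        f"ambiguous command {abbreviation!r}: {', '.join(matches)}"
    )


print(resolve_command("t"))     # train
print(resolve_command("pred"))  # predict
```

With the five commands above, every first letter is already unique, so a single character is enough to select a command under this scheme.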
mllaunchpad¶
Train, test or run a config file’s model.
mllaunchpad [OPTIONS] COMMAND [ARGS]...
Options
- --version¶
Show the version and exit.
- -v, --verbose¶
Print debug messages.
- -c, --config <config>¶
Use this configuration file. [default: look for env var LAUNCHPAD_CFG or ./LAUNCHPAD_CFG.yml]
- -l, --log-config <log_config>¶
Use this log configuration file. [default: look for env var LAUNCHPAD_LOG or ./LAUNCHPAD_LOG.yml]
api¶
Run API server in unsafe debug mode.
mllaunchpad api [OPTIONS]
generate-raml¶
Generate and print RAML template from DATASOURCE_NAME.
The datasource named DATASOURCE_NAME in the config will be used to create the API’s query parameters (from columns), types, and examples.
mllaunchpad generate-raml [OPTIONS] DATASOURCE_NAME
Arguments
- DATASOURCE_NAME¶
Required argument
predict¶
Run prediction on features from JSON file (- for stdin).
mllaunchpad predict [OPTIONS] [JSON_FILE]
Arguments
- JSON_FILE¶
Optional argument
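A features file for the predict command could be prepared as in the sketch below. It assumes that the file holds a single JSON object whose keys match the parameter names your model expects. The feature names, the file name features.json and the config file name are all hypothetical.

```python
import json
from pathlib import Path

# Hypothetical feature values; the keys are assumed to match the
# parameter names that the model's predict() method expects.
features = {
    "sepal.length": 5.6,
    "sepal.width": 2.7,
    "petal.length": 4.2,
    "petal.width": 1.3,
}

json_path = Path("features.json")
json_path.write_text(json.dumps(features, indent=2), encoding="utf-8")

# Then, on the command line (config file name is an assumption):
#   mllaunchpad -c my_config.yml predict features.json
# or, reading the same content from stdin:
#   mllaunchpad -c my_config.yml predict - < features.json
```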
retest¶
Retest existing model, update metrics.
mllaunchpad retest [OPTIONS]
train¶
Run training, store created model and metrics.
mllaunchpad train [OPTIONS]
Environment variables¶
- LAUNCHPAD_CFG¶
(Optional) path to configuration file
- LAUNCHPAD_LOG¶
(Optional) path to logging configuration file
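The lookup described for --config (explicit option, then the LAUNCHPAD_CFG environment variable, then ./LAUNCHPAD_CFG.yml) can be sketched as follows. The exact precedence inside ML Launchpad is an assumption here, and resolve_config_path is a hypothetical helper, not part of the package.

```python
import os


def resolve_config_path(cli_value=None, env=None):
    """Pick the config file path from the CLI option, env var or default."""
    env = os.environ if env is None else env
    if cli_value:
        return cli_value  # -c/--config given explicitly
    if env.get("LAUNCHPAD_CFG"):
        return env["LAUNCHPAD_CFG"]  # environment variable is set
    return "./LAUNCHPAD_CFG.yml"  # documented fallback file name


print(resolve_config_path("tree_cfg.yml", env={}))
print(resolve_config_path(env={"LAUNCHPAD_CFG": "/etc/mllp/cfg.yml"}))
print(resolve_config_path(env={}))
```

The same pattern would apply to --log-config with LAUNCHPAD_LOG and ./LAUNCHPAD_LOG.yml.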
Configuration¶
See separate page Configuration.
What about support for R, Spark, <other technology>?¶
ML Launchpad is designed to be as technology-agnostic and flexible as possible. For machine learning technologies, this means that it does not care whether you use it with SciKit-Learn, PyTorch, spaCy, etc. Just import the Python packages you need and enjoy. See the tutorial in the next section for an example using SciKit-Learn.
For interfacing with the outside world (getting data, etc.), we created interfaces
for extending this functionality. The most common kinds are already supported out of
the box. For getting and persisting data, look into inheriting from
DataSources and DataSinks. For providing your model results in
ways other than the provided WSGI API (events, Azure functions, etc.),
look into the mllaunchpad API (particularly get_validated_config()
and predict()).
That said, we already accumulated some partial or complete solutions, and the one you need might already be there:
- Oracle, Impala, Hive, etc. support is covered by SqlDataSource. It uses SQLAlchemy, which adds a lot of flexibility to the datasource configuration. Please see the SqlDataSource docs for more information. There are also some special classes like OracleDataSource and, in the examples, ImpalaDataSource, but those were made before SqlDataSource, and we suggest trying SqlDataSource first.
- R support works by using and adapting the r_example* files in the examples directory (experimental). You leave r_model.py as is, configure it as the model:module:, where you also configure model:r_file and model:r_dependencies with your script and R requirements. You will need to have R installed, as well as the Python package rpy2[pandas].
- Spark support is available through the spark_datasource.py module in the examples (experimental). Copy it into your project and include it in your config using the plugins: directive. Its detailed use is documented in the module itself.
- Containerization is straightforward to do: build an image that exposes the ML Launchpad REST API:

    # Example Dockerfile
    ARG PYTHON=3.7
    FROM python:${PYTHON}-slim-buster as mllp
    RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        vim \
        unixodbc-dev \
        unixodbc \
        libpq-dev \
        && apt-get clean \
        && apt-get autoremove -y \
        && rm -rf /var/lib/apt/lists/*
    WORKDIR /var/www/mllp/app
    COPY . .
    # In your project, be selective in what you put into the image.
    RUN pip install -r requirements.txt
    RUN pip install gunicorn
    RUN python -m mllaunchpad -c my_config.yml train  # If not pre-trained earlier.
    EXPOSE 5000
    CMD gunicorn --workers 4 --bind 0.0.0.0:5000 mllaunchpad.wsgi

- Azure/Firebase/AWS lambda functions for prediction can be easily created using the mllaunchpad API:

    import json
    import azure.functions as func
    import mllaunchpad  # see https://mllaunchpad.readthedocs.io/en/stable/mllaunchpad.html

    conf = mllaunchpad.get_validated_config("my_cfg_file_or_stream_or_url.yml")  # None=use LAUNCHPAD_CFG env var

    def main(req: func.HttpRequest) -> func.HttpResponse:
        # (you need to validate params yourself here, skipped in this example)
        result = mllaunchpad.predict(conf, arg_dict=req.params)
        return func.HttpResponse(json.dumps(result), mimetype="application/json")

For any other technology, there’s a good chance that you can tackle it with one of these mechanisms (extending DataSources/DataSinks or through the API). If you are unsure, please create an issue.
Tutorial¶
This tutorial will guide you through using ML Launchpad to publish a small machine learning project as a Web API.
Let’s assume that you have developed a Python script called tree_script.py
which contains the code to train, test and apply your model from Python:
my_project/
    iris_train.csv
    iris_holdout.csv
    tree_script.py
Contents of tree_script.py:
import sys
import pandas as pd
from sklearn import tree
from sklearn.metrics import accuracy_score, confusion_matrix


def train():
    df = pd.read_csv('iris_train.csv')
    X = df.drop('variety', axis=1)
    y = df['variety']
    model = tree.DecisionTreeClassifier()
    model.fit(X, y)
    return model


def test(model):
    df = pd.read_csv('iris_holdout.csv')
    X_test = df.drop('variety', axis=1)
    y_test = df['variety']
    y_predict = model.predict(X_test)
    acc = accuracy_score(y_test, y_predict)
    conf = confusion_matrix(y_test, y_predict).tolist()
    metrics = {'accuracy': acc, 'confusion_matrix': conf}
    return metrics


def predict(model, args_dict):
    # Create DF explicitly. No guarantee that dict keys are in correct order,
    # so we have to make sure *manually* that they match the column order we used
    # when training the model:
    X = pd.DataFrame({
        'sepal.length': [args_dict['sepal.length']],
        'sepal.width': [args_dict['sepal.width']],
        'petal.length': [args_dict['petal.length']],
        'petal.width': [args_dict['petal.width']]
    })
    y = model.predict(X)[0]
    return {'prediction': y}


if __name__ == '__main__':
    args = dict(zip([n for n in sys.argv[1::2]], [float(v) for v in sys.argv[2::2]]))
    my_model = train()
    print('metrics:', test(my_model))
    pred = predict(my_model, args)
    print('prediction result:', pred)

# Example:
# $ python tree_script.py sepal.length 3 sepal.width 2.7 petal.length 4.5 petal.width 3.5
# metrics: {'accuracy': 0.95, 'confusion_matrix': [[6, 0, 0], [0, 7, 0], [0, 1, 6]]}
# prediction result: {'prediction': 'Virginica'}
This script can be called from the command line and guesses the variety of iris from some physical measurements provided as command line arguments. It somewhat wastefully trains a new model every time it is called, and does not check the validity of the arguments at all. Besides making the model available as a Web API, ML Launchpad will also solve these two problems.
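The script’s argument handling is compact, so it may help to see it in isolation. The sketch below reproduces the same name/value pairing as a standalone function and adds one basic check that the script lacks. The function name parse_pairs is hypothetical, and it takes the arguments without the script name (i.e. sys.argv[1:]).

```python
def parse_pairs(argv):
    """Turn ['name', 'value', ...] into {'name': float(value), ...}."""
    if len(argv) % 2 != 0:
        raise ValueError("expected alternating name/value arguments")
    names = argv[0::2]                        # every other item, from the first
    values = [float(v) for v in argv[1::2]]   # every other item, from the second
    return dict(zip(names, values))


example = "sepal.length 3 sepal.width 2.7 petal.length 4.5 petal.width 3.5"
print(parse_pairs(example.split()))
```

Values that are not numbers still raise a ValueError from float(), and missing or misspelled names go unnoticed, which is the kind of gap that proper input validation closes.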
To use ML Launchpad, install it first using:
$ pip install mllaunchpad
Now, we’ll create a new Python file called tree_model.py in which we will
fill in the blanks:
my_project/
    iris_train.csv
    iris_holdout.csv
    tree_script.py
    tree_model.py
The file tree_model.py looks like this at first:
from mllaunchpad import ModelInterface, ModelMakerInterface
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn import tree
import pandas as pd
import logging

logger = logging.getLogger(__name__)


class MyTreeModelMaker(ModelMakerInterface):
    """Creates a Iris prediction model"""

    def create_trained_model(self, model_conf, data_sources, data_sinks, old_model=None):
        ...
        return model

    def test_trained_model(self, model_conf, data_sources, data_sinks, model):
        ...
        return metrics


class MyTreeModel(ModelInterface):
    """Uses the created Iris prediction model"""

    def predict(self, model_conf, data_sources, data_sinks, model, args_dict):
        ...
        return output
You can find a template like this in ML Launchpad’s examples
(download the examples, or copy-paste from TEMPLATE_model.py on GitHub).
The three methods create_trained_model(), test_trained_model() and predict()
correspond to the three functions in our script above.
We can essentially copy and paste the contents of our three functions into
those, but we will need to change some details to make the code work with
ML Launchpad.
Here, we’ll make use of the method arguments data_sources and model.
See model_interface for details on all available arguments.
If we call our training DataSource petals and our test DataSource petals_test,
our completed tree_model.py looks like this (we highlight changed code with
#comments):
from mllaunchpad import ModelInterface, ModelMakerInterface, order_columns
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn import tree
import pandas as pd
import logging

logger = logging.getLogger(__name__)


class MyTreeModelMaker(ModelMakerInterface):
    """Creates a Iris prediction model"""

    def create_trained_model(self, model_conf, data_sources, data_sinks, old_model=None):
        # use data_source instead of reading CSV ourselves:
        df_unordered = data_sources['petals'].get_dataframe()
        df = order_columns(df_unordered)  # make col order reproducible for API use
        X = df.drop('variety', axis=1)
        y = df['variety']
        model = tree.DecisionTreeClassifier()
        model.fit(X, y)
        return model

    def test_trained_model(self, model_conf, data_sources, data_sinks, model):
        # use data_source instead of reading CSV ourselves:
        df_unordered = data_sources['petals_test'].get_dataframe()
        df = order_columns(df_unordered)  # make col order reproducible for API use
        X_test = df.drop('variety', axis=1)
        y_test = df['variety']
        y_predict = model.predict(X_test)
        acc = accuracy_score(y_test, y_predict)
        conf = confusion_matrix(y_test, y_predict).tolist()
        metrics = {'accuracy': acc, 'confusion_matrix': conf}
        return metrics


class MyTreeModel(ModelInterface):
    """Uses the created Iris prediction model"""

    def predict(self, model_conf, data_sources, data_sinks, model, args_dict):
        # No changes required, but instead of this clumsy construct here...
        # X = pd.DataFrame({
        #     'sepal.length': [args_dict['sepal.length']],
        #     'sepal.width': [args_dict['sepal.width']],
        #     'petal.length': [args_dict['petal.length']],
        #     'petal.width': [args_dict['petal.width']]
        # })
        # ... we can use this much shorter method thanks to using
        # order_columns earlier, guaranteeing deterministic column ordering:
        X = order_columns(pd.DataFrame(args_dict, index=[0]))
        y = model.predict(X)[0]
        return {'prediction': y}
So we are now getting our data from the data_sources argument
instead of directly from CSV files, and we get our model
object passed as an argument, same as before.
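Why a deterministic column order matters can be shown without pandas. The sketch below is not mllaunchpad’s order_columns. It assumes that sorting by name is one way to get a reproducible order, and applies that to plain dictionaries standing in for request parameters.

```python
def order_keys(row):
    """Return a copy of `row` with its keys in sorted (alphabetical) order."""
    return {key: row[key] for key in sorted(row)}


# The same four parameters, arriving in two different orders:
request_a = {"sepal.length": 5.6, "sepal.width": 2.7,
             "petal.length": 4.2, "petal.width": 1.3}
request_b = {"petal.width": 1.3, "sepal.width": 2.7,
             "sepal.length": 5.6, "petal.length": 4.2}

# After ordering, both produce the same column sequence, so the values
# line up with the columns the model saw during training.
print(list(order_keys(request_a)))
print(list(order_keys(request_a)) == list(order_keys(request_b)))
```

The important point is that the same ordering rule is applied at training time and at prediction time. Which rule is used matters less than using it consistently.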
The three methods return the same things as our own functions:
- create_trained_model() returns a trained model object (can be pretty much anything),
- test_trained_model() returns a dict with metrics (can also contain lists, numpy arrays or pandas DataFrames), and
- predict() returns a prediction (usually a dict, but can also contain lists, numpy arrays or pandas DataFrames).
Sidenote: To save additional information while training for traceability’s sake,
use mllaunchpad.report() in your train and test code.
The metadata thus saved resides in the model store together with the model. By default, it includes basic
info such as the configuration (see below), some system info, and the test metrics.
When done with training, you can retrieve metadata of all models in the model
store from Python by using mllaunchpad.list_models().
Next, we will configure some extra info about our model,
as well as tell ML Launchpad where to find
the petals and petals_test DataSources.
Create a file called tree_cfg.yml:
my_project/
    iris_train.csv
    iris_holdout.csv
    tree_model.py
    tree_cfg.yml
(We’re done with our original tree_script.py, so I’ve removed it.)
Contents of tree_cfg.yml:
datasources:
  petals:
    type: csv
    path: ./iris_train.csv  # The string can also be a URL. Valid URL schemes include http, ftp, s3, and file.
    expires: 0  # -1: never (=cached forever), 0: immediately (=no caching), >0: time in seconds.
    options: {}
    tags: train
  petals_test:
    type: csv
    path: ./iris_holdout.csv
    expires: 3600
    options: {}
    tags: test

model_store:
  location: ./model_store  # Just in current directory for now

model:
  name: TreeModel
  version: '0.0.1'  # use semantic versioning (<breaking>.<adding>.<fix>), first segment will be used in API url as e.g. .../v1/...
  module: tree_model  # same as file name without .py
  train_options: {}
  predict_options: {}

api:
  name: iris  # name of the service api
  raml: tree.raml
  preload_datasources: False  # Load datasources into memory before any predictions. Only makes sense with caching.
Here, we define our datasources so ML Launchpad knows where to find the
data we refer to from our model. Besides csv files,
other types of DataSources are supported, and
extending DataSources is also possible
(see DataSources and DataSinks for more information on supported
builtin DataSources).
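The three cases of the expires setting in the config above (-1, 0 and a positive number of seconds) can be read as the following decision rule. This is only an illustration of the documented semantics, not ML Launchpad’s caching code. The helper name is_expired and the treatment of the exact boundary are assumptions.

```python
import time


def is_expired(fetched_at, expires, now=None):
    """Decide whether data fetched at `fetched_at` must be loaded again.

    expires == -1: never expires (cached forever)
    expires ==  0: expires immediately (no caching)
    expires  >  0: expires after that many seconds
    """
    now = time.time() if now is None else now
    if expires == -1:
        return False
    if expires == 0:
        return True
    return now - fetched_at >= expires


# petals uses expires: 0, petals_test uses expires: 3600
print(is_expired(fetched_at=0, expires=0, now=1))        # True
print(is_expired(fetched_at=0, expires=3600, now=1800))  # False
```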
The model_store is just a directory where all trained models will
be stored together with their metrics.
The model section gives our model a name and version which will be
used to uniquely identify it when saving/loading. Here, we also
provide the importable name of our tree_model.py, which is just
tree_model. If it were in a package (directory) called something,
we would write something.tree_model instead.
It’s a good idea to make sure our model is in Python’s path (sys.path
or PYTHONPATH) so it can be found when ML Launchpad wants to import it.
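The mapping from a file location to the importable name used in model:module can be sketched like this. The helper importable_name is hypothetical and assumes a path relative to a directory that is on Python’s path.

```python
from pathlib import PurePosixPath


def importable_name(relative_path):
    """Convert 'pkg/sub/module.py' into the dotted name 'pkg.sub.module'."""
    path = PurePosixPath(relative_path)
    if path.suffix != ".py":
        raise ValueError(f"not a Python file: {relative_path!r}")
    return ".".join(path.with_suffix("").parts)


print(importable_name("tree_model.py"))            # tree_model
print(importable_name("something/tree_model.py"))  # something.tree_model
```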
The api section provides details on the Web API we want to publish.
This section may seem surprisingly empty. The reason is that the API
definition is off-loaded into a RESTful API Markup Language (RAML) file.
You can generate a RAML file using the command line tool that was installed together with ML Launchpad:
$ mllaunchpad --config tree_cfg.yml generate-raml petals >tree.raml
This creates the API definition file tree.raml using the columns
and their types in the petals datasource for defining parameters.
We still need to adapt this file a little because it also lists
our target variable variety as an input parameter, which we don’t
want, so we edit the file and remove these lines:
variety:
  displayName: Friendly Name of variety
  type: string
  description: Description of what variety really is
  example: 'Versicolor'
  required: true
This is the only change which is necessary from a technical standpoint.
Feel free to read the RAML file and improve the template descriptions
there, correct mythings to something that makes sense, like
varieties, adapt the output format to what you want to use, and so on.
Our model is done! Let’s try it out.
$ mllaunchpad --config tree_cfg.yml train
Now we have a trained model in our model_store. Let’s run a test Web API
(only for debug purposes, see here for running production APIs):
$ mllaunchpad --config tree_cfg.yml api
We can find a test URL in our generated tree.raml. Just remove
the &variety=... part, and open the link
http://127.0.0.1:5000/iris/v0/mythings?sepal.length=5.6&sepal.width=2.7&petal.length=4.2&petal.width=1.3
e.g. in Chrome. You can see the result of our model’s prediction
immediately:
{
"prediction": "Versicolor"
}
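The test URL above is assembled from values in the config: the api name (iris), the first segment of the model version ('0.0.1' gives v0) and the resource name from the RAML file (mythings). The following sketch rebuilds it with the standard library. The helper api_url is hypothetical and only mirrors the URL pattern visible in this tutorial.

```python
from urllib.parse import urlencode


def api_url(host, api_name, model_version, resource, params):
    """Build a prediction URL like the one generated in tree.raml."""
    major = model_version.split(".")[0]  # first version segment only
    query = urlencode(params)            # name=value pairs joined by &
    return f"http://{host}/{api_name}/v{major}/{resource}?{query}"


url = api_url(
    host="127.0.0.1:5000",
    api_name="iris",
    model_version="0.0.1",
    resource="mythings",
    params={"sepal.length": 5.6, "sepal.width": 2.7,
            "petal.length": 4.2, "petal.width": 1.3},
)
print(url)
```

While the debug API is running, a URL built this way can be fetched with urllib.request.urlopen(url) and the response decoded with json.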
Automatic input validation is included for free. Try changing the URL to provide a string value instead of a number, or remove one of the parameters, and you get a message explaining what is wrong.
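To give an idea of the kind of checks involved, here is a hand-written sketch that rejects missing parameters and non-numeric values for the four iris measurements. ML Launchpad derives its validation from the RAML definition, so this is not its implementation. The function validate_params and its error messages are made up for illustration.

```python
REQUIRED = ("sepal.length", "sepal.width", "petal.length", "petal.width")


def validate_params(params, required=REQUIRED):
    """Return (clean_values, errors) for a dict of raw query parameters."""
    clean, errors = {}, []
    for name in required:
        if name not in params:
            errors.append(f"missing parameter: {name}")
            continue
        try:
            clean[name] = float(params[name])
        except (TypeError, ValueError):
            errors.append(
                f"parameter {name} must be a number, got {params[name]!r}"
            )
    return clean, errors


# Query parameters arrive as strings; one is invalid, one is missing.
clean, errors = validate_params(
    {"sepal.length": "5.6", "sepal.width": "wide", "petal.length": "4.2"}
)
print(errors)
```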
What we have now is what is called a RESTful API. Web APIs like this are easy for other systems or web sites to use, so they can include your model’s predictions in their functionality.
Here’s a quick hacked-together HTML page which makes the predictions available to an end user:
<!DOCTYPE html>
<html><body>
<h2>Iris Tree Demo</h2>
<p>
Sepal Length: <input id="sl" type="range" min="0.1" max="7" step="0.1"><br>
Sepal Width: <input id="sw" type="range" min="0.1" max="7" step="0.1"><br>
Petal Length: <input id="pl" type="range" min="0.1" max="7" step="0.1"><br>
Petal Width: <input id="pw" type="range" min="0.1" max="7" step="0.1"><br>
</p>
<p id="output"></p>
<script>
function predict() {
let sl = document.querySelector('#sl').value;
let sw = document.querySelector('#sw').value;
let pl = document.querySelector('#pl').value;
let pw = document.querySelector('#pw').value;
fetch(`http://127.0.0.1:5000/iris/v0/mythings?sepal.length=${sl}&sepal.width=${sw}&petal.length=${pl}&petal.width=${pw}`)
.then(function(response) {
console.log(response);
return response.json();
})
.then(function(myJson) {
console.log(myJson);
document.querySelector('#output').innerHTML =
`This is an example of the ${myJson.prediction} variety`;
});
}
let inputs = document.querySelectorAll('input');
for (let input of inputs) {
input.addEventListener('change', predict, false);
}
</script>
</body></html>
If you put prototype HTML interfaces like this in a static subfolder, then
they will be accessible at e.g. http://127.0.0.1:5000/static/tree.html.
Keep in mind that this is only for demo/debug usage, not for production. The
position of the static subfolder is governed by the api:root_path key
(with a default value of .) in your config file.
You can find this and other examples here (download).
To run the tree example from this tutorial:
$ cd examples
$ mllaunchpad --config tree_cfg.yml train
$ mllaunchpad --config tree_cfg.yml api
Then open http://127.0.0.1:5000/static/tree.html in your browser.
To learn more, have a look at the examples provided in mllaunchpad’s GitHub repository (examples as zip file).