.. highlight:: shell

==============================================================================
Configuration
==============================================================================

The configuration file is the glue that holds your ML-Launchpad-based
application together. It links the things on the "inside", that is, your
model's implementation, to the things on the "outside", such as the
data connection (:doc:`datasources`) and the API configuration.

**Sidenote**: You can use this to your advantage when developing and testing your
machine learning algorithm by using different configuration files for different
stages of your development life cycle. That way, you can cleanly separate
environments like development/testing/production without having to touch
your code (using the same build) when switching between them.

For ML Launchpad to know how to do its job, it *always* needs a configuration.
To accommodate different ways of using ML Launchpad, there are several ways
to provide the configuration. From most common to least common:

* Provide the path to the config file on the command line (``--config`` or ``-c`` option).
* Set the environment variable ``LAUNCHPAD_CFG`` to the path of the config file.
* Put a config file named ``LAUNCHPAD_CFG.yml`` in the current working directory.
* Call :meth:`~mllaunchpad.get_validated_config` with the path of the config file to
  get a config ``dict`` and provide it as an argument when
  calling :meth:`~mllaunchpad.predict`, :meth:`~mllaunchpad.retest` and/or :meth:`~mllaunchpad.train_model`.

=====================================   ====================================================
Way of providing config                 When to use
=====================================   ====================================================
``mllaunchpad --config`` option         when developing; in some cases for training
``LAUNCHPAD_CFG`` environment var       when developing; when deployed (e.g. in production)
``./LAUNCHPAD_CFG.yml``                 when deployed
in code                                 when using ``mllaunchpad`` functionality from
                                        another app instead of WSGI, e.g. as an Azure
                                        function
=====================================   ====================================================

**Note**: Besides ``LAUNCHPAD_CFG``, there is also the ``LAUNCHPAD_LOG`` environment
variable, which, if provided, will be used as the
`logging configuration file `_.

.. _config_file:

Config File
------------------------------------------------------------------------------

The configuration file is written in `YAML `_ (.yml) format
and uses UTF-8 encoding. Internally, it is handled as a Python ``dict``.

Here's an example configuration with comments:

.. code-block:: yaml

    plugins:  # Optionally specify any additional imports (only external DataSources/-Sinks for now, cf. ``DataSources``)
      - bogusdatasource
      - records_datasource

    datasources:  # This section is optional. Places to get data from, and how.
      petals:  # Name by which you want to refer to the datasource, e.g. using ``data_sources["petals"]``
        # The properties ``type``, ``expires``, ``options`` and ``tags`` are present
        # in all types of datasources/datasinks.
        # All other properties are specific to the datasource type.
        type: csv  # Generic; the type of the datasource
        path: ./iris_train.csv  # Can also be a URL. Valid URL schemes: http, ftp, s3, and file
        expires: 0  # -1: never (=cached forever), 0: immediately (=no caching), >0: time in seconds
        cache_size: 10  # Optional: maximum number of different results to keep in memory, default=32
        options: {}  # Special kwargs to pass to the datasource's implementation
        tags: train  # String or list of strings. Valid are "train", "test" and/or "predict".
      petals_test:
        type: csv
        path: ./iris_holdout.csv
        expires: 3600
        options: {}
        tags: test
      # You can define as many datasources and datasinks as you like.
      # The tags "train", "test" and/or "predict" will determine which datasources/datasinks
      # will be provided to which functions in your model implementation.
      # Any combination of tags with datasources/datasinks is valid.

    # datasinks:  # This section is optional. Places to put data. NOT needed for prediction outputs, unless you require batch output, special file formats, etc.
      # The configuration structure of datasinks is equivalent to that of datasources.

    model_store:  # Required. Where your model and metadata are persisted.
      location: ./model_store  # Directory on file system (local or remote).

    model:  # Required. Details about your model's implementation.
      name: TreeModel
      version: '0.0.1'  # Use semantic versioning (..), first segment will be used in API url as e.g. .../v1/...
      module: tree_model  # Main module of your functionality. Same as source code file name without .py
      # Put custom properties for your implementation here.
      # For example, to configure NLP-related aspects of your model (language, etc.),
      # to perform fewer iterations for testing purposes, etc.
      # It is not recommended to put low-level hyperparameters here.

    api:  # Optional. Details about your API. The API will start with //v/
      # If you don't specify the api property, you cannot use mllaunchpad's WSGI API.
      # You would eschew mllaunchpad's WSGI API if you want to make it available as
      # part of another service framework, e.g. AWS Lambda or Azure Functions.
      name: iris  # Name of the service API
      raml: tree.raml  # Path to the API's RAML definition (see two sections below)
      preload_datasources: False  # Load datasources into memory before any predictions. Only makes sense with caching (expires != 0).

Details on how to configure specific types of ``DataSources`` and ``DataSinks``
can be found on the page :doc:`datasources`.

.. _plugins:

Plugins
------------------------------------------------------------------------------

In your :ref:`config_file`, you can optionally use a top-level ``plugins:`` key
to specify (a list of) modules that should be imported by ML Launchpad
(currently only used while initializing the :doc:`datasources`).
If any of these plugins are in conflict with other plugins or built-ins, the
last-imported one takes precedence over the previous ones. For example, if several
:doc:`DataSource ` plugins offer to serve the same type (e.g. ``csv``),
the last one in the ``plugins:`` list will be chosen as the designated ``csv`` handler,
overruling both the built-in :class:`~mllaunchpad.datasources.FileDataSource` and
any other ``csv``-serving DataSources listed before the one in question.

RAML API Definition
------------------------------------------------------------------------------

The API will be prefixed with ``//v/`` from your configuration
file (``/iris/v0/`` in the above example). How the API looks beyond that is
governed by your RAML file.

The `RAML specification language `_ has been chosen as the way
to specify the API in a way that is compatible with common tools (such as MuleSoft).
Other languages do exist, and :doc:`contributions to support them are welcome `.

The RAML is the contract between you and your service API's clients.
How to write a valid RAML is beyond the scope of this documentation.
To help you get started, there are various
`examples `_,
and you can generate a basic :ref:`queryparams`-based RAML using
:ref:`mllaunchpad generate-raml `.

ML Launchpad understands a subset of RAML in order to automatically create
APIs for the (currently) three most common use cases
(please note that they support GET as well as POST):

.. _queryparams:

Query Parameters
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

These are named parameters with a value. E.g. in our "iris" example, in an API call
that looks like ``/iris/v0/varieties?sepal.width=3&sepal.length=1.3[...]``
these would be ``sepal.width``, ``sepal.length`` etc., each with one value:

.. code-block:: yaml

    /varieties:  # the resource name that comes after /iris/v0
      get:  # can also be post
        description: Get a prediction for the variety of iris flower based on measurements of physical petal and sepal dimensions
        queryParameters:
          sepal.length:
            displayName: Sepal Length
            type: number
            description: Measured length of iris flower sepals (flower leaves)
            example: 3.14
            required: false  # test, should be true
            minimum: 0
            repeat: false  # set to true to get param's list of values in your args_dict
          sepal.width:
            displayName: Sepal Width
            type: number
            description: Measured width of iris flower sepals (flower leaves)
            example: 3.14
            required: false  # test, should be true
            minimum: 0
          # ...

The ``displayName``, ``type``, ``required``, ``example``, and ``minimum``/``maximum``
properties are used by ML Launchpad for validation and logging. The optional
``repeat`` property turns the parameter into an array if you need it to support
multiple values (RAML 0.8 standard). As a partial implementation of RAML 1.0
arrays, you can alternatively specify the type with brackets (``number[]``,
``string[]`` etc). Use ``enum`` to specify a list of allowed values (for
categorical data). Other RAML properties are ignored.

Your model's :meth:`~mllaunchpad.ModelInterface.predict` method is passed an
``args_dict`` with one key per query parameter, through which you can access
the values.

Query parameters may be combined with :ref:`urlparams`
(see `tree example `_).

**Sidenote**: The technology that ML Launchpad uses under the hood also supports
requests with arbitrary JSON bodies, which might work with ML Launchpad for passing
more complex values, but this is not officially supported at this time.

.. _urlparams:

URL Parameters
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

A string in your API's URL, e.g. ``/iris/v0/varieties/12``, which usually identifies
one record in the set of resources. Example RAML:

.. code-block:: yaml

    /varieties:
      /{my_url_param_name}:  # parameter name to use
        get:  # post also possible
          queryParameters:  # Optional, just to demonstrate that this can be used in conjunction with query parameters.
            hallo:
              description: some demo query parameter in addition to the uri param
              type: string
              required: true
              enum: ['metric', 'imperial']
      # ...

The ``args_dict`` passed to your model's :meth:`~mllaunchpad.ModelInterface.predict`
method will contain the value under whatever name you gave it
(here: "my_url_param_name"), in addition to any other query parameters.

URL Parameters may be combined with :ref:`queryparams`
(see `tree example `_).

Files
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Handling files (using ``multipart/form-data``) is also possible. Example RAML:

.. code-block:: yaml

    /topics:
      post:
        description: Upload a PDF file to predict the topic for.
        body:
          multipart/form-data:
            formParameters:
              text:
                displayName: Optional alternative text of a client message
                type: string
                description: The plain text of a client's letter, email, etc (uncleaned)
                required: false
            properties:
              file:
                description: The PDF file containing the client message, to be uploaded
                required: false
                type: file
                fileTypes: ["application/pdf"]
      # ...

The ``args_dict`` passed to your model's :meth:`~mllaunchpad.ModelInterface.predict`
method will contain a parameter named "file" with a FileStorage_ object. You can get
its file name using ``args_dict["file"].filename`` and access its contents using
``args_dict["file"].stream``. See the FileStorage_ documentation for more details.

As can be seen in the example, a file can be combined with :ref:`queryparams`.
However, it cannot currently be combined with :ref:`urlparams` in ML Launchpad.

.. _FileStorage: https://werkzeug.palletsprojects.com/en/1.0.x/datastructures/#werkzeug.datastructures.FileStorage
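
The file access described in the Files section can be sketched in plain Python.
This is a minimal illustration, not ML Launchpad's own code: ``FakeUpload`` is a
hypothetical stand-in that mimics only the ``filename`` and ``stream`` attributes
mentioned above, and ``describe_upload`` is a hypothetical helper of the kind you
might call from your own ``predict`` implementation. Because the RAML example
declares the file as ``required: false``, the sketch assumes that the "file"
entry may be missing or empty.

```python
import io


class FakeUpload:
    """Hypothetical stand-in for an uploaded file (assumption, not a real class).

    It only mimics the two attributes named in the text: filename and stream.
    """

    def __init__(self, filename, data):
        self.filename = filename
        self.stream = io.BytesIO(data)


def describe_upload(args_dict):
    """Return the name and size of the uploaded file, or None if none was sent."""
    upload = args_dict.get("file")  # the file is optional in the example RAML
    if upload is None:
        return None
    content = upload.stream.read()  # raw bytes of the uploaded file
    return {"filename": upload.filename, "size_bytes": len(content)}


# Simulated arguments, shaped like the "text" and "file" parameters above.
args_dict = {"text": "", "file": FakeUpload("letter.pdf", b"%PDF-1.4 demo")}
summary = describe_upload(args_dict)
print(summary)
```

Keeping the file handling in a small helper like this makes it possible to test
the logic with in-memory data, without starting the API.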