Configuring runs and functions

MLRun orchestrates serverless functions over Kubernetes. You can specify the resource requirements (CPU, memory, GPUs), preferences, and pod priorities in the logical function object. You can also configure how MLRun prevents stuck pods. All of these are used during the function deployment.

Configuring runs and functions is relevant for all supported cloud platforms.


Environment variables

Environment variables can be added individually, from a Python dictionary, or from a file:

# Single variable
fn.set_env(name="MY_ENV", value="MY_VAL")

# Multiple variables
fn.set_envs(env_vars={"MY_ENV" : "MY_VAL", "SECOND_ENV" : "SECOND_VAL"})

# Multiple variables from file
fn.set_envs(file_path="env.txt")
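The exact file format accepted by set_envs(file_path=...) is not spelled out here. The sketch below assumes a dotenv-style file (one KEY=VALUE per line, with # comments allowed) and shows how such a file corresponds to the dictionary you would pass as env_vars. parse_env_file is a hypothetical helper, not part of MLRun.

```python
from pathlib import Path


def parse_env_file(path: str) -> dict[str, str]:
    """Hypothetical helper: read KEY=VALUE lines into a dict (assumes dotenv-style syntax)."""
    env_vars: dict[str, str] = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {raw_line!r}")
        # Drop optional surrounding quotes from the value
        env_vars[key.strip()] = value.strip().strip("\"'")
    return env_vars
```

Under that assumption, fn.set_envs(env_vars=parse_env_file("env.txt")) should be roughly equivalent to the file-based call above.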

Replicas

Some runtimes can scale horizontally, configured either as a number of replicas:

import mlrun

training_function = mlrun.code_to_function("training.py", name="training", handler="train",
                                           kind="mpijob", image="mlrun/mlrun-gpu")
# Run the MPI job with 2 replicas
training_function.spec.replicas = 2

or a range (for auto-scaling in Dask or Nuclio):

# Set the auto-scaling range with min_replicas and max_replicas
# (dask_cluster is a Dask function object, e.g. created with mlrun.new_function(kind="dask"))
dask_cluster.spec.min_replicas = 1
dask_cluster.spec.max_replicas = 4

Note

If a target utilization (Target CPU%) value is set, the replication controller calculates the utilization value as a percentage of the equivalent resource request (CPU request) on the replicas and based on that provides horizontal scaling. See also Kubernetes horizontal autoscale.
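To make the note above concrete, here is a simplified sketch of the CPU-utilization rule from the Kubernetes Horizontal Pod Autoscaler documentation: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the configured replica range. The sketch ignores the autoscaler's tolerance band and stabilization windows. desired_replicas and its millicore inputs are illustrative assumptions, not Nuclio or MLRun code.

```python
import math


def desired_replicas(current_replicas: int, cpu_usage_millicores: list[int],
                     cpu_request_millicores: int, target_utilization_pct: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified HPA rule for CPU utilization (no tolerance, no stabilization)."""
    # Utilization is measured relative to the CPU *request*, averaged over the replicas
    avg_usage = sum(cpu_usage_millicores) / len(cpu_usage_millicores)
    current_utilization_pct = 100 * avg_usage / cpu_request_millicores
    # desiredReplicas = ceil(currentReplicas * currentMetric / desiredMetric)
    desired = math.ceil(current_replicas * current_utilization_pct / target_utilization_pct)
    # Keep the result inside the configured [min_replicas, max_replicas] range
    return max(min_replicas, min(max_replicas, desired))
```

For example, suppose two replicas use 450m and 350m of CPU with a 500m request. Their average utilization is 80%, so with a 50% target the rule asks for ceil(2 × 80 / 50) = 4 replicas.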

See more details in Dask, MPIJob and Horovod, Spark, Nuclio.

CPU, GPU, and memory: requests and limits for user jobs

Requests define the amount of memory, CPU, and GPU that must be available on a node for the pod to be scheduled and start; limits define the maximum the pod is allowed to consume. MLRun and Nuclio functions run in their own pods. The default CPU and memory limits for these pods are defined by their respective services. You can change the limits when creating a job or a function. It is best practice to define requests and limits for each MLRun function.

See more details in the Kubernetes documentation: Resource Management for Pods and Containers.

SDK configuration

Examples of with_requests() and with_limits():

import mlrun

training_function = mlrun.code_to_function("training.py", name="training", handler="train",
                                           kind="mpijob", image="mlrun/mlrun-gpu")
training_function.with_requests(mem="1G", cpu=1)  # lower bound, used for scheduling
training_function.with_limits(mem="2G", cpu=2, gpus=1)  # upper bound, maximum allowed consumption
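The values passed to with_requests() and with_limits() become Kubernetes resource quantities, so suffixes matter. "1G" is 10^9 bytes while "1Gi" is 2^30 bytes, and "500m" of CPU is half a core. Kubernetes also rejects a container whose request exceeds its limit. The sketch below uses hypothetical helpers, parse_quantity and check_requests_within_limits. It covers only the common suffixes and is an illustration, not a full implementation of the Kubernetes quantity format.

```python
import re
from fractions import Fraction

# Multipliers for common Kubernetes quantity suffixes (decimal SI and binary IEC)
_SUFFIX_FACTORS = {
    "": Fraction(1),
    "m": Fraction(1, 1000),  # milli, e.g. CPU "500m" = half a core
    "k": Fraction(10**3), "M": Fraction(10**6), "G": Fraction(10**9), "T": Fraction(10**12),
    "Ki": Fraction(2**10), "Mi": Fraction(2**20), "Gi": Fraction(2**30), "Ti": Fraction(2**40),
}

_QUANTITY = re.compile(r"^([0-9]+(?:\.[0-9]+)?)([a-zA-Z]*)$")


def parse_quantity(value) -> Fraction:
    """Convert a quantity such as 1, "500m", "1G" or "2Gi" to an exact number."""
    match = _QUANTITY.match(str(value).strip())
    if not match or match.group(2) not in _SUFFIX_FACTORS:
        raise ValueError(f"unsupported quantity: {value!r}")
    number, suffix = match.groups()
    return Fraction(number) * _SUFFIX_FACTORS[suffix]


def check_requests_within_limits(requests: dict, limits: dict) -> list[str]:
    """Return the resources whose request exceeds the limit (an invalid pod spec)."""
    return [name for name in requests
            if name in limits and parse_quantity(requests[name]) > parse_quantity(limits[name])]
```

For the example above, check_requests_within_limits({"mem": "1G", "cpu": 1}, {"mem": "2G", "cpu": 2}) finds no violations.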

Note

When specifying GPUs, MLRun uses nvidia.com/gpu as default GPU type. To use a different type of GPU, specify it using the optional gpu_type parameter.

UI configuration

Note

Relevant when MLRun is executed in the Iguazio platform.

Configure requests and limits in the service's Common Parameters tab and in the Configuration tab of the function.

Number of workers and GPUs

For each Nuclio or serving function, MLRun creates an HTTP trigger with a default of 1 worker. When using GPUs in remote functions, you must ensure that the number of GPUs is equal to the number of workers (or manage the GPU consumption within your code). You can set the number of GPUs for each pod using the MLRun SDK.

You can change the number of workers after you create the trigger (function object); you then need to redeploy the function. Examples of changing the number of workers:

Using mlrun.runtimes.RemoteRuntime.with_http():
serve.with_http(workers=8, worker_timeout=10)

Using mlrun.runtimes.RemoteRuntime.add_v3io_stream_trigger():
serve.add_v3io_stream_trigger(stream_path='v3io:///projects/myproj/stream1', maxWorkers=3, name='stream', group='serving', seek_to='earliest', shards=1)
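The sketch below illustrates the one-GPU-per-worker rule. It assumes a simple round-robin mapping of worker IDs to GPU indices for the case where you manage GPU placement yourself. Both check_gpu_worker_ratio and assign_gpus are hypothetical helpers. How you actually pin a worker to a device (for example via CUDA_VISIBLE_DEVICES or your framework's device API) depends on your code.

```python
def check_gpu_worker_ratio(workers: int, gpus_per_pod: int) -> None:
    """Hypothetical pre-deploy check: one GPU per worker, as recommended above."""
    if gpus_per_pod and workers != gpus_per_pod:
        raise ValueError(
            f"{workers} workers would share {gpus_per_pod} GPU(s); set workers == GPUs "
            "or manage GPU placement in your code"
        )


def assign_gpus(workers: int, gpus_per_pod: int) -> dict[int, int]:
    """Round-robin worker -> GPU index mapping; with workers == GPUs each worker gets its own GPU."""
    return {worker: worker % gpus_per_pod for worker in range(workers)}
```

For example, calling with_http(workers=8) on a pod with a single GPU would leave eight workers sharing one device. That is the situation the rule above is meant to avoid.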

Volumes

When you create a pod in an MLRun job or Nuclio function, the pod by default has access to a file-system which is ephemeral, and gets deleted when the pod completes its execution. In many cases, a job requires access to files residing on external storage, or to files containing configurations and secrets exposed through Kubernetes config-maps or secrets. Pods can be configured to consume the following types of volumes, and to mount them as local files in the local pod file-system:

  • V3IO containers: when running on the Iguazio system, pods have access to the underlying V3IO shared storage. This option mounts a V3IO container or a subpath within it to the pod through the V3IO FUSE driver.

  • PVC: Mount a Kubernetes persistent volume claim (PVC) to the pod. The persistent volume and the claim need to be configured beforehand.

  • Config Map: Mount a Kubernetes Config Map as local files to the pod.

  • Secret: Mount a Kubernetes secret as local files to the pod.

For each of the options, a name needs to be assigned to the volume, as well as a local path to mount the volume at (using a Kubernetes Volume Mount). Depending on the type of the volume, other configuration options may be needed, such as the access key required for a V3IO volume.
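For reference, the name-plus-path pairing described above corresponds to two pieces of a Kubernetes pod spec. The first is an entry in spec.volumes, which holds the volume's name and source. The second is an entry in the container's volumeMounts, which holds the same name and a mountPath. The sketch builds these as plain dictionaries using Kubernetes field names. The helper volume_and_mount and the claim, ConfigMap, and secret names are illustrative assumptions, and MLRun's modifiers may generate additional fields.

```python
def volume_and_mount(volume_name: str, mount_path: str, source: dict) -> tuple[dict, dict]:
    """Build matching pod.spec.volumes and container volumeMounts entries."""
    # The volume entry carries the name plus exactly one source (PVC, configMap, secret, ...)
    volume = {"name": volume_name, **source}
    # The mount entry refers back to the volume by the same name
    volume_mount = {"name": volume_name, "mountPath": mount_path}
    return volume, volume_mount


# Example sources for the volume types listed above (names are illustrative)
PVC_SOURCE = {"persistentVolumeClaim": {"claimName": "data-claim"}}
CONFIG_MAP_SOURCE = {"configMap": {"name": "my-config"}}
SECRET_SOURCE = {"secret": {"secretName": "my-secret"}}
```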

See more about Kubernetes Volumes.

MLRun supports the concept of volume auto-mount, which automatically mounts the most commonly used type of volume to all pods, unless disabled. See more about MLRun auto mount.

SDK configuration

Configure volumes attached to a function by passing a volume modifier (such as mount_v3io() or mount_pvc()) to the function's apply() method.

For example, using V3IO storage:

import mlrun

# Import the open_archive function from the Function Hub (hub://)
open_archive_function = mlrun.import_function("hub://open_archive")

# Use mount_v3io() for Iguazio (V3IO) volumes
open_archive_function.apply(mlrun.mount_v3io())

You can specify a list of V3IO paths to use and how they map inside the container (using volume_mounts). For example:

open_archive_function.apply(mlrun.mount_v3io(name='data', access_key='XYZ123..', volume_mounts=[mlrun.VolumeMount("/data", "projects/proj1/data")]))

See full details in mount_v3io().

Alternatively, using a PVC volume:

open_archive_function.apply(mlrun.platforms.mount_pvc(pvc_name="data-claim", volume_name="data", volume_mount_path="/data"))

See full details in mount_pvc().

UI configuration

Note

Relevant when MLRun is executed in the Iguazio platform.

You can configure volumes when creating a job, rerunning an existing job, and creating an ML function. To modify the volumes of an ML function, press ML functions, open the function's kebab menu (⋮), and use the Edit | Resources | Volumes drop-down list.

Select the volume mount type: either Auto (using auto-mount), Manual or None. If selecting Manual, fill in the details in the volumes list for each volume to mount to the pod. Multiple volumes can be configured for a single pod.

Preemption mode: Spot vs. On-demand nodes

When running ML functions you might want to control whether to run on spot nodes or on-demand nodes. Preemption mode controls whether pods can be scheduled on preemptible (spot) nodes. Preemption mode is supported for all functions.

Preemption mode uses Kubernetes Taints and Tolerations to enforce the mode selected.
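As background on how taints and tolerations gate scheduling, the sketch below encodes the Kubernetes matching rule. A toleration matches a taint when the effects match (an empty effect matches any effect) and one of two conditions holds: the operator is Exists (where an empty key matches every taint), or the key and value are equal. A pod fits a node only if every NoSchedule or NoExecute taint on that node is tolerated. The taint key example.com/spot is hypothetical. The actual spot-node taints depend on your cloud provider and cluster configuration, and this is not MLRun's code.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Kubernetes matching rule for one toleration against one taint."""
    # An empty toleration effect matches every taint effect
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # Exists with an empty key tolerates every taint; otherwise the keys must match
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # Equal: both key and value must match
    return toleration.get("key") == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def can_schedule(pod_tolerations: list[dict], node_taints: list[dict]) -> bool:
    """A pod fits a node only if every hard (NoSchedule/NoExecute) taint is tolerated."""
    hard_taints = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations) for taint in hard_taints)
```

In Kubernetes, a toleration only permits scheduling on a tainted node. Restricting a pod to those nodes additionally requires node selection or affinity.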

Why preemption mode?

On-demand instances provide full control over the instance lifecycle: you decide when to launch, stop, hibernate, start, reboot, or terminate them. With Spot instances, you request spare capacity in specific availability zones, so the instances are subject to spot capacity availability and can be reclaimed by the cloud provider. Spot instances are a good choice if you can be flexible about when your applications run and if your applications can be interrupted.

Here are some questions to consider when choosing the type of node:

  • Is the function mission-critical, and must it be operational at all times?

  • Is the function a stateful function or stateless function?

  • Can the function recover from unexpected failure?

  • Is this a job that should run only when there are available inexpensive resources?

Important

When an MLRun job is running on a spot node and it fails, it is not restarted. However, if a Nuclio function goes down due to a spot issue, Kubernetes brings it back up.

Kubernetes has a few methods for configuring which nodes to run on. To get a deeper understanding, see Pod Priority and Preemption. Also, you must understand the configuration of the spot nodes as specified by the cloud provider.

Stateless and Stateful Applications

When deploying your MLRun jobs to specific nodes, take into consideration that on-demand nodes are designed to run stateful applications, while spot nodes are designed for stateless applications. MLRun jobs are more stateful by nature. An MLRun job that is assigned to run on a spot node might be interrupted, so it must be designed to save the job/function state when scaling to zero.
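A minimal sketch of saving state, assuming the job can be split into epochs and that checkpoint_path points to persistent storage (for example a V3IO or PVC mount, since the pod's own file system is ephemeral). Because a failed job on a spot node is not restarted automatically, the resume happens when you rerun the job. train_with_checkpoints and its placeholder loss are hypothetical.

```python
import json
from pathlib import Path


def train_with_checkpoints(total_epochs: int, checkpoint_path: str) -> dict:
    """Resume from the last saved epoch so a rerun does not start from scratch."""
    path = Path(checkpoint_path)
    # Load the state saved by an earlier (possibly interrupted) attempt, if any
    state = json.loads(path.read_text()) if path.exists() else {"epoch": 0, "loss": None}
    for epoch in range(state["epoch"], total_epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for the real training step
        state = {"epoch": epoch + 1, "loss": loss}
        # Write to a temporary file and rename it, so an interruption mid-write
        # cannot leave a half-written checkpoint behind
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(path)
    return state
```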

Supported preemption modes

Preemption mode has these values:

  • allow: The function pod can run on a spot node if one is available.

  • constrain: The function pod only runs on spot nodes, and does not run if none is available.

  • prevent: Default. The function pod cannot run on a spot node.

  • none: No preemptible configuration is applied to the function.

To change the default function preemption mode, override the API configuration: specifically, set the MLRUN_FUNCTION_DEFAULTS__PREEMPTION_MODE environment variable to one of the modes above.

SDK configuration

Configure preemption mode by calling the function's with_preemption_mode() method in your Jupyter notebook, specifying a mode from the list of values above.
This example illustrates a function that can be scheduled on preemptible nodes:

# Can be scheduled on a preemptible (spot) node
fn.with_preemption_mode("allow")

And another function that can only be scheduled on preemptible nodes:

import mlrun

train_fn = mlrun.code_to_function('training',
                                  kind='job',
                                  handler='my_training_function')
# "constrain": run only on preemptible (spot) nodes
train_fn.with_preemption_mode(mode="constrain")
# my_data is a placeholder for the URL of your input dataset
train_fn.run(inputs={"dataset": my_data})

See with_preemption_mode().

Alternatively, you can control node placement using the with_priority_class() and with_node_selection() methods. This example specifies that the pod/function runs only on non-preemptible nodes:

import mlrun

train_fn = mlrun.code_to_function('training',
                                  kind='job',
                                  handler='my_training_function')
train_fn.with_priority_class(name="default-priority")
# Only schedule on nodes labeled as non-preemptible (on-demand)
train_fn.with_node_selection(node_selector={"app.iguazio.com/lifecycle": "non-preemptible"})
train_fn.run(inputs={"dataset": my_data})

See with_priority_class() and with_node_selection().

UI configuration

Note

Relevant when MLRun is executed in the Iguazio platform.

You can configure Spot node support when creating a job, rerunning an existing job, and creating an ML function. The Run on Spot nodes drop-down list is in the Resources section of jobs. Configure the Spot node support for individual Nuclio functions when creating a function in the Configuration tab, under Resources.

Pod priority for user jobs

Priority classes are a Kubernetes mechanism for controlling the order in which pods are scheduled and evicted, so that room can be made for other, higher-priority pods. Priorities also affect pod eviction when a node is under memory pressure (called Node-pressure Eviction).

Pod priority is relevant for all of the jobs created by MLRun. For Nuclio it applies to the pods of the Nuclio-created functions.

Pod priority is specified through priority classes, each of which maps to a priority value. Use these methods to list the valid priority class names and to get the default one:

  • fn.list_valid_priority_class_names()

  • fn.get_default_priority_class_name()
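The sketch below shows how a priority class translates into scheduling order: pending pods are sorted by the value of their priority class, highest first. The class names and values are hypothetical, so use the methods above to see the ones valid in your cluster. The real Kubernetes scheduler and kubelet apply more criteria. For example, node-pressure eviction also considers whether a pod's usage exceeds its requests.

```python
def scheduling_order(pods: dict[str, str], priority_values: dict[str, int]) -> list[str]:
    """Order pending pods by priority value, highest first (ties keep their input order)."""
    # sorted() is stable, also with reverse=True, so equal priorities keep their order
    return sorted(pods, key=lambda pod: priority_values[pods[pod]], reverse=True)


# Hypothetical priority classes and values
PRIORITY_VALUES = {"low-priority": 100, "default-priority": 500, "high-priority": 1000}
```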

SDK configuration

Configure pod priority by calling with_priority_class() on the function in your Jupyter notebook.
For example:

import mlrun

train_fn = mlrun.code_to_function('training',
                                  kind='job',
                                  handler='my_training_function')
# Use one of the names returned by train_fn.list_valid_priority_class_names()
train_fn.with_priority_class(name="<priority-class-name>")
train_fn.run(inputs={"dataset": my_data})

See with_priority_class().

UI configuration

Note

Relevant when MLRun is executed in the Iguazio platform.

Configure the default priority for a service, which is applied to the service itself or to all subsequently created user jobs, in the service's Common Parameters tab: User jobs defaults section, Priority class drop-down list.

To modify the priority of an ML function, press ML functions, open the function's kebab menu (⋮), and use the Edit | Resources | Pods Priority drop-down list.

Node selection

Node selection can be used to specify where to run workloads (e.g. specific node groups, instance types, etc.). This is a more advanced parameter mainly used in production deployments to isolate platform services from workloads.

SDK configuration

Configure node selection by passing key:value pairs, formatted as a Python dictionary, to with_node_selection() in your Jupyter notebook.
For example:

# Only run on non-spot instances
fn.with_node_selection(node_selector={"app.iguazio.com/lifecycle" : "non-preemptible"})

See with_node_selection().
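The node_selector follows the semantics of the Kubernetes nodeSelector field: a node is eligible only if it carries every listed label with exactly the listed value. The sketch below models that check. The node names and their labels are hypothetical; only the app.iguazio.com/lifecycle selector comes from the example above.

```python
def matches_node_selector(node_labels: dict[str, str], node_selector: dict[str, str]) -> bool:
    """A node matches only if it has every selector key with exactly the given value."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def eligible_nodes(nodes: dict[str, dict[str, str]], node_selector: dict[str, str]) -> list[str]:
    """Names of the nodes a pod with this node_selector could be scheduled on."""
    return [name for name, labels in nodes.items() if matches_node_selector(labels, node_selector)]


# Hypothetical cluster: node name -> node labels
NODES = {
    "node-a": {"app.iguazio.com/lifecycle": "non-preemptible"},
    "node-b": {"app.iguazio.com/lifecycle": "preemptible"},
    "node-c": {},
}
```

An empty selector places no constraint, so every node is eligible.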

UI configuration

Note

Relevant when MLRun is executed in the Iguazio platform.

Configure node selection for individual MLRun jobs when creating a batch run: go to your project, press Create New, and select Batch run. In the Resources tab, add Key:Value pair(s). Configure node selection for individual Nuclio functions when creating a function, in the Configuration tab under Resources, by adding Key:Value pairs.

Scaling and auto-scaling

Scaling behavior can be added to real-time and distributed runtimes, including nuclio, serving, spark, dask, and mpijob. In environments where node auto-scaling is available, auto-scaling is triggered when pods cannot be scheduled on any existing node due to lack of resources. When pods' CPU/memory requests are low, auto-scaling may not be triggered. This is because the pods can still be placed on existing nodes based on their low requests. In practice, those nodes may lack the resources the pods need as they approach their (much higher) limits, and the pods risk eviction in out-of-memory (OOM) situations.

Auto-scaling works best when jobs are created with limit=request. In this situation, once resources are insufficient, new jobs cannot be scheduled on any existing node, and new nodes are automatically added to accommodate them.
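The sketch below is a simplified model of this effect. The scheduler places pods by their requests, so low requests let many pods pack onto one node and no new node is needed. With request=limit, pods become unschedulable sooner, which triggers a scale-up. nodes_needed is a hypothetical first-fit approximation that considers memory only. The real Kubernetes scheduler and cluster autoscaler use more elaborate logic.

```python
def nodes_needed(pod_requests_gb: list[int], node_capacity_gb: int) -> int:
    """First-fit placement by requests; a pod that fits on no existing node adds a node."""
    nodes: list[int] = []  # memory already requested on each node
    for request in pod_requests_gb:
        for i, used in enumerate(nodes):
            if used + request <= node_capacity_gb:
                nodes[i] += request
                break
        else:
            # Unschedulable on existing nodes: a node autoscaler would add a node here
            nodes.append(request)
    return len(nodes)
```

With 16 GB nodes, four pods requesting 1 GB each (but limited to 8 GB) all fit on one node. Four pods with request=limit=8 GB need two nodes.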

Auto-scaling is a node-group configuration.

Mounting persistent storage

In some cases, you might need to mount a file system to your container to persist data. This can be done with native Kubernetes PVCs or the V3IO data layer for Iguazio clusters. See Attach storage to functions for more information on the storage options.

import mlrun

# Mount persistent storage - V3IO
fn.apply(mlrun.mount_v3io())

# Mount persistent storage - PVC
fn.apply(mlrun.platforms.mount_pvc(pvc_name="data-claim", volume_name="data", volume_mount_path="/data"))

Preventing stuck pods

The runtime spec has four "state_threshold" attributes that determine when to abort a run. Once a threshold is passed and the run is in the matching state, the API monitoring aborts the run, deletes its resources, sets the run state to aborted, and issues a "status_text" message.

The four states and their default thresholds are:

{
    'pending_scheduled': '1h',       # Scheduled to a node but still pending, and therefore consuming resources
    'pending_not_scheduled': '-1',   # Pending and not yet scheduled; can continue to wait for resources
    'image_pull_backoff': '1h',      # The container fails to pull the required image from a container registry
    'running': '24h',                # The job is running
}
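The sketch below is a rough model of the monitoring rule. It converts the defaults above to seconds and aborts a run once its time in a state exceeds that state's threshold, with -1 meaning no limit. should_abort is hypothetical. It assumes the elapsed time is measured from when the run entered the state, and MLRun's actual bookkeeping may differ.

```python
from typing import Optional

# Default thresholds from the list above, in seconds (-1 = never abort)
DEFAULT_THRESHOLDS_SECONDS = {
    "pending_scheduled": 3600,
    "pending_not_scheduled": -1,
    "image_pull_backoff": 3600,
    "running": 24 * 3600,
}


def should_abort(state: str, seconds_in_state: float,
                 thresholds: Optional[dict[str, int]] = None) -> bool:
    """Hypothetical check: True once the run has spent longer than its state's threshold."""
    limits = DEFAULT_THRESHOLDS_SECONDS if thresholds is None else thresholds
    limit = limits.get(state, -1)
    return limit != -1 and seconds_in_state > limit
```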

The thresholds are time strings made up of value and unit pairs (e.g. "30 minutes 5h 1day"). To set a threshold to infinity (no limit), use -1.
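To show how such strings add up, here is a hypothetical parser that sums every number-and-unit pair and ignores connecting words such as "and". The accepted unit spellings are an assumption made for this sketch; MLRun's own parser may accept a different set.

```python
import re

# Assumed unit spellings and their length in seconds
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
    "d": 86400, "day": 86400, "days": 86400,
    "w": 604800, "week": 604800, "weeks": 604800,
}

# A number, optional whitespace, then a unit word ("and" alone has no number, so it is skipped)
_PAIR = re.compile(r"(\d+)\s*([a-z]+)")


def threshold_to_seconds(text: str) -> int:
    """Convert a threshold such as '1 minute and 30s' to seconds; '-1' means no limit."""
    text = text.strip().lower()
    if text == "-1":
        return -1
    pairs = _PAIR.findall(text)
    if not pairs:
        raise ValueError(f"no value/unit pairs in {text!r}")
    total = 0
    for value, unit in pairs:
        if unit not in _UNIT_SECONDS:
            raise ValueError(f"unknown time unit {unit!r} in {text!r}")
        total += int(value) * _UNIT_SECONDS[unit]
    return total
```

For example, "30 minutes 5h 1day" adds up to 1,800 + 18,000 + 86,400 = 106,200 seconds.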

To change the state thresholds, use:

func.set_state_thresholds({"pending_not_scheduled": "1 min"}) 

For just the run, use:

func.run(state_thresholds={"running": "1 min", "image_pull_backoff": "1 minute and 30s"}) 

See set_state_thresholds()

Note

State thresholds are not supported for Nuclio/serving runtimes (since they have their own monitoring) or for the Dask runtime (which can be monitored by the client).