VLXI-Cookiecutter-Research

MLflow Quick Reference

Authors
Kevin T. Chu <kevin@velexi.com>


Preliminaries

MLflow Tracking is the only component of MLflow that is needed for general research projects. The other components of MLflow may be useful for projects involving data science and machine learning models.


Table of Contents

  1. Using MLflow Tracking within a Jupyter Notebook

  2. Viewing MLflow Tracking Results


1. Using MLflow Tracking within a Jupyter Notebook

MLflow Tracking supports recording experiment configuration parameters and results. The following steps set up MLflow experiment tracking within a Jupyter notebook.

  1. Near the beginning of the Jupyter notebook, include a cell to set up MLflow Tracking.

    # --- Set up MLflow Tracking
    
    import mlflow
    
    # Set experiment (the experiment name below is only an example)
    experiment_name = "my-experiment"
    mlflow.set_experiment(experiment_name)
    
    # Ensure that the previous run (possibly failed) has been terminated by MLflow.
    if mlflow.active_run():
        mlflow.end_run()
    
    # Initialize dictionary for experiment results
    mlflow_results = {}
    

    Note. For situations where it is useful to group experiments by date or time, the utils Python module provides the get_experiment_name() function to facilitate consistent generation of date- and time-stamped experiment names.

  2. Before running the experiment, include a cell to record all of the experiment parameters.

    # --- Record experiment parameters
    
    mlflow.log_param("some-parameter", some_parameter)
    mlflow.log_param("another-parameter", another_parameter)
    

    Note. MLflow Tracking automatically records a start time for each run, so different runs of an experiment that use the same set of configuration parameters can still be distinguished and compared.

  3. Throughout the Jupyter notebook, add results to mlflow_results and/or record individual results (saved as MLflow “metrics”).

    # Add a result to `mlflow_results`. This result will be saved at the end of
    # the Jupyter notebook.
    mlflow_results["some-result"] = some_result
    
    # Record an individual result as an MLflow "metric" (metric values must be numeric)
    mlflow.log_metric("another-result", another_result)
    
  4. After the experiment is completed, include a cell to record the results.

    # --- Record experiment results
    
    mlflow.log_dict(mlflow_results, "results.json")
    
  5. At the end of the Jupyter notebook, include a cell to end the MLflow run.

    # --- End current MLflow run
    
    mlflow.end_run()
    

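The note in step 1 mentions get_experiment_name() in the utils module, whose signature and naming scheme are not given here. The sketch below illustrates the general idea with a hypothetical stand-in, make_experiment_name(), that appends a date (or date and hour) stamp to a base name; its output would then be passed to mlflow.set_experiment(). The "-" separator and the default "%Y-%m-%d" format are assumptions.

```python
from datetime import datetime
from typing import Optional


def make_experiment_name(base_name: str,
                         timestamp: Optional[datetime] = None,
                         fmt: str = "%Y-%m-%d") -> str:
    """Append a formatted date/time stamp to base_name.

    Hypothetical helper: the naming scheme used by utils.get_experiment_name()
    may differ.
    """
    # Default to the current local time when no timestamp is supplied.
    if timestamp is None:
        timestamp = datetime.now()
    return f"{base_name}-{timestamp.strftime(fmt)}"


# Group runs by day (default format) or by hour (custom format).
daily = make_experiment_name("baseline", datetime(2024, 3, 5, 14, 30))
hourly = make_experiment_name("baseline", datetime(2024, 3, 5, 14, 30), "%Y-%m-%d-%H")
```

A coarser format such as the daily one collects all runs from the same day under one experiment, which keeps them side by side in the MLflow Tracking UI.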
2. Viewing MLflow Tracking Results

MLflow Tracking supports reviewing and comparing experiments. It is particularly useful when comparing results across multiple runs of the same experiment with different parameter settings. For basic research projects, the following short set of steps should be sufficient for viewing MLflow Tracking results.