utils.mlflow_io

MLflow I/O utilities for fetching runs and artifacts.

This module provides helpers for retrieving experiment data from MLflow/Databricks and downloading artifacts to the local data directory.

Functions

download_artifacts(experiment, output_dir[, ...])

Download artifacts from MLflow runs to local directory.

download_artifacts_with_naming(experiment, ...)

Download HDF5 artifacts with standardized naming.

load_runs(experiment[, converged_only, ...])

Load runs from an MLflow experiment.

setup_mlflow_auth()

Configure MLflow authentication.

utils.mlflow_io.download_artifacts(experiment: str, output_dir: Path, converged_only: bool = True, artifact_filter: List[str] | None = None) -> List[Path]

Download artifacts from MLflow runs to local directory.

Parameters:
experiment : str

Experiment name (e.g., "HPC-FV-Solver").

output_dir : Path

Directory to save artifacts. Files are named based on run parameters.

converged_only : bool, default True

Only download from converged runs.

artifact_filter : list of str, optional

Only download artifacts matching these patterns (e.g., [".h5", ".png"]). If None, downloads all artifacts.

Returns:
list of Path

Paths to downloaded files.

Examples

>>> from pathlib import Path
>>> from utils.mlflow_io import download_artifacts
>>> paths = download_artifacts("HPC-FV-Solver", Path("data/FV-Solver"))
>>> print(paths)
[Path('data/FV-Solver/LDC_N32_Re100.h5'), ...]
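
The matching rule behind artifact_filter is not spelled out above. As a minimal sketch, assuming each pattern is treated as a filename suffix (the real implementation may use substring or glob matching instead), the filter could behave as follows. The helper name matches_filter and the sample artifact paths are hypothetical.

```python
from typing import List, Optional


def matches_filter(artifact_path: str, artifact_filter: Optional[List[str]]) -> bool:
    """Return True if the artifact should be downloaded.

    Hypothetical helper: assumes patterns are filename suffixes such as ".h5".
    """
    if artifact_filter is None:
        # No filter means every artifact is accepted.
        return True
    return any(artifact_path.endswith(pattern) for pattern in artifact_filter)


# Made-up artifact paths, used only to illustrate the selection.
artifacts = ["fields/solution.h5", "plots/residual.png", "logs/run.txt"]
selected = [a for a in artifacts if matches_filter(a, [".h5", ".png"])]
```

Under this assumption, selected keeps the .h5 and .png entries and drops the log file, while passing None accepts everything.
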
utils.mlflow_io.download_artifacts_with_naming(experiment: str, output_dir: Path, converged_only: bool = True) -> List[Path]

Download HDF5 artifacts with standardized naming.

Files are named POISSON_N{n}_Iter{iter}.h5 (adapted for LSM).

Parameters:
experiment : str

Experiment name.

output_dir : Path

Directory to save artifacts.

converged_only : bool, default True

Only download from converged runs.

Returns:
list of Path

Paths to downloaded files.
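
To make the naming template concrete, here is a minimal sketch of how a target path could be built from a run's parameters. It assumes the parameters are available as a string-to-string mapping (MLflow stores parameter values as strings) and that the keys are called "n" and "iter"; those key names, the helper standardized_name, and the data/LSM directory are hypothetical.

```python
from pathlib import Path
from typing import Mapping


def standardized_name(params: Mapping[str, str]) -> str:
    """Build a file name following the POISSON_N{n}_Iter{iter}.h5 template."""
    # Hypothetical keys "n" and "iter"; the real parameter names may differ.
    return f"POISSON_N{params['n']}_Iter{params['iter']}.h5"


# Hypothetical output directory and parameter values.
output_dir = Path("data") / "LSM"
target = output_dir / standardized_name({"n": "64", "iter": "500"})
```
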

utils.mlflow_io.load_runs(experiment: str, converged_only: bool = True, exclude_parent_runs: bool = True) -> DataFrame

Load runs from an MLflow experiment.

Parameters:
experiment : str

Experiment name (e.g., "HPC-FV-Solver" or full path "/Shared/ANA-P3/HPC-FV-Solver").

converged_only : bool, default True

Only return runs where metrics.converged = 1.

exclude_parent_runs : bool, default True

Exclude parent runs (nested run containers).

Returns:
pd.DataFrame

DataFrame with run info, parameters (params.*), and metrics (metrics.*).

Examples

>>> from utils.mlflow_io import load_runs
>>> df = load_runs("HPC-FV-Solver")
>>> df[["run_id", "params.nx", "metrics.wall_time_seconds"]]
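
The effect of converged_only and exclude_parent_runs can be illustrated on a hand-built DataFrame. This is a sketch rather than the module's code: it assumes parent runs are identified through the tags.mlflow.parentRunId column, which MLflow sets on nested child runs, and the sample rows are made up.

```python
import pandas as pd

# Hand-built stand-in for a runs table; the values are illustrative only.
runs = pd.DataFrame(
    {
        "run_id": ["p1", "c1", "c2", "c3"],
        "tags.mlflow.parentRunId": [None, "p1", "p1", None],
        "metrics.converged": [1.0, 1.0, 0.0, 1.0],
    }
)

# A run counts as a parent if some other run names it as its parent.
parent_ids = set(runs["tags.mlflow.parentRunId"].dropna())
is_parent = runs["run_id"].isin(parent_ids)

# converged_only=True corresponds to keeping rows with metrics.converged == 1.
converged = runs["metrics.converged"] == 1

filtered = runs[converged & ~is_parent].reset_index(drop=True)
```

Here p1 is dropped as a parent container and c2 is dropped as unconverged, which leaves c1 and c3.
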
utils.mlflow_io.setup_mlflow_auth()

Configure MLflow authentication.

Uses the DATABRICKS_TOKEN environment variable if it is set (for CI); otherwise falls back to interactive login.
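
The decision between the two authentication paths could be sketched as below. This only shows the branching on DATABRICKS_TOKEN, not the actual MLflow configuration calls; the function name choose_auth_mode is hypothetical, and treating an empty token as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def choose_auth_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "token" when DATABRICKS_TOKEN is set and non-empty, else "interactive"."""
    # Default to the process environment; a mapping can be injected for testing.
    env = os.environ if env is None else env
    # Assumption: an empty DATABRICKS_TOKEN is treated the same as a missing one.
    return "token" if env.get("DATABRICKS_TOKEN") else "interactive"
```

Accepting the environment as an argument keeps the check easy to exercise in CI without touching real credentials.
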