utils.mlflow_io

MLflow I/O utilities for fetching runs and artifacts.

This module provides helpers for retrieving experiment data from MLflow/Databricks and downloading artifacts to the local data directory.

Functions

`download_artifacts`(experiment, output_dir[, ...])	Download artifacts from MLflow runs to local directory.
`download_artifacts_with_naming`(experiment, ...)	Download HDF5 artifacts with standardized naming.
`load_runs`(experiment[, converged_only, ...])	Load runs from an MLflow experiment.
`setup_mlflow_auth`()	Configure MLflow authentication.

utils.mlflow_io.download_artifacts(experiment: str, output_dir: Path, converged_only: bool = True, artifact_filter: List[str] | None = None) → List[Path]

Download artifacts from MLflow runs to local directory.

Parameters:

experiment (str): Experiment name (e.g., “HPC-FV-Solver”).
output_dir (Path): Directory to save artifacts. Files are named based on run parameters.
converged_only (bool, default True): Only download from converged runs.
artifact_filter (list of str, optional): Only download artifacts matching these patterns (e.g., [".h5", ".png"]). If None, all artifacts are downloaded.

Returns:

list of Path: Paths to downloaded files.

Examples

>>> paths = download_artifacts("HPC-FV-Solver", Path("data/FV-Solver"))
>>> print(paths)
[Path('data/FV-Solver/LDC_N32_Re100.h5'), ...]
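
The page does not say how artifact_filter patterns are matched. The following is a minimal sketch that assumes a pattern such as ".h5" is compared against the end of the artifact path; filter_artifact_paths is a hypothetical helper, not part of utils.mlflow_io, and the file names are illustrative.

```python
from typing import Iterable, List, Optional


def filter_artifact_paths(
    paths: Iterable[str], artifact_filter: Optional[List[str]] = None
) -> List[str]:
    """Keep only paths that end with one of the patterns (hypothetical helper)."""
    # None means "no filtering", mirroring the documented default.
    if artifact_filter is None:
        return list(paths)
    # Assumption: each pattern is matched against the end of the path.
    suffixes = tuple(artifact_filter)
    return [p for p in paths if p.endswith(suffixes)]


artifacts = ["LDC_N32_Re100.h5", "residuals.png", "solver.log"]
selected = filter_artifact_paths(artifacts, [".h5", ".png"])
everything = filter_artifact_paths(artifacts)
```

If the real implementation uses substring or glob matching instead, only the comparison inside the list comprehension would change.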

utils.mlflow_io.download_artifacts_with_naming(experiment: str, output_dir: Path, converged_only: bool = True) → List[Path]

Download HDF5 artifacts with standardized naming.

Downloaded files are named POISSON_N{n}_Iter{iter}.h5 (adapted for LSM).

Parameters:

experiment (str): Experiment name.
output_dir (Path): Directory to save artifacts.
converged_only (bool, default True): Only download from converged runs.

Returns:

list of Path: Paths to downloaded files.
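
The naming pattern can be illustrated with a short sketch. It assumes the placeholders {n} and {iter} are filled from run parameters stored under the keys "n" and "iter"; those key names, the helpers standardized_name and target_path, and the directory "data/LSM" are hypothetical. MLflow stores parameter values as strings, so no numeric formatting is applied.

```python
from pathlib import Path
from typing import Mapping


def standardized_name(params: Mapping[str, str]) -> str:
    """Build the file name from run parameters (hypothetical helper)."""
    # Assumption: the parameter keys are "n" and "iter".
    return f"POISSON_N{params['n']}_Iter{params['iter']}.h5"


def target_path(output_dir: Path, params: Mapping[str, str]) -> Path:
    """Join the output directory with the standardized file name."""
    return output_dir / standardized_name(params)


example = target_path(Path("data/LSM"), {"n": "64", "iter": "500"})
```

Under these assumptions, two runs with the same n and iter would map to the same file name, so the real function presumably has to overwrite or skip duplicates.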

utils.mlflow_io.load_runs(experiment: str, converged_only: bool = True, exclude_parent_runs: bool = True) → DataFrame

Load runs from an MLflow experiment.

Parameters:

experiment (str): Experiment name (e.g., “HPC-FV-Solver” or full path “/Shared/ANA-P3/HPC-FV-Solver”).
converged_only (bool, default True): Only return runs where metrics.converged = 1.
exclude_parent_runs (bool, default True): Exclude parent runs (nested run containers).

Returns:

pd.DataFrame: DataFrame with run info, parameters (params.*), and metrics (metrics.*).

Examples

>>> df = load_runs("HPC-FV-Solver")
>>> df[["run_id", "params.nx", "metrics.wall_time_seconds"]]
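
The two filters can be sketched on plain dictionaries that stand in for DataFrame rows. This is an illustration, not the module's implementation: filter_runs is a hypothetical helper, and it assumes a parent run is one that another run references through MLflow's mlflow.parentRunId tag (exposed as the column tags.mlflow.parentRunId).

```python
from typing import Dict, List, Optional

Run = Dict[str, Optional[object]]


def filter_runs(
    runs: List[Run], converged_only: bool = True, exclude_parent_runs: bool = True
) -> List[Run]:
    """Apply the documented filters to run records (hypothetical helper)."""
    selected = list(runs)
    if exclude_parent_runs:
        # Assumption: a run is a parent if some other run names it as its parent.
        parent_ids = {r.get("tags.mlflow.parentRunId") for r in runs} - {None}
        selected = [r for r in selected if r["run_id"] not in parent_ids]
    if converged_only:
        # Runs without a converged metric (value None) are dropped as well.
        selected = [r for r in selected if r.get("metrics.converged") == 1]
    return selected


runs = [
    {"run_id": "sweep", "tags.mlflow.parentRunId": None, "metrics.converged": None},
    {"run_id": "a", "tags.mlflow.parentRunId": "sweep", "metrics.converged": 1.0},
    {"run_id": "b", "tags.mlflow.parentRunId": "sweep", "metrics.converged": 0.0},
]
kept_ids = [r["run_id"] for r in filter_runs(runs)]
```

With the defaults, only run "a" survives: "sweep" is removed as a parent container and "b" did not converge.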

utils.mlflow_io.setup_mlflow_auth()

Configure MLflow authentication.

Uses the DATABRICKS_TOKEN environment variable if it is set (for CI); otherwise falls back to interactive login.
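
The fallback decision can be sketched without touching MLflow itself. The helper choose_auth_mode and its return values "token" and "interactive" are hypothetical; treating an empty variable as unset is also an assumption.

```python
import os
from typing import Mapping, Optional


def choose_auth_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "token" if DATABRICKS_TOKEN is set, else "interactive" (hypothetical helper)."""
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    # Assumption: an empty value counts as "not available".
    return "token" if env.get("DATABRICKS_TOKEN") else "interactive"


ci_mode = choose_auth_mode({"DATABRICKS_TOKEN": "dapi-example"})
local_mode = choose_auth_mode({})
```

Passing the environment as a mapping keeps the decision testable in CI without modifying os.environ.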