Experiments

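This section presents the experiments performed to analyze the different aspects of the parallel Poisson solver. The experiments build on one another. The first three examine individual components: the computational kernel, the domain decomposition and the communication method. The fourth is a final ‘integration’ test that verifies the correctness of the complete implementation. The fifth is a scaling analysis.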

01 - Choice of Kernel: NumPy vs Numba

Description

Here we compare Numba JIT-compiled kernels with pure NumPy implementations. This experiment tests only the computational kernel in isolation, without MPI, domain decomposition or parallel communication.

Purpose

Identify the impact of the choice of kernel implementation, and of parameters such as the thread count, by examining:

  • Kernel correctness - Verify that both implementations produce identical results (up to floating-point round-off)

  • Performance per iteration - Compare the execution time per iteration of NumPy and of Numba with different thread counts across various problem sizes

  • Speedup analysis - Compare different Numba thread configurations against a NumPy baseline

  • Compute time scaling - Measure the computation cost both for a fixed iteration count and for a fixed convergence tolerance
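The two kernel variants can be sketched as follows. This is a minimal sketch with several assumptions. It assumes a Jacobi iteration for -∇²u = f with the standard 7-point stencil on a uniform grid of spacing h, where the outermost layer holds fixed (Dirichlet or ghost) values. The names `jacobi_numpy` and `jacobi_numba` are hypothetical and are not taken from the solver's code. If Numba is not installed, a no-op fallback decorator lets the loop version run as plain, slow Python.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback so the sketch still runs without Numba (plain, slow Python)
    def njit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator
    prange = range


def jacobi_numpy(u, f, h):
    """One Jacobi sweep for -lap(u) = f using vectorized NumPy slicing (hypothetical name)."""
    u_new = u.copy()  # outermost layer (boundary/ghost values) is carried over unchanged
    u_new[1:-1, 1:-1, 1:-1] = (
        u[2:, 1:-1, 1:-1] + u[:-2, 1:-1, 1:-1]
        + u[1:-1, 2:, 1:-1] + u[1:-1, :-2, 1:-1]
        + u[1:-1, 1:-1, 2:] + u[1:-1, 1:-1, :-2]
        + h * h * f[1:-1, 1:-1, 1:-1]
    ) / 6.0
    return u_new


@njit(parallel=True)
def jacobi_numba(u, f, h):
    """The same sweep as explicit loops; with Numba, prange splits the outer loop across threads."""
    u_new = u.copy()
    nx, ny, nz = u.shape
    for i in prange(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                u_new[i, j, k] = (
                    u[i + 1, j, k] + u[i - 1, j, k]
                    + u[i, j + 1, k] + u[i, j - 1, k]
                    + u[i, j, k + 1] + u[i, j, k - 1]
                    + h * h * f[i, j, k]
                ) / 6.0
    return u_new


# Correctness check: both kernels should agree to within round-off.
rng = np.random.default_rng(0)
u0 = rng.random((16, 16, 16))
f0 = rng.random((16, 16, 16))
print(np.allclose(jacobi_numpy(u0, f0, 0.1), jacobi_numba(u0, f0, 0.1)))
```

With Numba installed, `numba.set_num_threads()` (or the `NUMBA_NUM_THREADS` environment variable) controls how many threads `prange` uses. That is the knob a thread-count sweep would vary.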

Decision Point: Choose the best-performing kernel (NumPy or Numba) and thread count for the subsequent experiments.
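The two timing modes from the purpose list differ only in their stopping criterion. Below is a minimal sketch of both drivers. It assumes the kernel is any callable `step(u) -> u_new` and that convergence is measured by the max-norm of the update. `run_fixed_iterations` and `run_to_tolerance` are hypothetical helper names.

```python
import time

import numpy as np


def run_fixed_iterations(step, u, n_iter):
    """Apply `step` exactly n_iter times; return the result and the mean wall time per iteration."""
    start = time.perf_counter()
    for _ in range(n_iter):
        u = step(u)
    elapsed = time.perf_counter() - start
    return u, elapsed / n_iter


def run_to_tolerance(step, u, tol, max_iter=10_000):
    """Iterate until max|u_new - u| < tol or max_iter is reached; return result, iterations, total time."""
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    start = time.perf_counter()
    for it in range(1, max_iter + 1):
        u_new = step(u)
        diff = np.max(np.abs(u_new - u))  # max-norm of the update
        u = u_new
        if diff < tol:
            break
    elapsed = time.perf_counter() - start
    return u, it, elapsed
```

Per-iteration times from `run_fixed_iterations` give the NumPy-vs-Numba speedup as a simple ratio. When timing a Numba kernel, one warm-up call before the timed run keeps JIT compilation out of the measurement.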

Visualization of Kernel Experiments

Kernel Experiments

02 - Domain Decomposition

Description

Compare the 1D sliced and 3D cubic decomposition strategies for partitioning the domain in parallel. This experiment tests the domain decomposition logic: partitioning across MPI ranks, local-to-global index mapping, and the structure of the ghost zones.

Sliced (1D): Splits the domain along the Z-axis into horizontal slices. Each interior rank exchanges 2 ghost planes.

Cubic (3D): Uses a 3D Cartesian process grid that splits all three dimensions. Each interior rank exchanges 6 ghost faces.
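The way the two strategies assign index ranges to ranks can be sketched in plain Python. This is an illustration under stated assumptions, not the solver's code. `split_1d`, `dims_3d`, `local_block` and `to_global` are hypothetical helpers. Leftover points go to the lowest-numbered ranks, and the ghost layer has width 1. `dims_3d` picks a balanced factorization similar in spirit to what `MPI.Compute_dims` (MPI_Dims_create) returns for a 3D process grid.

```python
from itertools import product


def split_1d(n, parts):
    """Split n grid points into `parts` contiguous (start, stop) ranges; leftovers go to the first ranks."""
    base, rem = divmod(n, parts)
    ranges, start = [], 0
    for r in range(parts):
        size = base + (1 if r < rem else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def dims_3d(p):
    """Most balanced factorization p = a * b * c, returned in non-increasing order."""
    best = None
    for a, b in product(range(1, p + 1), repeat=2):
        if p % (a * b):
            continue  # a * b must divide p
        dims = tuple(sorted((a, b, p // (a * b)), reverse=True))
        if best is None or max(dims) - min(dims) < max(best) - min(best):
            best = dims
    return best


def local_block(n, dims, coords):
    """Global (start, stop) ranges along each axis owned by the rank at Cartesian `coords`."""
    return tuple(split_1d(n, d)[c] for d, c in zip(dims, coords))


def to_global(i_local, start, ghost=1):
    """Map a local index (ghost layer at local indices 0..ghost-1) to the corresponding global index."""
    return start + i_local - ghost


# Sliced: rank r owns split_1d(n, p)[r] along Z and the full range along X and Y.
# Cubic: the rank at coords (i, j, k) owns local_block(n, dims_3d(p), (i, j, k)).
print(split_1d(10, 3), dims_3d(12))
```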

Purpose

Determine which decomposition strategy performs better for different problem sizes and rank counts. The goal is to understand how the type of decomposition affects two things: the amount of data that must be communicated between ranks, and the ‘connectivity’ of the ranks, i.e. how many neighbors each rank exchanges data with. Specifically:

  • Visual comparison - Illustrate how the domain is partitioned for each method

  • Surface-area-to-volume ratios - Investigate how much data needs to be communicated between ranks depending on the decomposition strategy
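The surface-area-to-volume argument can be worked out directly. For an N³ grid on P ranks, an interior sliced rank owns N³/P points and exchanges 2 planes of N² points, giving a ratio of 2N²/(N³/P) = 2P/N. With P^(1/3) ranks per axis, an interior cubic rank owns a block of edge N/P^(1/3) and exchanges 6 faces, giving 6P^(1/3)/N. The cubic layout therefore communicates less per owned point once 6P^(1/3) < 2P, i.e. for P > 3^(3/2) ≈ 5.2.

The sketch below evaluates both ratios. It assumes even division, a ghost width of 1, and a perfect-cube P for the cubic case. `halo_points_per_rank` and `surface_to_volume` are hypothetical helpers.

```python
def halo_points_per_rank(n, p, strategy):
    """Ghost points received per exchange by an interior rank (ghost width 1, even division assumed)."""
    if strategy == "sliced":
        return 2 * n * n                      # two full N x N planes
    if strategy == "cubic":
        q = round(p ** (1.0 / 3.0))           # ranks per axis
        if q ** 3 != p:
            raise ValueError("cubic estimate assumes p is a perfect cube")
        side = n / q                          # local block edge length
        return 6 * side * side                # six side x side faces
    raise ValueError(f"unknown strategy: {strategy!r}")


def surface_to_volume(n, p, strategy):
    """Communicated points divided by owned points for an interior rank."""
    owned = n ** 3 / p
    return halo_points_per_rank(n, p, strategy) / owned


for p in (1, 8, 27, 64):
    s = surface_to_volume(256, p, "sliced")
    c = surface_to_volume(256, p, "cubic")
    print(f"P={p:3d}: sliced {s:.4f}, cubic {c:.4f}")
```

The ratio ignores the number of messages. A cubic rank talks to up to 6 neighbors instead of 2, so for small messages, per-message latency can still favor the sliced layout.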

Domain Decomposition Visualization

03 - Communication Methods

Description

Compare custom MPI datatypes with plain NumPy array buffers for halo exchange operations. This experiment tests communication implementation details, i.e. how data is transferred between ranks during halo exchanges.

Custom MPI Datatypes: Zero-copy communication (no explicit buffer copies in user code) using MPI.Create_contiguous() and MPI.Create_subarray().

NumPy Arrays: Halo data is first copied into contiguous buffers with np.ascontiguousarray() and then sent.
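Whether a halo face needs packing at all depends on its memory layout. The sketch below checks which of the six faces of a 3D array are contiguous views. It also shows that np.ascontiguousarray() returns the original memory for a contiguous face but allocates a copy for a strided one. It assumes a C-ordered NumPy array with a ghost layer of width 1, and the helper name `faces` is hypothetical.

```python
import numpy as np


def faces(u):
    """Views of the first/last interior planes along each axis (ghost layer of width 1 assumed)."""
    return {
        "axis0-": u[1, :, :], "axis0+": u[-2, :, :],
        "axis1-": u[:, 1, :], "axis1+": u[:, -2, :],
        "axis2-": u[:, :, 1], "axis2+": u[:, :, -2],
    }


u = np.zeros((6, 6, 6))  # C-ordered by default: the first axis varies slowest
for name, face in faces(u).items():
    packed = np.ascontiguousarray(face)
    copied = not np.shares_memory(packed, u)  # True when a separate buffer was allocated
    print(f"{name}: contiguous={face.flags['C_CONTIGUOUS']}, copy made={copied}")
```

Only faces normal to the first axis are contiguous. If Z is stored as the first axis, the sliced decomposition's ghost planes can be sent without packing. The cubic decomposition also has strided faces, which must either be copied or described with a subarray datatype. In mpi4py such a datatype can be built with, for example, `MPI.DOUBLE.Create_subarray(sizes, subsizes, starts).Commit()`.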

Purpose

Determine whether custom MPI datatypes provide measurable performance improvements over NumPy arrays by evaluating:

  • Communication overhead - Measure whether custom datatypes reduce overhead compared to NumPy arrays

  • Scaling behavior - Analyze how each method scales with problem size and rank count

  • Scaling order analysis - Use log-log plots with reference lines to estimate the empirical scaling order (complexity) of each method
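Reading a scaling order from a log-log plot amounts to fitting a straight line to (log N, log t). The slope of that line is the empirical exponent p in t ≈ C·N^p. Below is a minimal sketch using numpy.polyfit. `scaling_order` and `reference_line` are hypothetical helpers, and the reference lines are assumed to be curves C·N^p anchored at one measured point.

```python
import numpy as np


def scaling_order(sizes, times):
    """Least-squares slope of log(times) vs log(sizes): the empirical exponent p in t ~ C * N**p."""
    slope, _intercept = np.polyfit(np.log(sizes), np.log(times), deg=1)
    return slope


def reference_line(sizes, order, anchor_size, anchor_time):
    """Reference curve C * N**order passing through (anchor_size, anchor_time), for a log-log plot."""
    sizes = np.asarray(sizes, dtype=float)
    return anchor_time * (sizes / anchor_size) ** order
```

For example, a halo face of an N³ grid holds N² points. At a fixed rank count, one would therefore expect halo-exchange time to approach a slope near 2 in N once message size dominates over latency.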

Communication Analysis: Contiguous vs Non-Contiguous

Communication Method Benchmark

04 - Solver Validation

Description

End-to-end validation of the complete Poisson solver across all combinations of implementation choices. This experiment tests the fully assembled solver, combining the decomposition strategies from experiment 02 with the communication methods from experiment 03.

Correctness is assessed by comparing the computed solution with the analytical solution in a grid refinement study and verifying that the error decreases at the theoretical order of spatial accuracy.

Note

We use only a single kernel configuration here, since kernel correctness was already established in experiment 01.

Purpose

Establish correctness of the solver implementation by:

  • Analytical comparison - Test against a known exact solution, u(x,y,z) = sin(πx)sin(πy)sin(πz)

  • Spatial convergence - Demonstrate the expected O(h²) convergence order as the grid is refined
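Verifying the O(h²) order means computing the observed order p = log(e₁/e₂) / log(h₁/h₂) from the errors e₁ and e₂ on two successively refined grids with spacings h₁ and h₂. The sketch below makes two assumptions. First, the domain is [0,1]³ with u = 0 on the boundary, which is consistent with the exact solution above vanishing there. Second, as a stand-in for the solver's error norms, it uses the truncation error of the 7-point stencil applied to the exact solution. `stencil_error` and `observed_order` are hypothetical helpers.

```python
import numpy as np


def stencil_error(n):
    """Max-norm error of the 7-point Laplacian applied to the exact solution on an n**3 interior grid."""
    h = 1.0 / (n + 1)
    x = np.linspace(0.0, 1.0, n + 2)  # n interior nodes plus the two boundary nodes
    X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
    u = np.sin(np.pi * X) * np.sin(np.pi * Y) * np.sin(np.pi * Z)
    c = u[1:-1, 1:-1, 1:-1]
    lap_h = (
        u[2:, 1:-1, 1:-1] + u[:-2, 1:-1, 1:-1]
        + u[1:-1, 2:, 1:-1] + u[1:-1, :-2, 1:-1]
        + u[1:-1, 1:-1, 2:] + u[1:-1, 1:-1, :-2]
        - 6.0 * c
    ) / h**2
    lap_exact = -3.0 * np.pi**2 * c  # analytic Laplacian of sin(pi x) sin(pi y) sin(pi z)
    return h, np.max(np.abs(lap_h - lap_exact))


def observed_order(h_coarse, e_coarse, h_fine, e_fine):
    """Observed convergence order p, assuming e ~ C * h**p on both grids."""
    return np.log(e_coarse / e_fine) / np.log(h_coarse / h_fine)


for n_coarse, n_fine in [(7, 15), (15, 31)]:
    h_c, e_c = stencil_error(n_coarse)
    h_f, e_f = stencil_error(n_fine)
    print(f"h = {h_c:.4f} -> {h_f:.4f}: observed order {observed_order(h_c, e_c, h_f, e_f):.3f}")
```

In the actual study, e would be the norm of the difference between the solver's converged solution and the exact u on each grid. The convergence tolerance must be well below the discretization error, or the observed order will flatten out.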

Solver Validation

Validation Analysis and Visualization

05 - Scaling Analysis

Description

Scaling analysis using the validated solver configuration from experiment 04. This experiment measures the performance limits of the complete solver across different problem sizes and processor counts.

Strong Scaling: Fixed problem size with increasing ranks → measures parallel speedup.

Weak Scaling: Constant work per rank, with the global problem size growing in proportion to the rank count → measures scalability.
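Both definitions reduce to simple ratios of wall-clock times. The sketch below assumes that `times` maps a rank count p to a measured wall time, and that the reference run uses `p_ref` ranks (1 by default). It also assumes that "proportional growth" means keeping N³/p constant for an N³ grid. All helper names are hypothetical.

```python
def strong_scaling(t_ref, times, p_ref=1):
    """Fixed problem: speedup S(p) = t_ref / t_p and efficiency E(p) = S(p) * p_ref / p."""
    return {p: (t_ref / t, (t_ref / t) * p_ref / p) for p, t in times.items()}


def weak_scaling_efficiency(t_ref, times):
    """Constant work per rank: E(p) = t_ref / t_p, with 1.0 as the ideal value."""
    return {p: t_ref / t for p, t in times.items()}


def weak_scaling_size(n_ref, p):
    """Global edge length N with N**3 / p == n_ref**3, i.e. N = n_ref * p**(1/3)."""
    return round(n_ref * p ** (1.0 / 3.0))
```

Ideal strong scaling gives E(p) = 1, i.e. S(p) = p. Ideal weak scaling keeps the wall time flat as p grows. Efficiencies below 1 in either case quantify the cost of communication and load imbalance.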

Purpose

Characterize the parallel performance limits of the validated solver by analyzing:

  • Strong scaling efficiency - Measure speedup curves for fixed problem sizes with increasing processor counts

  • Weak scaling efficiency - Evaluate performance with constant work per rank as both problem size and processors grow

  • Memory usage scaling - Analyze per-rank memory footprint and total memory requirements as problem size and rank count vary

  • Parallel I/O considerations - Demonstrate the impact of parallel HDF5 writes versus a serial gather to rank 0 on scaling behavior
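The memory and I/O bullets can be estimated before running anything. The sketch assumes float64 data, a ghost width of 1, and global dimensions that divide evenly. It also assumes a hypothetical count of three local arrays per rank (for example u, u_new and f). Neither helper name comes from the solver.

```python
def memory_per_rank_bytes(n, p, strategy, n_arrays=3, itemsize=8, ghost=1):
    """Estimated bytes per rank for n_arrays local arrays of an n**3 grid, including ghost layers."""
    if strategy == "sliced":
        shape = (n // p + 2 * ghost, n, n)        # ghost planes only along the split axis
    elif strategy == "cubic":
        q = round(p ** (1.0 / 3.0))               # ranks per axis
        if q ** 3 != p:
            raise ValueError("cubic estimate assumes p is a perfect cube")
        shape = (n // q + 2 * ghost,) * 3         # ghost faces on all six sides
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return n_arrays * itemsize * shape[0] * shape[1] * shape[2]


def gather_buffer_bytes(n, itemsize=8):
    """Memory needed on rank 0 to hold the full solution when gathering it for a serial write."""
    return n ** 3 * itemsize
```

In strong scaling, the per-rank footprint shrinks roughly as 1/p, while the gather buffer on rank 0 stays at the full global size. For example, a 512³ float64 array is exactly 1 GiB (512³ × 8 = 2³⁰ bytes). Parallel HDF5 writes avoid concentrating that buffer, and the associated transfer, on a single rank.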

Poisson Scaling Experiment
