Parallel Adaptive Mesh Refinement

Summary

We aim to parallelize and optimize an Adaptive Mesh Refinement (AMR) code using OpenMPI, targeting GHC and PSC machines. The project involves parallelizing an existing serial code base and exploring techniques to reduce work imbalance and communication and synchronization overhead, thereby improving its performance and scalability.

Background

Adaptive Mesh Refinement (AMR) is a numerical technique used in computational simulations to dynamically adjust the resolution of a computational grid based on the features of the solution being computed. In many scientific and engineering applications, such as fluid dynamics, astrophysics, and structural analysis, certain regions of interest may require higher resolution to capture complex phenomena accurately, while other regions can suffice with coarser resolution to reduce computational cost.

AMR algorithms typically work by initially defining a coarse grid covering the entire domain of interest. As the simulation progresses, the algorithm identifies regions where higher resolution is needed and refines the grid locally in those regions, creating finer grids known as refinement patches or refinement levels. This adaptive refinement process allows computational resources to be concentrated where they are most needed, leading to more accurate results with less computational expense.
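The flag-and-refine cycle described above can be sketched in a few lines. The fragment below is an illustrative sketch only, not taken from any particular AMR code: it assumes a uniform 1-D coarse grid, a simple gradient-magnitude threshold as the refinement criterion (production codes often use error estimators such as Richardson extrapolation), and factor-of-two refinement.

```python
def flag_cells_for_refinement(values, dx, threshold):
    """Flag cells whose local solution gradient exceeds a threshold.

    values: cell-centered solution samples on a uniform 1-D coarse grid
    dx: cell width
    threshold: gradient magnitude above which a cell is refined
    (the criterion itself is an assumption made for this sketch)
    """
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        # Central-difference estimate of the local gradient.
        grad = abs(values[i + 1] - values[i - 1]) / (2 * dx)
        if grad > threshold:
            flags[i] = True
    return flags


def refine(values, flags):
    """Split each flagged cell into two finer cells (factor-2 refinement),
    copying the parent value into both children as the initial guess."""
    fine = []
    for v, f in zip(values, flags):
        fine.extend([v, v] if f else [v])
    return fine


# Example: a step-like feature triggers refinement only near the jump,
# so computational effort is concentrated where the solution varies.
coarse = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
flags = flag_cells_for_refinement(coarse, dx=1.0, threshold=0.25)
fine = refine(coarse, flags)
```

The point of the sketch is the selectivity: cells far from the jump stay coarse, so the refined grid grows only near the feature of interest.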

The following figure from Wikipedia shows the grid structure of an example AMR calculation. Each of the boxes is a grid; the more boxes a grid is nested within, the higher its level of refinement.

AMR Example

Application Description

The application we are focusing on is an implementation of an AMR algorithm for simulating physical phenomena, such as fluid flow or structural deformation, in a computational domain. The algorithm consists of several key components:

Opportunities for Parallelism

Several aspects of the AMR algorithm can benefit from parallelism.

Challenges

Resources

Access to PSC machines and the university's GHC cluster with multi-core CPUs.

We will start from an existing open-source AMR implementation (which includes a serial version), extending and optimizing it for our needs. Link

Berger, M. J., and Oliger, J. Adaptive Mesh Refinement for Hyperbolic Partial Differential Equations. Journal of Computational Physics, 1984.

Diachin, Lori A., Richard D. Hornung, Paul E. Plassmann, and Andrew M. Wissink. "Parallel Adaptive Mesh Refinement." In Parallel Processing for Scientific Computing, 2005. Link

We will base our AMR strategy primarily on the principles outlined in the references above. Additionally, we will survey and incorporate ideas from more recent studies on parallel AMR.

Goals and Deliverables

Hope to Achieve:

Demo:

During the poster session, we plan to demonstrate the following:

Platform Choice

Since we are parallelizing code designed for multi-core machines, early-stage tests will be conducted on GHC machines, while final measurements will include results from PSC machines to examine the implementation's scalability at higher core counts. We have tentatively chosen OpenMPI as our implementation framework.
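One of the central problems an MPI-based AMR code must solve is assigning refinement patches to ranks so that no rank sits idle. A common heuristic for this is longest-processing-time (LPT) greedy assignment: hand each patch, heaviest first, to the currently least-loaded rank. The sketch below is illustrative only; the patch costs are hypothetical, and our actual rebalancing algorithm may differ.

```python
import heapq


def assign_patches(patch_costs, num_ranks):
    """Greedy LPT assignment of refinement patches to MPI ranks.

    patch_costs: estimated work per patch (e.g., cell count x steps)
    num_ranks: number of MPI ranks
    Returns (assignment, loads): patch-id -> rank, and per-rank load.
    """
    # Min-heap of (current load, rank id); the least-loaded rank is on top.
    loads = [(0.0, r) for r in range(num_ranks)]
    heapq.heapify(loads)
    assignment = {}
    # Place heaviest patches first to keep the final loads balanced.
    for pid, cost in sorted(enumerate(patch_costs), key=lambda p: -p[1]):
        load, rank = heapq.heappop(loads)
        assignment[pid] = rank
        heapq.heappush(loads, (load + cost, rank))
    return assignment, list(loads)


# Hypothetical per-patch work estimates, distributed over two ranks.
costs = [8.0, 7.0, 6.0, 5.0, 4.0]
assignment, loads = assign_patches(costs, num_ranks=2)
```

With these costs the two ranks end up with loads of 17 and 13 out of 30 total, versus 26 and 4 under a naive contiguous split; the gap between the most- and least-loaded rank is what shows up as idle time at synchronization points.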

Schedule

Week Dates Tasks (Completed/Planned)
0 March 25 – March 31 Reviewed several existing AMR codebases and the literature. (both)
1 April 1 – April 7 Further reviewed a structured AMR repository that simulates heat propagation but does not modify the mesh during execution. (Xingyuan Wang) Examined the wave simulation problem and finalized the data structures to use. (Hangrui Cao)
2 April 8 – April 14 Implemented grid initialization and a (simplified) workload rebalancing algorithm. (Xingyuan Wang) Implemented and tested the flux calculation procedure; implemented mesh refinement criteria. (Hangrui Cao)
3 April 15 – April 17 Finish the mid-project report. (both) Fix bugs that occur when the program runs near grid boundaries, and complete the initial solution. (Xingyuan Wang) Implement a visualization script for displaying the wave grid. (Hangrui Cao)
3 April 18 – April 21 Run initial performance tests on GHC machines with varying processor counts; identify any performance issues. (Xingyuan Wang) Test another workload rebalancing algorithm with finer granularity. (Hangrui Cao)
4 April 22 – April 24 Support multi-level refinement (if possible; currently fixed at two levels). (Xingyuan Wang) Examine locality and MPI communication schema issues. (Hangrui Cao)
4 April 25 – April 28 Refine the current memory access pattern with an improved data layout. (Hangrui Cao) Obtain final performance results from GHC and PSC machines. (Xingyuan Wang)
5 April 29 – May 5 Run final comparison tests, make the project poster, and write the final report. (both)

This schedule allows us to allocate time efficiently, considering the project's complexity and our academic workload. We plan to reassess our progress weekly to ensure we stay on track to meet our goals.