Complex Multiphysics on Future Exascale Systems
Abstract
Objective: The overall objective of the current research effort is to reduce by an order of magnitude the turnaround times for the multidisciplinary multiphysics package FEMAP when used for blast, penetration, weapon effects and WMD applications. This order-of-magnitude reduction will only be accomplished if all the elements along the simulation pipeline (grid generation, flow and structural mechanics solvers, coupling methodology and post-processing) can be scaled up to pre-exascale numbers of cores and processors.
Accomplishments:
- FRGEN: The grid generator FRGEN has been ported and tested in MPI mode. It was discovered that the multi-material/multi-domain option was incorrect. This had to be recoded (not a trivial task) and is now running in MPI mode. At present, we are running on progressively higher numbers of cores in order to achieve production status for 1,000 cores.
- FEFLO: An intensive, in-depth analysis of the CFD code FEFLO has been carried out. Key findings include:
  - A very high percentage of CPU time was spent in I/O (this was mitigated by migrating to binary output formats);
  - Several locfct.f (core solver) subroutines were not being vectorized as expected; this led to rewriting (removing redundancies, compacting storage, transposing arrays, etc.) and further testing; to date the gains have been modest (10-20%), but we are continuing along this path;
  - A considerable effort was devoted to improving the embedded CSD boundary conditions and load transfer; this led to CPU time reductions of more than 25% for this part of the code;
  - The particle transport was also tested and analyzed in detail; this led to CPU time reductions of 50% for this part of the code;
  - An optimal domain renumbering technique for MPI runs has been implemented. It takes into account the communication layout of the machine FEFLO/FEMAP are running on, as well as the information transfer required between domains.
- FEMAP: The coupling libraries in FEMAP now allow for multiple codes running different numbers of MPI processes. This part has now reached production status. Initial results indicate that by using this option, the overall run time of coupled runs can be reduced by a factor of 2 at present.
- Algorithms/Theoretical Work: We have continued developing computational flow algorithms with minimal memory access techniques and improved node-to-node (MPI) communication.
- Runs/Applications: At present, we are testing coupled, realistic, production-like runs that scale to 10K cores. Results to date have been very encouraging.
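The abstract does not describe how the domain renumbering for MPI runs works. As a rough illustration of the general idea only, the sketch below places domains that exchange the most data on the same compute node, so that less traffic crosses the slower node-to-node links. The greedy strategy, the function names `renumber_domains` and `off_node_traffic`, and the example traffic matrix are all assumptions made for this sketch; they are not taken from FEFLO or FEMAP.

```python
def renumber_domains(comm, ranks_per_node):
    """Greedy sketch (hypothetical, not the FEFLO algorithm).

    comm[i][j] is the data volume exchanged between domains i and j.
    Returns a list `order` where order[r] is the domain assigned to rank r.
    Ranks r with the same r // ranks_per_node are assumed to share a node.
    """
    unplaced = set(range(len(comm)))
    order = []
    while unplaced:
        # Seed a new node with the unplaced domain that has the most
        # traffic to the other unplaced domains (lowest index wins ties).
        seed = max(sorted(unplaced),
                   key=lambda d: sum(comm[d][e] for e in unplaced))
        node = [seed]
        unplaced.remove(seed)
        # Fill the node with the domains that talk most to its members.
        while len(node) < ranks_per_node and unplaced:
            best = max(sorted(unplaced),
                       key=lambda d: sum(comm[d][m] for m in node))
            node.append(best)
            unplaced.remove(best)
        order.extend(node)
    return order


def off_node_traffic(comm, order, ranks_per_node):
    """Total volume exchanged between domains placed on different nodes."""
    node_of = {d: r // ranks_per_node for r, d in enumerate(order)}
    n = len(comm)
    return sum(comm[i][j]
               for i in range(n) for j in range(i + 1, n)
               if node_of[i] != node_of[j])


# Made-up example: domains 0-2 and 1-3 exchange a lot, 0-1 and 2-3 a little.
comm = [
    [0, 1, 10, 0],
    [1, 0, 0, 10],
    [10, 0, 0, 1],
    [0, 10, 1, 0],
]
order = renumber_domains(comm, ranks_per_node=2)
print(order)                                   # [0, 2, 1, 3]
print(off_node_traffic(comm, [0, 1, 2, 3], 2)) # 20 with the original numbering
print(off_node_traffic(comm, order, 2))        # 2 after renumbering
```

In this toy case the renumbering cuts the off-node volume from 20 to 2 units. A production scheme would presumably also account for multi-level network topology and load balance, which this sketch ignores.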
Document Details
- Document Type
- DoD Grant Award
- Publication Date
- Jun 29, 2016
- Source ID
- HDTRA11510068
Entities
People
- Rainald Löhner
Organizations
- Defense Threat Reduction Agency
- George Mason University