High performance computing is required to make feasible simulations of whole organ models of the heart with biophysically detailed cellular models in a clinical setting. Increasing model detail by simulating electrophysiology and mechanical models increases computation demands. We present scaling results of an electro mechanical cardiac model of two ventricles and compare them to our previously published results using an electrophysiological model only. The anatomical data-set was given by both ventricles of the Visible Female data-set in a 0.2 mm resolution. Fiber orientation was included. Data decomposition for the distribution onto the distributed memory system was carried out by orthogonal recursive bisection. Load weight ratios for non tissue vs. tissue elements used in the data decomposition were 1:1, 1:2, 1:5, 1:10, 1:25, 1:38.85, 1:50 and 1:100. The ten Tusscher et al. (2004) electrophysiological cell model was used and the Rice et al. (1999) model for the computation of the calcium transient dependent force. Scaling results for 512, 1024, 2048, 4096, 8192 and 16,384 processors were obtained for 1 ms simulation time. The simulations were carried out on an IBM Blue Gene/L supercomputer. The results show linear scaling from 512 to 16,384 processors with speedup factors between 1.82 and 2.14 between partitions. The most optimal load ratio was 1:25 for on all partitions. However, a shift towards load ratios with higher weight for the tissue elements can be recognized as can be expected when adding computational complexity to the model while keeping the same communication setup. This work demonstrates that it is potentially possible to run simulations of 0.5 s using the presented electro-mechanical cardiac model within 1.5 hours.
Multi-scale, multi-physical heart models have not yet been able to include a high degree of accuracy and resolution with respect to model detail and spatial resolution due to computational limitations of current systems. We propose a framework to compute large scale cardiac models. Decomposition of anatomical data in segments to be distributed on a parallel computer is carried out by optimal recursive bisection (ORB). The algorithm takes into account a computational load parameter which has to be adjusted according to the cell models used. The diffusion term is realized by the monodomain equations. The anatomical data-set was given by both ventricles of the Visible Female data-set in a 0.2 mm resolution. Heterogeneous anisotropy was included in the computation. Model weights as input for the decomposition and load balancing were set to (a) 1 for tissue and 0 for non-tissue elements; (b) 10 for tissue and 1 for non-tissue elements. Scaling results for 512, 1024, 2048, 4096 and 8192 computational nodes were obtained for 10 ms simulation time. The simulations were carried out on an IBM Blue Gene/L parallel computer. A 1 s simulation was then carried out on 2048 nodes for the optimal model load. Load balances did not differ significantly across computational nodes even if the number of data elements distributed to each node differed greatly. Since the ORB algorithm did not take into account computational load due to communication cycles, the speedup is close to optimal for the computation time but not optimal overall due to the communication overhead. However, the simulation times were reduced form 87 minutes on 512 to 11 minutes on 8192 nodes. This work demonstrates that it is possible to run simulations of the presented detailed cardiac model within hours for the simulation of a heart beat.
Increasing biophysical detail in multi physical, multiscale cardiac model will demand higher levels of parallelism in multi-core approaches to obtain fast simulation times. As an example of such a highly parallel multi-core approaches, we develop a completely distributed bidomain cardiac model implemented on the IBM Blue Gene/L architecture. A tissue block of size 50 times 50 times 100 cubic elements based on ten Tusscher et al. (2004) cell model is distributed on 512 computational nodes. The extracellular potential is calculated by the Gauss-Seidel (GS) iterative method that typically requires high levels of inter-processor communication. Specifically, the GS method requires knowledge of all cellular potentials at each of it iterative step. In the absence of shared memory, the values are communicated with substantial overhead. We attempted to reduce communication overhead by computing the extracellular potential only every 5th time step for the integration of the cell models. We also investigated the effects of reducing inter-processor communication to every 5th, 10th, 50th iteration or no communication within the GS iteration. While technically incorrect, these approximation had little impact on numerical convergence or accuracy for the simulations tested. The results suggest some heuristic approaches may further reduce the inter-processor communication to improve the execution time of large-scale simulations.