Day 2, Workshop Programming of Heterogeneous Systems in Physics

This post was automatically copied from Day 2, Workshop Programming of Heterogeneous Systems in Physics on eklausmeier.goip.de.

Day 2 of the conference featured the talks below. Prof. Dr. Bernd Brügmann gave a short introduction. He pointed out that Jena ranks number 10 in physics in Germany and has about 100,000 inhabitants and 20,000 students.

  1. Dr. Karl Rupp, Vienna, Lessons Learned in Developing the Linear Algebra Library ViennaCL. Notes: C++ operator overloading normally creates temporaries, and special trickery is necessary to circumvent this; ViennaCL is not callable from Fortran because of its C++ interface based on operator overloading; see also eigen.tuxfamily.org and Karl Rupp's slides; with CUDA 5 and 6, OpenCL and CUDA are more or less on par.
  2. Prof. Dr. Rainer Heintzmann, Jena, CudaMat - a toolbox for Cuda computations. Notes: information on CudaMat; a 300 GB fly-head data set; Delft Image Processing Library; he wrote his own CUDA memory allocator that takes its storage from the heap; does not work with Octave.
  3. Prof. Dipl.-Ing. Dr. Gundolf Haase, Graz, Interpolation with Radial Basis Functions on GPGPUs using CUDA (Gundolf Haase and Dirk Martin). Notes: AVL Graz (car industry, simulation software); OpenACC was disappointing; significant speedup with GPU/CUDA; rule of thumb: start with OpenMP, then MIC, then OpenACC.
  4. Lars Kühne, Jena, A Concurrent Algorithm for Computing the Flow Complex.
  5. Axel Hübl, Helmholtz-Zentrum Dresden-Rossendorf, Scaling Plasma Simulations to more than 18,000 GPUs.
  6. Carsten Eye Frigaard, www.lab4241.com, Running GADGET2 on GPUs: Optimizing Tree-search Algorithms by Detailed Profiling of GPU Code. Notes: gpuprofgui; C-source-level, PTX-level, and SASS-level counters; BARRA; UNISIM.
  7. M. Sc. Moritz Kreutzer, Erlangen, Building blocks for sparse linear algebra on heterogeneous hardware. Notes: excellent talk; 45% comes from accelerators in the Top50 supercomputers; vulnerability to hardware faults; fusing kernels; checkpoints; ESSEX programme/project; JDS, CRS, SSE, AVX, Sliced ELLPACK; computation done in permuted fashion.
  8. Dipl.-Phys. Marcus Noack, Oslo, Parallel and simultaneous computation of eikonal and transport equations by taking full advantage of GPU computer architecture. Notes: oil and seismic applications; just a single CUDA kernel; used OpenMP.
  9. Dr. Manfred Liebmann, Graz, Optimal Control of the Schrödinger Equation on Many-Core Architectures. Notes: Crank-Nicolson much worse; Intel compiler not better than gcc/g++; 10+ PDEs per iteration; good initial approximation necessary; GPU two times faster; unitarity not a problem.
  10. Dr. Johannes Langguth, Oslo, Scalable Finite Volume Computations in Heterogeneous Systems.
  11. Dipl.-Inf. Ralf Seidler, Jena, Implementing the Radon Transform using Advanced Techniques on GPGPUs. Notes: GTX 750 consumer card; no problem with single precision.
  12. Prof. Dr. Gerhard Zumbusch, Jena, A parallel functional language for high performance finite difference stencil codes. Notes: very interesting, excellent presentation; large gap between GFlop/s and memory speed, so you have to fuse operations; Runge-Kutta discretization; you are measuring memory speed, not computational speed.
  13. Mohammed Sourouri, Oslo, An Optimized Intra-Node Communication Scheme Using Multiple CUDA Streams and OpenMP Threads.
  14. Carsten Eckert, Helmholtz-Zentrum Dresden-Rossendorf, An adaptive, load-balanced MPI/GPU-Code for calculating the gain in High Power Laser media. Notes: Arch Linux; 64 GPUs; all communication via MPI; 1 point = 1 kernel; Tesla K20m; Computational Radiation Physics; Monte Carlo integration.
  15. Dr. Erik Rodner, Jena, Computational Challenges for Visual Recognition with Deep Learning Architectures.
  16. Dipl.-Phys. Richard Pausch, Dresden, Scalable, interactive 3D in-situ visualization of large-scale Simulations.
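Talk 1 noted that C++ operator overloading normally creates temporaries. The usual workaround in C++ linear algebra libraries is expression templates; the sketch below assumes that is the kind of trickery meant, and the types `Vec`, `Sum` and `is_expr` are hypothetical, not ViennaCL's or Eigen's API. Writing `a + b + c` only builds a small expression object; the single loop runs when that object is assigned to a `Vec`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical minimal vector type (not a real library class).
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning any expression evaluates it in one loop, without temporaries.
    template <typename E>
    Vec& operator=(const E& expr) {
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// Expression node for "lhs + rhs": holds references, computes elements on demand.
template <typename L, typename R>
struct Sum {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Restrict operator+ to Vec and Sum<...> so it does not hijack other types.
template <typename T> struct is_expr : std::false_type {};
template <> struct is_expr<Vec> : std::true_type {};
template <typename L, typename R> struct is_expr<Sum<L, R>> : std::true_type {};

template <typename L, typename R,
          typename = std::enable_if_t<is_expr<L>::value && is_expr<R>::value>>
Sum<L, R> operator+(const L& lhs, const R& rhs) {
    return Sum<L, R>{lhs, rhs};
}
```

Because `Sum` stores references, an expression should be assigned to a `Vec` in the same statement (`r = a + b + c;`). Keeping `a + b + c` in an `auto` variable would leave a dangling reference to the intermediate `Sum`, a pitfall Eigen's documentation also warns about.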
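For talk 3, here is a minimal CPU sketch of radial basis function interpolation in one dimension. It assumes a Gaussian kernel φ(r) = exp(-(εr)²) with shape parameter ε. The kernel, the solver (dense Gaussian elimination with partial pivoting) and the names `phi`, `rbf_weights` and `rbf_eval` are illustrative choices, not taken from the talk. The weights solve A w = f with A_ij = φ(|x_i - x_j|), and the interpolant is s(y) = Σ_j w_j φ(|y - x_j|).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Gaussian radial basis function; eps is an assumed shape parameter.
double phi(double r, double eps) { return std::exp(-(eps * r) * (eps * r)); }

// Solve A w = f, A(i,j) = phi(|x_i - x_j|), by Gaussian elimination with partial pivoting.
std::vector<double> rbf_weights(const std::vector<double>& x,
                                const std::vector<double>& f, double eps) {
    const std::size_t n = x.size();
    assert(f.size() == n);
    std::vector<std::vector<double>> m(n, std::vector<double>(n + 1));  // [A | f]
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) m[i][j] = phi(std::fabs(x[i] - x[j]), eps);
        m[i][n] = f[i];
    }
    for (std::size_t k = 0; k < n; ++k) {
        std::size_t p = k;  // row with the largest pivot in column k
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(m[i][k]) > std::fabs(m[p][k])) p = i;
        std::swap(m[k], m[p]);
        for (std::size_t i = k + 1; i < n; ++i) {
            const double factor = m[i][k] / m[k][k];
            for (std::size_t j = k; j <= n; ++j) m[i][j] -= factor * m[k][j];
        }
    }
    std::vector<double> w(n);
    for (std::size_t i = n; i-- > 0;) {  // back substitution
        double s = m[i][n];
        for (std::size_t j = i + 1; j < n; ++j) s -= m[i][j] * w[j];
        w[i] = s / m[i][i];
    }
    return w;
}

// Evaluate the interpolant s(y) = sum_j w_j * phi(|y - x_j|).
double rbf_eval(double y, const std::vector<double>& x,
                const std::vector<double>& w, double eps) {
    double s = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) s += w[j] * phi(std::fabs(y - x[j]), eps);
    return s;
}
```

The dense solve costs O(n³), while evaluating s at many points is independent per point. That evaluation maps naturally to one GPU thread per evaluation point, although the talk's actual decomposition is not recorded in these notes.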
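Talk 7 listed CRS and Sliced ELLPACK among the sparse storage formats. The sketch below converts CRS to Sliced ELLPACK with chunk height C and computes y = A x. Rows are grouped in chunks of C, each chunk is padded to its longest row and stored column-major, so the C rows of a chunk advance in lockstep (SIMD lanes or GPU threads). The row sorting behind the "permuted fashion" remark is omitted, and the names `Crs`, `SlicedEll`, `crs_to_sell` and `spmv` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed Row Storage.
struct Crs {
    std::size_t nrows;
    std::vector<std::size_t> row_ptr;  // size nrows + 1
    std::vector<std::size_t> col;
    std::vector<double> val;
};

// Sliced ELLPACK: chunks of C rows, each padded to its longest row, column-major inside.
struct SlicedEll {
    std::size_t nrows, C;
    std::vector<std::size_t> chunk_ptr;  // offset of each chunk in col/val
    std::vector<std::size_t> chunk_len;  // padded row length of each chunk
    std::vector<std::size_t> col;
    std::vector<double> val;             // padding entries hold 0.0
};

SlicedEll crs_to_sell(const Crs& a, std::size_t C) {
    assert(C > 0);
    SlicedEll s{a.nrows, C, {}, {}, {}, {}};
    const std::size_t nchunks = (a.nrows + C - 1) / C;
    std::size_t offset = 0;
    for (std::size_t ch = 0; ch < nchunks; ++ch) {
        std::size_t len = 0;  // longest row in this chunk
        for (std::size_t r = ch * C; r < std::min(a.nrows, (ch + 1) * C); ++r)
            len = std::max(len, a.row_ptr[r + 1] - a.row_ptr[r]);
        s.chunk_ptr.push_back(offset);
        s.chunk_len.push_back(len);
        s.col.resize(offset + len * C, 0);
        s.val.resize(offset + len * C, 0.0);
        for (std::size_t lane = 0; lane < C; ++lane) {
            const std::size_t r = ch * C + lane;
            if (r >= a.nrows) break;  // the last chunk may be partially filled
            for (std::size_t k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k) {
                const std::size_t j = k - a.row_ptr[r];  // position within the row
                s.col[offset + j * C + lane] = a.col[k];
                s.val[offset + j * C + lane] = a.val[k];
            }
        }
        offset += len * C;
    }
    return s;
}

// y = A x; the innermost "lane" loop is the one that vectorizes or maps to GPU threads.
std::vector<double> spmv(const SlicedEll& s, const std::vector<double>& x) {
    std::vector<double> y(s.nrows, 0.0);
    for (std::size_t ch = 0; ch < s.chunk_ptr.size(); ++ch)
        for (std::size_t j = 0; j < s.chunk_len[ch]; ++j)
            for (std::size_t lane = 0; lane < s.C; ++lane) {
                const std::size_t r = ch * s.C + lane;
                if (r < s.nrows) {
                    const std::size_t idx = s.chunk_ptr[ch] + j * s.C + lane;
                    y[r] += s.val[idx] * x[s.col[idx]];
                }
            }
    return y;
}
```

Padding entries point at column 0 with value 0.0, so they add 0 · x[0]; that is harmless unless x contains infinities or NaNs. Sorting rows by length within a window before chunking would reduce the padding, at the cost of working with permuted row indices.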
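Talk 9's notes record Crank-Nicolson as much worse, without saying what it was compared against. For reference, here is a minimal Crank-Nicolson step for the 1D Schrödinger equation i ψ_t = Hψ with H = -½ ∂²/∂x² + V. It assumes ħ = m = 1 and ψ = 0 outside the grid. This is the textbook scheme, not the method from the talk, and `cn_step` is a hypothetical name. Each step solves the tridiagonal system (I + i Δt/2 H) ψ^(n+1) = (I - i Δt/2 H) ψ^n with the Thomas algorithm.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// One Crank-Nicolson step, in place, on a uniform grid with spacing dx.
void cn_step(std::vector<cplx>& psi, const std::vector<double>& V, double dx, double dt) {
    const std::size_t n = psi.size();
    assert(n >= 2 && V.size() == n);
    const cplx I(0.0, 1.0);
    const double off = -0.5 / (dx * dx);  // off-diagonal entry of H
    // Right-hand side r = (I - i dt/2 H) psi.
    std::vector<cplx> r(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx Hpsi = (1.0 / (dx * dx) + V[k]) * psi[k];
        if (k > 0) Hpsi += off * psi[k - 1];
        if (k + 1 < n) Hpsi += off * psi[k + 1];
        r[k] = psi[k] - I * (0.5 * dt) * Hpsi;
    }
    // Thomas algorithm for (I + i dt/2 H): sub- and super-diagonal are both e.
    const cplx e = I * (0.5 * dt * off);
    std::vector<cplx> c(n);  // modified super-diagonal
    const cplx d0 = 1.0 + I * (0.5 * dt * (1.0 / (dx * dx) + V[0]));
    c[0] = e / d0;
    psi[0] = r[0] / d0;
    for (std::size_t k = 1; k < n; ++k) {  // forward elimination
        const cplx d = 1.0 + I * (0.5 * dt * (1.0 / (dx * dx) + V[k])) - e * c[k - 1];
        c[k] = e / d;
        psi[k] = (r[k] - e * psi[k - 1]) / d;
    }
    for (std::size_t k = n - 1; k-- > 0;) psi[k] -= c[k] * psi[k + 1];  // back substitution
}
```

Because H is a real symmetric matrix, the step matrix (I + i Δt/2 H)^(-1) (I - i Δt/2 H) is unitary, so the discrete norm Σ|ψ_k|² is preserved up to rounding. That is consistent with the remark that unitarity was not a problem.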
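For talk 11, here is a simple pixel-driven discrete Radon transform. For every angle θ, each pixel center is projected onto the detector coordinate s = (x - c) cos θ + (y - c) sin θ, and the pixel value is added to the nearest detector bin. The nearest-bin rule, the angle set θ_a = πa/N, the unit detector spacing and the name `radon` are assumptions; the "advanced techniques" from the talk are not reproduced.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pixel-driven discrete Radon transform of an n x n image stored row-major (img[y * n + x]).
// Result: sino[a * nbins + b] = sum of pixels whose center projects to bin b at angle a.
std::vector<double> radon(const std::vector<double>& img, std::size_t n,
                          std::size_t nangles, std::size_t nbins) {
    assert(img.size() == n * n && nangles > 0 && nbins > 0);
    const double pi = std::acos(-1.0);
    const double c = 0.5 * (n - 1);         // image center (pixel units)
    const double half = 0.5 * (nbins - 1);  // detector center; bin spacing = 1 pixel
    std::vector<double> sino(nangles * nbins, 0.0);
    for (std::size_t a = 0; a < nangles; ++a) {
        const double theta = pi * a / nangles;  // angles evenly spaced in [0, pi)
        const double ct = std::cos(theta), st = std::sin(theta);
        for (std::size_t y = 0; y < n; ++y)
            for (std::size_t x = 0; x < n; ++x) {
                // Projection of the pixel center onto the direction (cos theta, sin theta).
                const double s = (x - c) * ct + (y - c) * st;
                const long b = std::lround(s + half);  // nearest detector bin
                if (b >= 0 && b < static_cast<long>(nbins))  // drop pixels off the detector
                    sino[a * nbins + static_cast<std::size_t>(b)] += img[y * n + x];
            }
    }
    return sino;
}
```

On a GPU, this pixel-driven form needs atomic additions because many pixels land in the same bin; a ray-driven variant with one thread per (angle, bin) pair avoids that. Since the notes report no problem with single precision, `double` could be swapped for `float` here.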
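Talk 12 stressed that stencil codes are limited by memory speed rather than GFlop/s, so operations have to be fused. The sketch below assumes an explicit Euler step for u_t = u_xx in 1D, with c = Δt/h² and fixed boundary values. The unfused version writes the stencil result to a temporary array and reads it back in a second loop; the fused version does both in one sweep. The function names are hypothetical, and this is not the functional language presented in the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: stencil into temporary k, then a second sweep applies the update.
void step_unfused(const std::vector<double>& u, std::vector<double>& k,
                  std::vector<double>& u_new, double c) {
    const std::size_t n = u.size();
    assert(n >= 2 && k.size() == n && u_new.size() == n);
    for (std::size_t i = 1; i + 1 < n; ++i) k[i] = u[i - 1] - 2.0 * u[i] + u[i + 1];
    for (std::size_t i = 1; i + 1 < n; ++i) u_new[i] = u[i] + c * k[i];
    u_new[0] = u[0];          // fixed boundary values
    u_new[n - 1] = u[n - 1];
}

// Fused: stencil and update in a single sweep; no temporary array is touched.
void step_fused(const std::vector<double>& u, std::vector<double>& u_new, double c) {
    const std::size_t n = u.size();
    assert(n >= 2 && u_new.size() == n);
    for (std::size_t i = 1; i + 1 < n; ++i)
        u_new[i] = u[i] + c * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
    u_new[0] = u[0];
    u_new[n - 1] = u[n - 1];
}
```

Counting whole-array passes and ignoring caches, the unfused step reads `u`, writes `k`, reads `u` and `k` again, and writes `u_new` (about five passes), while the fused step reads `u` and writes `u_new` (about two). For a kernel this light on arithmetic, that ratio rather than the flop count largely determines the runtime, and the same reasoning applies to each stage of a Runge-Kutta scheme.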
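Talk 14 mentioned Monte Carlo integration. Here is a generic sketch of plain Monte Carlo integration over an interval with a standard-error estimate, using `std::mt19937_64`. The integrand, the seed and the names `mc_integrate` and `McResult` are illustrative, and the physics of the laser-gain computation is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>

struct McResult {
    double mean;       // estimate of the integral
    double std_error;  // estimated standard error of that estimate
};

// Plain Monte Carlo estimate of the integral of f over [a, b] from n uniform samples.
McResult mc_integrate(const std::function<double(double)>& f, double a, double b,
                      std::size_t n, unsigned seed) {
    assert(n > 1 && b > a);
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> u(a, b);
    double sum = 0.0, sum_sq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double v = (b - a) * f(u(rng));  // each term is an unbiased estimate
        sum += v;
        sum_sq += v * v;
    }
    const double mean = sum / n;
    const double var = std::max(0.0, (sum_sq - n * mean * mean) / (n - 1));  // sample variance
    return {mean, std::sqrt(var / n)};
}
```

Each sample is independent, which is what makes the method easy to spread over many GPU threads or MPI ranks; in that setting every worker needs its own independent random stream rather than one shared generator.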