TBG

Section author: Axel Huebl, Klaus Steiniger

Module author: René Widera

Our tool, the template batch generator (tbg), abstracts program runtime options from the technical details of supercomputers. On a desktop PC, one can just execute a command interactively and instantaneously. On a supercomputer, in contrast, resources need to be shared between different users efficiently via job scheduling. Scheduling on today's supercomputers is usually done via batch systems that define various queues of resources.

An unfortunate aspect of batch systems from a user's perspective is that their usage varies a lot. Naturally, different systems also provide different resources in their queues, which need to be described.

PIConGPU runtime options are described in configuration files (.cfg). We abstract the description of queues, resource acquisition and job submission via template files (.tpl). For example, a .cfg file defines how many devices shall be used for computation, while a .tpl file calculates how many physical nodes will be requested. The .tpl files also take care of how to spawn a process when scheduled, e.g. with mpiexec, and which flags for networking details need to be passed. After combining the machine-independent (portable) .cfg file from user input with the machine-dependent .tpl file, tbg can submit the requested job to the batch system.
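As a rough sketch, a .tpl file for a Slurm-based machine could look like the lines below. Only the !TBG_... placeholders come from the .cfg file and from tbg itself (they are documented further down); the batch directives and the way the binary is launched are purely illustrative and differ between the shipped templates.

    #!/usr/bin/env bash
    # illustrative sketch only -- the shipped templates handle many more details
    #SBATCH --time=!TBG_wallTime
    #SBATCH --ntasks=!TBG_tasks
    #SBATCH --job-name=!TBG_jobNameShort
    #SBATCH --chdir=!TBG_dstPath

    cd !TBG_dstPath
    # spawn one MPI rank per requested device; the path to the picongpu binary
    # depends on how the project directory is copied to the destination
    mpiexec -n !TBG_tasks ./input/bin/picongpu !TBG_programParams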

Last but not least, one usually wants to store the input of a simulation with its output. tbg conveniently automates this task before submission. The .tpl and the .cfg files that were used to start the simulation can be found in <tbg destination dir>/tbg/ and can be used together with the .param files from <tbg destination dir>/input/.../param/ to recreate the simulation setup.

In summary, PIConGPU runtime options in .cfg files are portable to any machine. When accessing a machine for the first time, one needs to write template .tpl files, abstractly describing how to run PIConGPU on the specific queue(s) of the batch system. We ship such template files already for a set of supercomputers, interactive execution and many common batch systems. See $PICSRC/etc/picongpu/ and our list of systems with .profile files for details.
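To get an overview of the shipped files, one can simply list them, for example:

    ls $PICSRC/etc/picongpu/              # generic templates and one directory per system
    ls $PICSRC/etc/picongpu/taurus-tud/   # templates for one specific machine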

Usage

TBG (template batch generator)
create a new folder for a batch job and copy in all important files

usage: tbg -c [cfgFile] [-s [submitsystem]] [-t [templateFile]]
          [-o "VARNAME1=10 VARNAME2=5"] [-f] [-h]
          [projectPath] destinationPath

recommended usage when sourcing a PIConGPU config file before:
    tbg -s -t -c cfgFile destinationPath

-c | --cfg      [file]         - Configuration file to set up batch file.
                                 Default: [cfgFile] via export TBG_CFGFILE
-s | --submit   [command]      - Submit command (qsub, "qsub -h", sbatch, ...)
                                 Default: [submitsystem] via export TBG_SUBMIT
-t | --tpl      [file]         - Template to create a batch file from.
                                 tbg will use stdin, if no file is specified.
                                 Default: [templateFile] via export TBG_TPLFILE
                                 Warning: If -t is omitted, stdin will be used as
                                 input for the template.
-o                             - Overwrite any template variable:
                                 spaces within the right side of assign are not allowed
                                 e.g. -o "VARNAME1=10 VARNAME2=5"
                                 Overwriting is done after cfg file was executed
-f | --force                   - Override if 'destinationPath' exists.
-h | --help                    - Shows help (this output).

[projectPath]                  - Project directory containing source code and
                                 binaries
                                 Default: current directory
destinationPath                - Directory for simulation output. 
 
 
TBG exports the following variables, which can be used in cfg and tpl files at
any time:
 TBG_jobName                   - name of the job
 TBG_jobNameShort              - short name of the job, without blanks
 TBG_cfgPath                   - absolute path to cfg file
 TBG_cfgFile                   - full absolute path and name of cfg file
 TBG_projectPath               - absolute project path (see optional parameter
                                 projectPath)
 TBG_dstPath                   - absolute path to destination directory
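As an example, if a sourced system profile already exports TBG_SUBMIT and TBG_TPLFILE, a submission reduces to the recommended short form above (the file names below are placeholders taken from the batch system examples further down):

    # typically done by the system .profile; shown here only for illustration
    export TBG_SUBMIT="sbatch"
    export TBG_TPLFILE="etc/picongpu/taurus-tud/k80.tpl"

    # -s and -t without arguments fall back to the exported defaults
    tbg -s -t -c etc/picongpu/0008gpus.cfg $SCRATCH/runs/test-001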

.cfg File Macros

Feel free to copy & paste sections of the files below into your .cfg, e.g. to configure complex plugins:

# Copyright 2014-2023 Felix Schmitt, Axel Huebl, Richard Pausch, Heiko Burau,
#                     Franz Poeschel, Sergei Bastrakov, Pawel Ordyna
#
# This file is part of PIConGPU.
#
# PIConGPU is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# PIConGPU is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with PIConGPU.
# If not, see <http://www.gnu.org/licenses/>.

################################################################################
## This file describes sections and variables for PIConGPU's
## TBG batch file generator.
## These variables basically wrap PIConGPU command line flags.
## To see all flags available for your PIConGPU binary, run
## picongpu --help. The available flags depend on your configuration flags.
## Note that this is not meant to be a complete and functioning .cfg file.
##
## Flags that target a specific species e.g. electrons (--e_png) or ions
## (--i_png) must only be used if the respective species is activated (configure flags).
##
## If not stated otherwise, variables/flags must not be used more than once!
################################################################################

################################################################################
## Section: Required Variables
## Variables in this section are necessary for PIConGPU to work properly and should not
## be removed. However, you are free to adjust them to your needs, e.g. setting
## the number of GPUs in each dimension.
################################################################################

# Batch system walltime
TBG_wallTime="1:00:00"

# Number of devices in each dimension (x,y,z) to use for the simulation
TBG_devices_x=1
TBG_devices_y=2
TBG_devices_z=1

# Size of the simulation grid in cells as "X Y Z"
# note: the number of cells needs to be an exact multiple of a supercell
#       and has to be at least 3 supercells per device,
#       the size of a supercell (in cells) is defined in `memory.param`
TBG_gridSize="128 256 128"
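# Example check for the values above, assuming the default 8x8x4 supercell
# (an assumption of this comment; verify against your memory.param):
#   x: 128 cells = 16 supercells on 1 device
#   y: 256 cells = 32 supercells, i.e. 16 supercells per device on 2 devices
#   z: 128 cells = 32 supercells on 1 device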

# Number of simulation steps/iterations as "N"
TBG_steps="100"

# disable grid size auto adjustment
TBG_disableGridAutoAdjustment="--autoAdjustGrid off"

################################################################################
## Section: Optional Variables
## You are free to add and remove variables here as you like.
## The only exception is TBG_plugins which is used to forward your variables
## to the TBG program. This variable can be modified but should not be removed!
##
## Please add all variables you define in this section to TBG_plugins.
################################################################################

# Variables which are created by TBG (should be self-descriptive)
TBG_jobName
TBG_jobNameShort
TBG_cfgPath
TBG_cfgFile
TBG_projectPath
TBG_dstPath


# version information on startup
TBG_version="--versionOnce"

# Set progress output period in stdout:
# -p or --progress can be used to set an integer percent value (of the total simulation duration).
# It is also possible to set the exact time steps with --progressPeriod and the plugin period syntax.
# --progress and --progressPeriod are combined. To set the period only in percent, just leave the --progressPeriod option out.
# To set the period only with the plugin syntax, use -p 0 to disable the default -p 5 setting.
# Example for reporting every 2 % as well as every 100 steps, and every 20 steps from the 100th to the 1000th step.
TBG_progress="-p 2 --progressPeriod 100,100:1000:20"


# Regex to describe the static distribution of the cells for each device
# default: equal distribution over all devices
# example for -d 2 4 1 -g 128 192 12
TBG_gridDist="--gridDist '64{2}' '64,32{2},64'"
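# reading the example above: along x, 128 cells on 2 devices -> '64{2}' = 64,64;
# along y, 192 cells on 4 devices -> '64,32{2},64' = 64,32,32,64;
# z is left out and therefore distributed equally (12 cells on 1 device)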


# Specifies whether the grid is periodic (1) or not (0) in each dimension (X,Y,Z).
# Default: no periodic dimensions
TBG_periodic="--periodic 1 0 1"


# Specifies boundaries for given species with a particle pusher, 'e' in the example.
# The axis order matches --periodic.
# Default: what is set by --periodic, all offsets 0.
# Supported particle boundary kinds: periodic, absorbing, reflecting, thermal.
# Periodic boundaries require 0 offset, thermal require a positive offset, other boundary kinds can have non-negative offsets.
# Boundary temperature is set in keV, only affects thermal boundaries.
TBG_particleBoundaries="--e_boundary periodic absorbing thermal --e_boundaryOffset 0 10 5 --e_boundaryTemperature 0 0 20.0"


# Specify runtime density file for given species, 'e' in the example.
# Only has effect when FromOpenPMDImpl<Param> density is used for the species and its Param::filename is empty.
# In case the species has multiple such FromOpenPMDImpl invocations, same runtime value is used for all of them.
TBG_runtimeDensityFile="--e_runtimeDensityFile /bigdata/hplsim/production/someDirectory/density.bp"


# Set absorber type of absorbing boundaries.
# Supported values: exponential, pml (default).
# When changing absorber type, one should normally adjust NUM_CELLS in fieldAbsorber.param
TBG_absorber="--fieldAbsorber pml"

# Enables moving window (sliding) in your simulation
TBG_movingWindow="-m"


# Defines when to start sliding the window.
# The window starts sliding at the time required to pass the distance of
# windowMovePoint * (global window size in y) when moving with the speed of light
# Note: beware, there is one "hidden" row of GPUs at the y-front, so e.g. when the window is enabled
# and this variable is set to 0.75, only 75% of the simulation area is used for the actual simulation
TBG_windowMovePoint="--windowMovePoint 0.9"


# stop the moving window after given simulation step
TBG_stopWindow="--stopWindow 1337"


# Set current smoothing.
# Supported values: none (default), binomial
TBG_currentInterpolation="--currentInterpolation binomial"


# Duplicate E and B field storage inside field background to improve performance at cost of additional memory
TBG_fieldBackground="--fieldBackground.duplicateFields"


# Allow two MPI ranks to use one compute device.
TBG_ranksPerDevice="--numRanksPerDevice 2"

################################################################################
## Placeholder for multi data plugins:
##
## placeholders must be substituted with the real data name
##
## <species> = species name e.g. e (electrons), i (ions)
## <field>  = field names e.g. FieldE, FieldB, FieldJ
################################################################################

# The following flags are available for the radiation plugin.
# For a full description, see the plugins section in the online documentation.
#--<species>_radiation.period     Radiation is calculated every .period steps. Currently 0 or 1
#--<species>_radiation.dump     Period, after which the calculated radiation data should be dumped to the file system
#--<species>_radiation.lastRadiation     If flag is set, the spectra summed between the last and the current dump-time-step are stored
#--<species>_radiation.folderLastRad     Folder in which the summed spectra are stored
#--<species>_radiation.totalRadiation     If flag is set, store spectra summed from simulation start till current time step
#--<species>_radiation.folderTotalRad     Folder in which total radiation spectra are stored
#--<species>_radiation.start     Time step to start calculating the radiation
#--<species>_radiation.end     Time step to stop calculating the radiation
#--<species>_radiation.radPerGPU     If flag is set, each GPU stores its own spectra instead of summing over the entire simulation area
#--<species>_radiation.folderRadPerGPU     Folder where the GPU-specific spectra are stored
#--<species>_radiation.numJobs     Number of independent jobs used for the radiation calculation.
#--<species>_radiation.openPMDSuffix   Suffix for openPMD filename extension and iteration expansion pattern.
#--<species>_radiation.openPMDCheckpointExtension    Filename extension for openPMD checkpoints.
#--<species>_radiation.openPMDConfig    JSON/TOML configuration for initializing openPMD output.
#--<species>_radiation.openPMDCheckpointConfig    JSON/TOML configuration for initializing openPMD checkpointing.
#--<species>_radiation.distributedAmplitude    Additionally output distributed amplitudes per MPI rank.
TBG_radiation="--<species>_radiation.period 1 --<species>_radiation.dump 2 --<species>_radiation.totalRadiation \
               --<species>_radiation.lastRadiation --<species>_radiation.start 2800 --<species>_radiation.end 3000"

# The following flags are available for the transition radiation plugin.
# For a full description, see the plugins section in the online documentation.
#--<species>_transRad.period   Gives the number of time steps between which the radiation should be calculated.
TBG_transRad="--<species>_transRad.period 1000"

# Create 2D images in PNG format every .period steps.
# The slice plane is defined using .axis [yx,yz] and .slicePoint (offset from origin
# as a float within [0.0,1.0]).
# The output folder can be set with .folder.
# Can be used more than once to print different images, e.g. for YZ and YX planes.
TBG_<species>_pngYZ="--<species>_png.period 10 --<species>_png.axis yz --<species>_png.slicePoint 0.5 --<species>_png.folder pngElectronsYZ"
TBG_<species>_pngYX="--<species>_png.period 10 --<species>_png.axis yx --<species>_png.slicePoint 0.5 --<species>_png.folder pngElectronsYX"

# Create a particle-energy histogram [in keV] per species for every .period steps
TBG_<species>_histogram="--<species>_energyHistogram.period 500 --<species>_energyHistogram.binCount 1024     \
                         --<species>_energyHistogram.minEnergy 0 --<species>_energyHistogram.maxEnergy 500000 \
                         --<species>_energyHistogram.filter all"
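# i.e. 1024 bins covering 0 keV to 500000 keV (= 500 MeV), using the 'all' particle filter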


# Calculate a 2D phase space
# - momentum range in m_<species> c
TBG_<species>_PSxpx="--<species>_phaseSpace.period 10 --<species>_phaseSpace.filter all --<species>_phaseSpace.space x --<species>_phaseSpace.momentum px --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"
TBG_<species>_PSxpz="--<species>_phaseSpace.period 10 --<species>_phaseSpace.filter all --<species>_phaseSpace.space x --<species>_phaseSpace.momentum pz --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"
TBG_<species>_PSypx="--<species>_phaseSpace.period 10 --<species>_phaseSpace.filter all --<species>_phaseSpace.space y --<species>_phaseSpace.momentum px --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"
TBG_<species>_PSypy="--<species>_phaseSpace.period 10 --<species>_phaseSpace.filter all --<species>_phaseSpace.space y --<species>_phaseSpace.momentum py --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"
TBG_<species>_PSypz="--<species>_phaseSpace.period 10 --<species>_phaseSpace.filter all --<species>_phaseSpace.space y --<species>_phaseSpace.momentum pz --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"

# Sum up total energy every .period steps for
# - species   (--<species>_energy)
# - fields    (--fields_energy)
TBG_sumEnergy="--fields_energy.period 10 --<species>_energy.period 10 --<species>_energy.filter all"


# Count the number of macro particles per species for every .period steps
TBG_macroCount="--<species>_macroParticlesCount.period 100"


# Count macro particles of a species per super cell
TBG_countPerSuper="--<species>_macroParticlesPerSuperCell.period 100"


# Dump simulation data (fields and particles) via the openPMD API.
# Data is dumped every .period steps to the fileset .file.
# To select only parts of data .range can be used.
TBG_openPMD="--openPMD.period 100   \
             --openPMD.file simOutput \
             --openPMD.ext bp \
             --openPMD.json '{ \"adios2\": { \"engine\": { \"type\": \"file\", \"parameters\": { \"BufferGrowthFactor\": \"1.2\", \"InitialBufferSize\": \"2GB\" } } } }' \
             --openPMD.range 0:100,:,:"
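# in the .range example above, the output is restricted to the slab 0:100 along x,
# while ':' keeps the full extent along y and z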
# Further control over the backends used in the openPMD plugins is available
# through the mechanisms exposed by the openPMD API:
# * environment variables
# * JSON-formatted configuration string
# Further information on both is retrieved from the official documentation
# https://openpmd-api.readthedocs.io

# Create a checkpoint that is restartable every --checkpoint.period steps
#   http://git.io/PToFYg
TBG_checkpoint="--checkpoint.period 1000"
# Time periodic checkpoint creation [period in minutes]
TBG_checkpointTime="--checkpoint.timePeriod 2"
# Select the backend for the checkpoint; available backends: openPMD
#    --checkpoint.backend openPMD
# Available backend options are exactly as in --openPMD.* and can be set
# via:
#   --checkpoint.<IO-backend>.* <value>
# e.g.:
#   --checkpoint.openPMD.dataPreparationStrategy doubleBuffer
# One additional parameter is available for configuring the openPMD-api via JSON
# for restarting procedures:
TBG_openPMD_restart="--checkpoint.openPMD.jsonRestart '{\"adios2\": {\"dataset\": {\"operators\": [{\"type\": \"blosc\",\"parameters\": {\"nthreads\": 8}}]}}}'"

# Restart the simulation from checkpoint created using TBG_checkpoint
TBG_restart="--checkpoint.restart"
# Try to restart if a checkpoint is available else start the simulation from scratch.
TBG_tryrestart="--checkpoint.tryRestart"
# Select the backend for the restart (must fit the created checkpoint)
#    --checkpoint.restart.backend openPMD
# By default, the last checkpoint is restarted if not specified via
#   --checkpoint.restart.step 1000
# To restart in a new run directory point to the old run where to start from
#   --checkpoint.restart.directory /path/to/simOutput/checkpoints

# Presentation mode: loop a simulation via restarts
#   it either starts from 0 again or from the checkpoint specified with
#   --checkpoint.restart.step as soon as the simulation has reached the last time step;
#   in the example below, the simulation is run 5000 times before it shuts down
# Note: currently does not work with the `Radiation` plugin
TBG_restartLoop="--checkpoint.restart.loop 5000"

# Live in situ visualization using ISAAC
#   Initial period in which an image shall be rendered
#     --isaac.period PERIOD
#   Name of the simulation run as seen for the connected clients
#     --isaac.name NAME
#   URL of the server
#     --isaac.url URL
#   Number from 1 to 100 describing the quality of the transmitted jpeg image.
#   Smaller values are sent faster, but at lower quality
#     --isaac.quality QUALITY
#   Resolution of the rendered image. Default is 1024x768
#     --isaac.width WIDTH
#     --isaac.height HEIGHT
#   Pausing directly after the start of the simulation
#     --isaac.directPause off=0(default)|on=1
#   By default the ISAAC plugin tries to reconnect if the server is not available
#   at start or if the server crashes. This can be deactivated with this option
#     --isaac.reconnect false
#   Enable and write benchmark results into the given file.
#     --isaac.timingsFilename benchResults.txt
TBG_isaac="--isaac.period 1 --isaac.name !TBG_jobName --isaac.url <server_url>"
TBG_isaac_quality="--isaac.quality 90"
TBG_isaac_resolution="--isaac.width 1024 --isaac.height 768"
TBG_isaac_pause="--isaac.directPause 1"
TBG_isaac_reconnect="--isaac.reconnect false"

# Print the maximum charge deviation between particles and div E to textfile 'chargeConservation.dat':
TBG_chargeConservation="--chargeConservation.period 100"

# Particle calorimeter: (virtually) propagates and collects particles to infinite distance
TBG_<species>_calorimeter="--<species>_calorimeter.period 100 --<species>_calorimeter.openingYaw 90 --<species>_calorimeter.openingPitch 30 \
                        --<species>_calorimeter.numBinsEnergy 32 --<species>_calorimeter.minEnergy 10 --<species>_calorimeter.maxEnergy 1000 \
                        --<species>_calorimeter.logScale 1 --<species>_calorimeter.file filePrefix --<species>_calorimeter.filter all"

################################################################################
## Section: Program Parameters
## This section contains TBG internal variables, often composed from required
## variables. These should not be modified except when you know what you are doing!
################################################################################

# Number of compute devices in each dimension as "X Y Z"
TBG_deviceDist="!TBG_devices_x !TBG_devices_y !TBG_devices_z"


# Combines all declared variables. These are passed to PIConGPU as command line flags.
# The program output (stdout) is stored in a file called output.stdout.
TBG_programParams="-d !TBG_deviceDist \
                   -g !TBG_gridSize   \
                   -s !TBG_steps      \
                   !TBG_plugins"

# Total number of devices
TBG_tasks="$(( TBG_devices_x * TBG_devices_y * TBG_devices_z ))"
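# e.g. 2 for the 1 x 2 x 1 device layout defined above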

Batch System Examples

Section author: Axel Huebl, Richard Pausch, Klaus Steiniger

Linux workstation

PIConGPU can run on your laptop or workstation, even if there is no dedicated GPU available. In this case it will run on the CPU.

In order to run PIConGPU on your machine, use bash as the submit command, e.g.:

    tbg -s bash -t etc/picongpu/bash/mpirun.tpl -c etc/picongpu/1.cfg $SCRATCH/picRuns/001

Slurm

Slurm is a modern batch system, installed, for example, on the Taurus cluster at TU Dresden, Hemera at HZDR, and Cori at NERSC.

Job Submission

PIConGPU job submission on the Taurus cluster at TU Dresden:

  • tbg -s sbatch -c etc/picongpu/0008gpus.cfg -t etc/picongpu/taurus-tud/k80.tpl $SCRATCH/runs/test-001

Job Control

  • interactive job:

    • salloc --time=1:00:00 --nodes=1 --ntasks-per-node=2 --cpus-per-task=8 --partition gpu-interactive

    • e.g. srun "hostname"

    • GPU allocation on Taurus requires an additional flag, e.g. for two GPUs --gres=gpu:2

  • details for my jobs:

    • scontrol -d show job 12345 all details for job with <job id> 12345

    • squeue -u $(whoami) -l all jobs under my user name

  • details for queues:

    • squeue -p queueName -l list full queue

    • squeue -p queueName --start (show start times for pending jobs)

    • squeue -p queueName -l -t R (only show running jobs in queue)

    • sinfo -p queueName (show online/offline nodes in queue)

    • sview (alternative on taurus: module load llview and llview)

    • scontrol show partition queueName

  • communicate with job:

    • scancel <job id> abort job

    • scancel -s <signal number> <job id> send signal or signal name to job

    • scontrol update timelimit=4:00:00 jobid=12345 change the walltime of a job

    • scontrol update jobid=12345 dependency=afterany:54321 only start job 12345 after job with id 54321 has finished

    • scontrol hold <job id> prevent the job from starting

    • scontrol release <job id> release the job to be eligible for run (after it was set on hold)

LSF

LSF (for Load Sharing Facility) is an IBM batch system (bsub/BSUB). It is used, e.g. on Summit at ORNL.

Job Submission

PIConGPU job submission on the Summit cluster at Oak Ridge National Lab:

  • tbg -s bsub -c etc/picongpu/0008gpus.cfg -t etc/picongpu/summit-ornl/gpu.tpl $PROJWORK/$proj/test-001

Job Control

  • interactive job:

    • bsub -P $proj -W 2:00 -nnodes 1 -Is /bin/bash

  • details for my jobs:

    • bjobs 12345 all details for job with <job id> 12345

    • bjobs [-l] all jobs under my user name

    • jobstat -u $(whoami) job eligibility

    • bjdepinfo 12345 job dependencies on other jobs

  • details for queues:

    • bqueues list queues

  • communicate with job:

    • bkill <job id> abort job

    • bpeek [-f] <job id> peek into stdout/stderr of a job

    • bkill -s <signal number> <job id> send signal or signal name to job

    • bchkpnt and brestart checkpoint and restart job (untested/unimplemented)

    • bmod -W 1:30 12345 change the walltime of a job (currently not allowed)

    • bstop <job id> prevent the job from starting

    • bresume <job id> release the job to be eligible for run (after it was set on hold)
