TBG

Section author: Axel Huebl

Module author: René Widera

Our tool template batch generator (tbg) abstracts program runtime options from the technical details of supercomputers. On a desktop PC, one can simply execute a command interactively and instantaneously. On a supercomputer, in contrast, resources must be shared efficiently between many users via job scheduling, which on today's systems is usually handled by batch systems that define various queues of resources.

An unfortunate aspect of batch systems, from a user's perspective, is that their usage varies a lot. And naturally, different systems offer different resources in their queues, all of which need to be described.

PIConGPU runtime options are described in configuration files (.cfg). We abstract the description of queues, resource acquisition and job submission via template files (.tpl). For example, a .cfg file defines how many devices shall be used for computation, while a .tpl file calculates how many physical nodes must be requested for them. The .tpl files also take care of how to spawn a process when scheduled, e.g. with mpiexec, and which flags for networking details need to be passed. After combining the machine-independent (portable) .cfg file from user input with the machine-dependent .tpl file, tbg can submit the requested job to the batch system.
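
For illustration, a minimal sketch of this division of labor (the node calculation and the 4-devices-per-node value below are hypothetical; shipped templates use machine-specific values):

  # .cfg file (machine independent): how many devices to use
  TBG_devices_x=2
  TBG_devices_y=4
  TBG_devices_z=1

  # .tpl file (machine dependent): derive the number of physical nodes to
  # request; lines starting with a dot define template-local variables
  .TBG_devicesPerNode=4
  .TBG_nodes="$(( ( TBG_devices_x * TBG_devices_y * TBG_devices_z \
                    + TBG_devicesPerNode - 1 ) / TBG_devicesPerNode ))"
  #SBATCH --nodes=!TBG_nodes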

Last but not least, one usually wants to store the input of a simulation with its output. tbg conveniently automates this task before submission.
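
With the shipped templates, the resulting destination directory typically looks like this (treat the exact names as an assumption, they depend on the template):

  test-001/
    input/       copy of the project directory (binaries, .cfg and input files)
    tbg/         the generated batch script (submit.start)
    simOutput/   simulation output, created at runtime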

In summary, PIConGPU runtime options in .cfg files are portable to any machine. When accessing a machine for the first time, one needs to write template .tpl files, abstractly describing how to run PIConGPU on the specific queue(s) of its batch system. We already ship such template files for a set of supercomputers, for interactive execution, and for many common batch systems. See $PICSRC/etc/picongpu/ and our list of systems with .profile files for details.

Usage

TBG (template batch generator)
create a new folder for a batch job and copy in all important files

usage: tbg -c [cfgFile] [-s [submitsystem]] [-t [templateFile]]
          [-o "VARNAME1=10 VARNAME2=5"] [-f] [-h]
          [projectPath] destinationPath

-c | --cfg      [file]         - Configuration file to set up batch file.
                                 Default: [cfgFile] via export TBG_CFGFILE
-s | --submit   [command]      - Submit command (qsub, "qsub -h", sbatch, ...)
                                 Default: [submitsystem] via export TBG_SUBMIT
-t | --tpl      [file]         - Template to create a batch file from.
                                 tbg will use stdin, if no file is specified.
                                 Default: [templateFile] via export TBG_TPLFILE
-o                             - Overwrite any template variable:
                                 spaces within the right-hand side of an
                                 assignment are not allowed,
                                 e.g. -o "VARNAME1=10 VARNAME2=5"
                                 Overwriting is done after the cfg file was executed
-f | --force                   - Override if 'destinationPath' exists.
-h | --help                    - Shows help (this output).

[projectPath]                  - Project directory containing source code and
                                 binaries
                                 Default: current directory
destinationPath                - Directory for simulation output. 
 
 
TBG exports the following variables, which can be used in cfg and tpl files at
any time:
 TBG_jobName                   - name of the job
 TBG_jobNameShort              - short name of the job, without blanks
 TBG_cfgPath                   - absolute path to cfg file
 TBG_cfgFile                   - full absolute path and name of cfg file
 TBG_projectPath               - absolute project path (see optional parameter
                                 projectPath)
 TBG_dstPath                   - absolute path to destination directory
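
For example, a non-batch run executed directly through bash with the generic template shipped in $PICSRC/etc/picongpu/ could look like this (paths are illustrative):

  tbg -s bash \
      -c etc/picongpu/1.cfg \
      -t etc/picongpu/bash/mpiexec.tpl \
      $SCRATCH/runs/test-001

Inside .cfg and .tpl files, the exported variables above are referenced with a leading exclamation mark, e.g. a Slurm template could set the job name via:

  #SBATCH --job-name=!TBG_jobNameShort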

.cfg File Macros

Feel free to copy & paste sections of the file below into your .cfg, e.g. to configure complex plugins:

# Copyright 2014-2018 Felix Schmitt, Axel Huebl, Richard Pausch, Heiko Burau
#
# This file is part of PIConGPU.
#
# PIConGPU is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# PIConGPU is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with PIConGPU.
# If not, see <http://www.gnu.org/licenses/>.

################################################################################
## This file describes sections and variables for PIConGPU's
## TBG batch file generator.
## These variables basically wrap PIConGPU command line flags.
## To see all flags available for your PIConGPU binary, run
## picongpu --help. The available flags depend on your configuration flags.
##
## Flags that target a specific species e.g. electrons (--e_png) or ions
## (--i_png) must only be used if the respective species is activated (configure flags).
##
## If not stated otherwise, variables/flags must not be used more than once!
################################################################################

################################################################################
## Section: Required Variables
## Variables in this section are necessary for PIConGPU to work properly and should not
## be removed. However, you are free to adjust them to your needs, e.g. setting
## the number of GPUs in each dimension.
################################################################################

# Batch system walltime
TBG_wallTime="1:00:00"

# Number of devices in each dimension (x,y,z) to use for the simulation
TBG_devices_x=1
TBG_devices_y=2
TBG_devices_z=1

# Size of the simulation grid in cells as "X Y Z"
# note: the number of cells needs to be an exact multiple of a supercell
#       and has to be at least 3 supercells per device,
#       the size of a supercell (in cells) is defined in `memory.param`
TBG_gridSize="128 256 128"
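
# example (assuming the default 8x8x4 supercell in memory.param):
#   "128 256 128" on 1 x 2 x 1 devices yields 128 x 128 x 128 cells per device,
#   i.e. 16 x 16 x 32 supercells per device, satisfying both conditions above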

# Number of simulation steps/iterations as "N"
TBG_steps="100"


################################################################################
## Section: Optional Variables
## You are free to add and remove variables here as you like.
## The only exception is TBG_plugins, which is used to forward your variables
## to PIConGPU. This variable can be modified but should not be removed!
##
## Please add all variables you define in this section to TBG_plugins.
################################################################################

# Variables which are created by TBG (should be self-descriptive)
TBG_jobName
TBG_jobNameShort
TBG_cfgPath
TBG_cfgFile
TBG_projectPath
TBG_dstPath


# version information on startup
TBG_version="--versionOnce"


# Regex to describe the static distribution of the cells for each device
# default: equal distribution over all devices
# example for -d 2 4 1 -g 128 192 12
TBG_gridDist="--gridDist '64{2}' '64,32{2},64'"
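# reading the example: '64{2}' assigns 64 cells to each of the 2 devices in x
#   (2 * 64 = 128); '64,32{2},64' assigns 64, 32, 32 and 64 cells to the 4
#   devices in y (64 + 32 + 32 + 64 = 192); z is not given and falls back to
#   the default equal distribution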


# Specifies whether the grid is periodic (1) or not (0) in each dimension (X,Y,Z).
# Default: no periodic dimensions
TBG_periodic="--periodic 1 0 1"


# Enables moving window (sliding) in your simulation
TBG_movingWindow="-m"

################################################################################
## Placeholder for multi data plugins:
##
## placeholders must be substituted with the real data name
##
## <species> = species name e.g. e (electrons), i (ions)
## <field>  = field names e.g. FieldE, FieldB, FieldJ
################################################################################

# The following flags are available for the radiation plugin.
# For a full description, see the plugins section in the online wiki.
#--<species>_radiation.period     Radiation is calculated every .period steps. Currently 0 or 1
#--<species>_radiation.dump     Period, after which the calculated radiation data should be dumped to the file system
#--<species>_radiation.lastRadiation     If flag is set, the spectra summed between the last and the current dump-time-step are stored
#--<species>_radiation.folderLastRad     Folder in which the summed spectra are stored
#--<species>_radiation.totalRadiation     If flag is set, store spectra summed from simulation start till current time step
#--<species>_radiation.folderTotalRad     Folder in which total radiation spectra are stored
#--<species>_radiation.start     Time step to start calculating the radiation
#--<species>_radiation.end     Time step to stop calculating the radiation
#--<species>_radiation.omegaList     If spectrum frequencies are taken from a file, this gives the path to this list
#--<species>_radiation.radPerGPU     If flag is set, each GPU stores its own spectra without summing the entire simulation area
#--<species>_radiation.folderRadPerGPU     Folder where the GPU specific spectra are stored
#--<species>_radiation.compression    If flag is set, the hdf5 output will be compressed.
TBG_radiation="--<species>_radiation.period 1 --<species>_radiation.dump 2 --<species>_radiation.totalRadiation \
               --<species>_radiation.lastRadiation --<species>_radiation.start 2800 --<species>_radiation.end 3000"


# Create 2D images in PNG format every .period steps.
# The slice plane is defined using .axis [yx,yz] and .slicePoint (offset from
# origin as a float within [0.0,1.0]).
# The output folder can be set with .folder.
# Can be used more than once to print different images, e.g. for YZ and YX planes.
TBG_<species>_pngYZ="--<species>_png.period 10 --<species>_png.axis yz --<species>_png.slicePoint 0.5 --<species>_png.folder pngElectronsYZ"
TBG_<species>_pngYX="--<species>_png.period 10 --<species>_png.axis yx --<species>_png.slicePoint 0.5 --<species>_png.folder pngElectronsYX"

# Enable macro particle merging
TBG_<species>_merger="--<species>_merger.period 100 --<species>_merger.minParticlesToMerge 8 --<species>_merger.posSpreadThreshold 0.2 --<species>_merger.absMomSpreadThreshold 0.01"


# Notification period of position plugin (single-particle debugging)
TBG_<species>_pos_dbg="--<species>_position.period 1"


# Create a particle-energy histogram [in keV] per species every .period steps
TBG_<species>_histogram="--<species>_energyHistogram.period 500 --<species>_energyHistogram.binCount 1024     \
                         --<species>_energyHistogram.minEnergy 0 --<species>_energyHistogram.maxEnergy 500000 \
                         --<species>_energyHistogram.filter all"


# Calculate a 2D phase space
# - requires parallel libSplash for HDF5 output
# - momentum range in m_<species> c
TBG_<species>_PSxpx="--<species>_phaseSpace.period 10 --<species>_phaseSpace.filter all --<species>_phaseSpace.space x --<species>_phaseSpace.momentum px --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"
TBG_<species>_PSxpz="--<species>_phaseSpace.period 10 --<species>_phaseSpace.filter all --<species>_phaseSpace.space x --<species>_phaseSpace.momentum pz --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"
TBG_<species>_PSypx="--<species>_phaseSpace.period 10 --<species>_phaseSpace.filter all --<species>_phaseSpace.space y --<species>_phaseSpace.momentum px --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"
TBG_<species>_PSypy="--<species>_phaseSpace.period 10 --<species>_phaseSpace.filter all --<species>_phaseSpace.space y --<species>_phaseSpace.momentum py --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"
TBG_<species>_PSypz="--<species>_phaseSpace.period 10 --<species>_phaseSpace.filter all --<species>_phaseSpace.space y --<species>_phaseSpace.momentum pz --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"

# Write out slices of field data every .period steps
TBG_EField_slice="--E_slice.period 100 --E_slice.fileName sliceE --E_slice.plane 2 --E_slice.slicePoint 0.5"
TBG_BField_slice="--B_slice.period 100 --B_slice.fileName sliceB --B_slice.plane 2 --B_slice.slicePoint 0.5"
TBG_JField_slice="--J_slice.period 100 --J_slice.fileName sliceJ --J_slice.plane 2 --J_slice.slicePoint 0.5"

# Sum up total energy every .period steps for
# - species   (--<species>_energy)
# - fields    (--fields_energy)
TBG_sumEnergy="--fields_energy.period 10 --<species>_energy.period 10 --<species>_energy.filter all"


# Count the number of macro particles per species every .period steps
TBG_macroCount="--<species>_macroParticlesCount.period 100"


# Count macro particles of a species per super cell
TBG_countPerSuper="--<species>_macroParticlesPerSuperCell.period 100"

# Dump simulation data (fields and particles) to HDF5 files using libSplash.
# Data selected in .source is dumped every .period steps to the fileset .file.
TBG_hdf5="--hdf5.period 100 --hdf5.file simData --hdf5.source 'species_all,fields_all'"

# Dump simulation data (fields and particles) to ADIOS files.
# Data is dumped every .period steps to the fileset .file.
TBG_adios="--adios.period 100 --adios.file simData --adios.source 'species_all,fields_all'"
# see 'adios_config -m', e.g., for on-the-fly zlib compression
#     (compile ADIOS with --with-zlib=<ZLIB_ROOT>)
#   --adios.compression zlib
# or
#   --adios.compression blosc:threshold=2048,shuffle=bit,lvl=1,threads=6,compressor=zstd
# for large-scale parallel file systems:
#   --adios.aggregators <N * 3> --adios.ost <N>
# avoid writing meta file on massively parallel runs
#   --adios.disable-meta <B>
#   B = 0 is equal to false, B = 1 is true
# specify further options for the transports, see ADIOS manual
# chapter 6.1.5, e.g., 'random_offset=1;stripe_count=4'
#                      (FS chooses OST;user chooses striping factor)
#   --adios.transport-params "semicolon_separated_list"
# select data sources for the dump
#   --adios.source <comma_separated_list_of_data_sources>
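
# example (a sketch): combine the ADIOS dump above with on-the-fly zlib
# compression
#   TBG_adios="--adios.period 100 --adios.file simData --adios.compression zlib \
#              --adios.source 'species_all,fields_all'"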

# Create a checkpoint that is restartable every --checkpoint.period steps
#   http://git.io/PToFYg
TBG_checkpoint="--checkpoint.period 1000"
# Select the backend for the checkpoint, available are hdf5 and adios
#    --checkpoint.backend adios
#                         hdf5
# Available backend options are exactly as in --adios.* and --hdf5.* and can be set
# via:
#   --checkpoint.<IO-backend>.* <value>
# e.g.:
#   --checkpoint.adios.compression zlib
#   --checkpoint.adios.disable-meta 1
# Note: if you disable ADIOS meta files in checkpoints, make sure to run
#       `bpmeta` on your checkpoints before restarting from them!

# Restart the simulation from checkpoint created using TBG_checkpoint
TBG_restart="--checkpoint.restart"
# Select the backend for the restart (must fit the created checkpoint)
#    --checkpoint.restart.backend adios
#                                 hdf5
# By default, the last checkpoint is restarted if not specified via
#   --checkpoint.restart.step 1000
# To restart in a new run directory point to the old run where to start from
#   --checkpoint.restart.directory /path/to/simOutput/checkpoints
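
# example (a sketch, path is illustrative): restart in a fresh run directory
# from the HDF5 checkpoints of a previous run
#   TBG_restart="--checkpoint.restart --checkpoint.restart.backend hdf5 \
#                --checkpoint.restart.directory /path/to/old/simOutput/checkpoints"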

# Presentation mode: loop a simulation via restarts
#   either starts from 0 again or from the checkpoint specified with
#   --checkpoint.restart.step as soon as the simulation has reached its last time step;
#   in the example below, the simulation is run 5000 times before it shuts down
# Note: currently does not work with the `Radiation` plugin
TBG_restartLoop="--checkpoint.restart.loop 5000"

# Live in situ visualization using ISAAC
#   Initial period in which an image shall be rendered
#     --isaac.period PERIOD
#   Name of the simulation run as seen for the connected clients
#     --isaac.name NAME
#   URL of the server
#     --isaac.url URL
#   Number from 1 to 100 describing the quality of the transmitted JPEG image.
#   Smaller values are sent faster, but at lower quality
#     --isaac.quality QUALITY
#   Resolution of the rendered image. Default is 1024x768
#     --isaac.width WIDTH
#     --isaac.height HEIGHT
#   Pause directly after the start of the simulation
#     --isaac.directPause
#   By default the ISAAC plugin tries to reconnect if the server is not available
#   at start or if the server crashes. This can be deactivated with this option
#     --isaac.reconnect false
TBG_isaac="--isaac.period 1 --isaac.name !TBG_jobName --isaac.url <server_url>"
TBG_isaac_quality="--isaac.quality 90"
TBG_isaac_resolution="--isaac.width 1024 --isaac.height 768"
TBG_isaac_pause="--isaac.directPause"
TBG_isaac_reconnect="--isaac.reconnect false"

# Print the maximum charge deviation between particles and div E to the text file 'chargeConservation.dat':
TBG_chargeConservation="--chargeConservation.period 100"

# Particle calorimeter: (virtually) propagates and collects particles to infinite distance
TBG_<species>_calorimeter="--<species>_calorimeter.period 100 --<species>_calorimeter.openingYaw 90 --<species>_calorimeter.openingPitch 30 \
                        --<species>_calorimeter.numBinsEnergy 32 --<species>_calorimeter.minEnergy 10 --<species>_calorimeter.maxEnergy 1000 \
                        --<species>_calorimeter.logScale 1 --<species>_calorimeter.file filePrefix --<species>_calorimeter.filter all"

# Resource log: log resource information to streams or files
# set the resources to log via --resourceLog.properties [rank, position, currentStep, particleCount, cellCount]
# set the output stream via --resourceLog.stream [stdout, stderr, file]
# set the file name prefix for the file stream via --resourceLog.prefix [prefix]
# set the output format via --resourceLog.format [json, jsonpp, xml, xmlpp] (pp == pretty print)
# The example below logs all resources for each time step to stdout in the pretty print json format
TBG_resourceLog="--resourceLog.period 1 --resourceLog.stream stdout \
                 --resourceLog.properties rank position currentStep particleCount cellCount \
                 --resourceLog.format jsonpp"

################################################################################
## Section: Program Parameters
## This section contains TBG internal variables, often composed from the required
## variables above. They should not be modified unless you know what you are doing!
################################################################################

# Number of compute devices in each dimension as "X Y Z"
TBG_deviceDist="!TBG_devices_x !TBG_devices_y !TBG_devices_z"


# Combines all declared variables. These are passed to PIConGPU as command line flags.
# The program output (stdout) is stored in a file called output.stdout.
TBG_programParams="-d !TBG_deviceDist \
                   -g !TBG_gridSize   \
                   -s !TBG_steps      \
                   !TBG_plugins"

# Total number of devices
TBG_tasks="$(( TBG_devices_x * TBG_devices_y * TBG_devices_z ))"
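
On the template side, a machine-dependent .tpl file consumes these variables. Below is a heavily trimmed sketch of a Slurm template; it is illustrative only, real templates in $PICSRC/etc/picongpu/ are more involved:

  #!/usr/bin/env bash
  #SBATCH --time=!TBG_wallTime
  #SBATCH --ntasks=!TBG_tasks
  #SBATCH --job-name=!TBG_jobNameShort

  mkdir simOutput 2> /dev/null
  cd simOutput

  # spawn one MPI rank per device, passing all collected command line flags
  mpiexec -n !TBG_tasks !TBG_dstPath/input/bin/picongpu !TBG_programParams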

Batch System Examples

Section author: Axel Huebl, Richard Pausch

Slurm

Slurm is a modern batch system, e.g. installed on the Taurus cluster at TU Dresden.

Job Submission

PIConGPU job submission on the Taurus cluster at TU Dresden:

  • tbg -s sbatch -c etc/picongpu/0008gpus.cfg -t etc/picongpu/taurus-tud/k80.tpl $SCRATCH/runs/test-001

Job Control

  • interactive job:
    • salloc --time=1:00:00 --nodes=1 --ntasks-per-node=2 --cpus-per-task=8 --partition gpu-interactive
    • e.g. srun "hostname"
    • GPU allocation on Taurus requires an additional flag, e.g. for two GPUs --gres=gpu:2
  • details for my jobs:
    • scontrol -d show job 12345 all details for job with <job id> 12345
    • squeue -u $(whoami) -l all jobs under my user name
  • details for queues:
    • squeue -p queueName -l list full queue
    • squeue -p queueName --start (show start times for pending jobs)
    • squeue -p queueName -l -t R (only show running jobs in queue)
    • sinfo -p queueName (show online/offline nodes in queue)
    • sview (alternative on Taurus: module load llview and llview)
    • scontrol show partition queueName
  • communicate with job:
    • scancel <job id> abort job
    • scancel -s <signal number> <job id> send signal or signal name to job
    • scontrol update timelimit=4:00:00 jobid=12345 change the walltime of a job
    • scontrol update jobid=12345 dependency=afterany:54321 only start job 12345 after job with id 54321 has finished (see also the submission sketch after this list)
    • scontrol hold <job id> prevent the job from starting
    • scontrol release <job id> release the job to be eligible for run (after it was set on hold)
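
As the usage above shows with "qsub -h", the submit command passed via -s may carry additional flags. A sketch (untested, paths as in the submission example above) for submitting a job that only starts after job 54321 has finished:

  tbg -s "sbatch --dependency=afterany:54321" \
      -c etc/picongpu/0008gpus.cfg \
      -t etc/picongpu/taurus-tud/k80.tpl \
      $SCRATCH/runs/test-002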

PBS

PBS (Portable Batch System) is a widely used batch system that comes in several implementations (open source, professional, etc.). It is used, e.g., on the Hypnos cluster at HZDR.

Job Submission

PIConGPU job submission on the Hypnos cluster at HZDR:

  • tbg -s qsub -c etc/picongpu/0008gpus.cfg -t etc/picongpu/hypnos-hzdr/k20.tpl /bigdata/hplsim/<...>/test-001

Where <...> is one of:

  • external/$(whoami)
  • internal:
    • scratch/$(whoami)
    • development/$(whoami)
    • production/<project name>

Job Control

  • interactive job:
    • qsub -I -q k20 -lwalltime=12:00:00 -lnodes=1:ppn=8
  • details for my jobs:
    • qstat -f 12345 all details for job with <job id> 12345
    • qstat -u $(whoami) all jobs under my user name
  • details for queues:
    • qstat -a queueName show all jobs in a queue
    • pbs_free -l compact view on free and busy nodes
    • pbsnodes list all nodes and their detailed state (free, busy/job-exclusive, offline)
  • communicate with job:
    • qdel <job id> abort job
    • qsig -s <signal number> <job id> send signal or signal name to job
    • qalter -lwalltime=12:00:00 <job id> change the walltime of a job
    • qalter -Wdepend=afterany:54321 12345 only start job 12345 after job with id 54321 has finished
    • qhold <job id> prevent the job from starting
    • qrls <job id> release the job to be eligible for run (after it was set on hold)