TBG

todo: explain idea and use case

  • what is a batch system
  • cfg files
  • tpl files
  • behaviour (existing dirs, submission, environment)

Usage

TBG (template batch generator)
create a new folder for a batch job and copy in all important files

usage: tbg -c [cfgFile] [-s [submitsystem]] [-t [templateFile]]
          [-o "VARNAME1=10 VARNAME2=5"] [-h]
          [projectPath] destinationPath

-c | --cfg      [file]         - Configuration file to set up batch file.
                                 Default: [cfgFile] via export TBG_CFGFILE
-s | --submit   [command]      - Submit command (qsub, "qsub -h", sbatch, ...)
                                 Default: [submitsystem] via export TBG_SUBMIT
-t | --tpl      [file]         - Template to create a batch file from.
                                 tbg will use stdin, if no file is specified.
                                 Default: [templateFile] via export TBG_TPLFILE
-o                             - Overwrite any template variable:
                                 spaces within the right side of assign are not allowed
                                 e.g. -o "VARNAME1=10 VARNAME2=5"
                                 Overwriting is done after cfg file was executed
-h | --help                    - Shows help (this output).

[projectPath]                  - Project directory containing source code and
                                 binaries
                                 Default: current directory
destinationPath                - Directory for simulation output. 
 
 
TBG exports the following variables, which can be used in cfg and tpl files at
any time:
 TBG_jobName                   - name of the job
 TBG_jobNameShort              - short name of the job, without blanks
 TBG_cfgPath                   - absolute path to cfg file
 TBG_cfgFile                   - full absolute path and name of cfg file
 TBG_projectPath               - absolute project path (see optional parameter
                                 projectPath)
 TBG_dstPath                   - absolute path to destination directory

Example with Slurm

Job Submission

PIConGPU job submission on the Taurus cluster at TU Dresden:

  • tbg -s sbatch -c submit/0008gpus.cfg -t submit/taurus-tud/k80_profile.tpl $SCRATCH/runs/test123

Job Control

  • interactive job:
    • salloc –time=1:00:00 –nodes=1 –ntasks-per-node=2 –cpus-per-task=8 –partition gpu-interactive
    • e.g. srun “hostname”
    • GPU allocation on taurus requires an additional flag, e.g. for two GPUs –gres=gpu:2
  • details for my jobs:
    • scontrol -d show job 12345
    • ` squeue -u `whoami` -l`
  • details for queues:
    • squeue -p queueName -l (list full queue)
    • squeue -p queueName –start (show start times for pending jobs)
    • squeue -p queueName -l -t R (only show running jobs in queue)
    • sinfo -p queueName (show online/offline nodes in queue)
    • sview (alternative on taurus: module load llview and llview)
    • scontrol show partition queueName
  • communicate with job:
    • scancel 12345 abort job
    • scancel -s Number 12345 send signal or signal name to job
    • scontrol update timelimit=4:00:00 jobid=12345 change the walltime of the job
    • scontrol update jobid=12345 dependency=afterany:54321 only start the job after job with id 54321 has finished
    • scontrol hold jobid=12345 prevent the job from starting
    • scontrol release jobid=12345 or in short scontrol release 12345 release the job to be eligible for run (after it was set on hold)

.cfg File Macros

Feel free to copy & paste sections of the files below into your .cfg, e.g. to configure complex plugins:

# Copyright 2014-2017 Felix Schmitt, Axel Huebl, Richard Pausch, Heiko Burau
#
# This file is part of PIConGPU.
#
# PIConGPU is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# PIConGPU is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with PIConGPU.
# If not, see <http://www.gnu.org/licenses/>.

################################################################################
## This file describes sections and variables for PIConGPU's
## TBG batch file generator.
## These variables basically wrap PIConGPU command line flags.
## To see all flags available for your PIConGPU binary, run
## picongpu --help. The avalable flags depend on your configuration flags.
##
## Flags that target a specific species e.g. electrons (--e_png) or ions
## (--i_png) must only be used if the respective species is activated (configure flags).
##
## If not stated otherwise, variables/flags must not be used more than once!
################################################################################

################################################################################
## Section: Required Variables
## Variables in this secton are necessary for PIConGPU to work properly and should not
## be removed. However, you are free to adjust them to your needs, e.g. setting
## the number of GPUs in each dimension.
################################################################################

# Batch system walltime
TBG_wallTime="1:00:00"

# Number of GPUs in each dimension (x,y,z) to use for the simulation
TBG_gpu_x=1
TBG_gpu_y=2
TBG_gpu_z=1

# Size of the simulation grid in cells as "-g X Y Z"
# note: the number of cells needs to be an exact multiple of a supercell
#       and has to be at least 3 supercells per GPU,
#       the size of a supercell (in cells) is defined in `memory.param`
TBG_gridSize="-g 128 256 128"

# Number of simulation steps/iterations as "-s N"
TBG_steps="-s 100"


################################################################################
## Section: Optional Variables
## You are free to add and remove variables here as you like.
## The only exception is TBG_plugins which is used to forward your variables
## to the TBG program. This variable can be modified but should not be removed!
##
## Please add all variables you define in this section to TBG_plugins.
################################################################################

# Variables which are created by TBG (should be self-descriptive)
TBG_jobName
TBG_jobNameShort
TBG_cfgPath
TBG_cfgFile
TBG_projectPath
TBG_dstPath


# Regex to describe the static distribution of the cells for each GPU
# default: equal distribution over all GPUs
# example for -d 2 4 1 -g 128 192 12
TBG_gridDist="--gridDist '64{2}' '64,32{2},64'"


# Specifies whether the grid is periodic (1) or not (0) in each dimension (X,Y,Z).
# Default: no periodic dimensions
TBG_periodic="--periodic 1 0 1"


# Enables moving window (sliding) in your simulation
TBG_movingWindow="-m"

################################################################################
## Placeholder for multi data plugins:
##
## placeholders must be substituted with the real data name
##
## <species> = species name e.g. e (electrons), i (ions)
## <field>  = field names e.g. FieldE, FieldB, FieldJ
################################################################################

# The following flags are available for the radiation plugin.
# For a full description, see the plugins section in the online wiki.
#--<species>_radiation.period     Radiation is calculated every .period steps. Currently 0 or 1
#--<species>_radiation.dump     Period, after which the calculated radiation data should be dumped to the file system
#--<species>_radiation.lastRadiation     If flag is set, the spectra summed between the last and the current dump-time-step are stored
#--<species>_radiation.folderLastRad     Folder in which the summed spectra are stored
#--<species>_radiation.totalRadiation     If flag is set, store spectra summed from simulation start till current time step
#--<species>_radiation.folderTotalRad     Folder in which total radiation spectra are stored
#--<species>_radiation.start     Time step to start calculating the radition
#--<species>_radiation.end     Time step to stop calculating the radiation
#--<species>_radiation.omegaList     If spectrum frequencies are taken from a file, this gives the path to this list
#--<species>_radiation.radPerGPU     If flag is set, each GPU stores its own spectra without summing the entire simulation area
#--<species>_radiation.folderRadPerGPU     Folder where the GPU specific spectras are stored
#--e_<species>_radiation.compression    If flag is set, the hdf5 output will be compressed.
TBG_radiation="--<species>_radiation.period 1 --<species>_radiation.dump 2 --<species>_radiation.totalRadiation \
               --<species>_radiation.lastRadiation --<species>_radiation.start 2800 --<species>_radiation.end 3000"


# Create 2D images in PNG format every .period steps.
# The slice plane is defined using .axis [yx,yz] and .slicePoint (offset from origin
# as a float within [0.0,1.0].
# The output folder can be set with .folder.
# Can be used more than once to print different images, e.g. for YZ and YX planes.
TBG_<species>_pngYZ="--<species>_png.period 10 --<species>_png.axis yz --<species>_png.slicePoint 0.5 --<species>_png.folder pngElectronsYZ"
TBG_<species>_pngYX="--<species>_png.period 10 --<species>_png.axis yx --<species>_png.slicePoint 0.5 --<species>_png.folder pngElectronsYX"


# Notification period of position plugin (single-particle debugging)
TBG_<species>_pos_dbg="--<species>_position.period 1"


# Create a particle-energy histogram [in keV] per species for every .period steps
TBG_<species>_histogram="--<species>_energyHistogram.period 500 --<species>_energyHistogram.binCount 1024    \
                         --<species>_energyHistogram.minEnergy 0 --<species>_energyHistogram.maxEnergy 500000"


# Calculate a 2D phase space
# - requires parallel libSplash for HDF5 output
# - momentum range in m_<species> c
TBG_<species>_PSxpx="--<species>_phaseSpace.period 10 --<species>_phaseSpace.space x --<species>_phaseSpace.momentum px --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"
TBG_<species>_PSxpz="--<species>_phaseSpace.period 10 --<species>_phaseSpace.space x --<species>_phaseSpace.momentum pz --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"
TBG_<species>_PSypx="--<species>_phaseSpace.period 10 --<species>_phaseSpace.space y --<species>_phaseSpace.momentum px --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"
TBG_<species>_PSypy="--<species>_phaseSpace.period 10 --<species>_phaseSpace.space y --<species>_phaseSpace.momentum py --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"
TBG_<species>_PSypz="--<species>_phaseSpace.period 10 --<species>_phaseSpace.space y --<species>_phaseSpace.momentum pz --<species>_phaseSpace.min -1.0 --<species>_phaseSpace.max 1.0"


# Sum up total energy every .period steps for
# - species   (--<species>_energy)
# - fields    (--fields_energy)
TBG_sumEnergy="--fields_energy.period 10 --<species>_energy.period 10"


# Count the number of macro particles per species for every .period steps
TBG_macroCount="--<species>_macroParticlesCount.period 100"


# Count makro particles of a species per super cell
TBG_countPerSuper="--<species>_macroParticlesPerSuperCell.period 100 --<species>_macroParticlesPerSuperCell.period 100"

# Dump simulation data (fields and particles) to HDF5 files using libSplash.
# Data is dumped every .period steps to the fileset .file.
TBG_hdf5="--hdf5.period 100 --hdf5.file simData"

# Dump simulation data (fields and particles) to ADIOS files.
# Data is dumped every .period steps to the fileset .file.
TBG_adios="--adios.period 100 --adios.file simData"
# see 'adios_config -m', e.g., for on-the-fly zlib compression
#     (compile ADIOS with --with-zlib=<ZLIB_ROOT>)
#   --adios.compression zlib
# for parallel large-scale parallel file-systems:
#   --adios.aggregators <N * 3> --adios.ost <N>
# avoid writing meta file on massively parallel runs
#   --adios.disable-meta
# specify further options for the transports, see ADIOS manual
# chapter 6.1.5, e.g., 'random_offset=1;stripe_count=4'
#                      (FS chooses OST;user chooses striping factor)
#   --adios.transport-params "semicolon_separated_list"

# Create a checkpoint that is restartable every --checkpoints steps
#   http://git.io/PToFYg
TBG_checkpoints="--checkpoints 1000"

# Restart the simulation from checkpoints created using TBG_checkpoints
TBG_restart="--restart"
# By default, the last checkpoint is restarted if not specified via
#   --restart-step 1000
# To restart in a new run directory point to the old run where to start from
#   --restart-directory /path/to/simOutput/checkpoints

# Presentation mode: loop a simulation via soft restart
#   does either start from 0 again or from the checkpoint specified with
#   --restart-step as soon as the simulation reached the last time step;
#   in the example below, the simulation is run 5000 times before it shuts down
# Note: does currently not work with `Radiation` plugin
TBG_softRestarts="--softRestarts 5000"

# Live in situ visualization using ISAAC
#   Initial period in which a image shall be rendered
#     --isaac.period PERIOD
#   Name of the simulation run as seen for the connected clients
#     --isaac.name NAME
#   URL of the server
#     --isaac.url URL
#   Number from 1 to 100 decribing the quality of the transceived jpeg image.
#   Smaller values are faster sent, but of lower quality
#     --isaac.quality QUALITY
#   Resolution of the rendered image. Default is 1024x768
#     --isaac.width WIDTH
#     --isaac.height HEIGHT
#   Pausing directly after the start of the simulation
#     --isaac.directPause
#   By default the ISAAC Plugin tries to reconnect if the sever is not available
#   at start or the servers crashes. This can be deactivated with this option
#     --isaac.reconnect false
TBG_isaac="--isaac.period 1 --isaac.name !TBG_jobName --isaac.url <server_url>"
TBG_isaac_quality="--isaac.quality 90"
TBG_isaac_resolution="--isaac.width 1024 --isaac.height 768"
TBG_isaac_pause="--isaac.directPause"
TBG_isaac_reconnect="--isaac.reconnect false"

# Connect to a live-view server (start the server in advance)
TBG_liveViewYX="--<species>_liveView.period 1 --<species>_liveView.slicePoint 0.5 --<species>_liveView.ip 10.0.2.254 \
                --<species>_liveView.port 2020 --<species>_liveView.axis yx"
TBG_liveViewYZ="--<species>_liveView.period 1 --<species>_liveView.slicePoint 0.5 --<species>_liveView.ip 10.0.2.254 \
                --<species>_liveView.port 2021 --<species>_liveView.axis yz"


# Print the maximum charge deviation between particles and div E to textfile 'chargeConservation.dat':
TBG_chargeConservation="--chargeConservation.period 100"

# Particle calorimeter: (virtually) propagates and collects particles to infinite distance
TBG_<species>_calorimeter="--<species>_calorimeter.period 100 --<species>_calorimeter.openingYaw 90 --<species>_calorimeter.openingPitch 30
                        --<species>_calorimeter.numBinsEnergy 32 --<species>_calorimeter.minEnergy 10 --<species>_calorimeter.maxEnergy 1000
                        --<species>_calorimeter.logScale"

# Resource log: log resource information to streams or files
# set the resources to log by --resourceLog.properties [rank, position, currentStep, particleCount, cellCount]
# set the output stream by --resourceLog.stream [stdout, stderr, file]
# set the prefix of filestream --resourceLog.prefix [prefix]
# set the output format by (pp == pretty print) --resourceLog.format jsonpp [json,jsonpp,xml,xmlpp]
# The example below logs all resources for each time step to stdout in the pretty print json format
TBG_resourceLog="--resourceLog.period 1 --resourceLog.stream stdout
                 --resourceLog.properties rank position currentStep particleCount cellCount
                 --resourceLog.format jsonpp"

################################################################################
## Section: Program Parameters
## This section contains TBG internal variables, often composed from required
## variables. These should not be modified except when you know what you are doing!
################################################################################

# Number of compute devices in each dimension as "-d X Y Z"
TBG_devices="-d !TBG_gpu_x !TBG_gpu_y !TBG_gpu_z"


# Combines all declared variables. These are passed to PIConGPU as command line flags.
# The program output (stdout) is stored in a file called output.stdout.
TBG_programParams="!TBG_devices     \
                   !TBG_gridSize    \
                   !TBG_steps       \
                   !TBG_plugins"

# Total number of GPUs
TBG_tasks="$(( TBG_gpu_x * TBG_gpu_y * TBG_gpu_z ))"