Define low-level memory settings for compute devices.

This file defines settings for the memory layout of supercells and particle frame lists, for data exchanges in multi-device domain decomposition, and for reserved fields that hold temporarily derived quantities.

namespace picongpu


using SuperCellSize = typename mCT::shrinkTo<mCT::Int<8, 8, 4>, simDim>::type

size of a superCell

volume of a superCell must be <= 1024

using MappingDesc = MappingDescription<simDim, SuperCellSize>

Define the mapper used for kernel call mappings.

using GuardSize = typename mCT::shrinkTo<mCT::Int<1, 1, 1>, simDim>::type

Define the size of the core, border and guard areas.

PIConGPU uses spatial domain decomposition for parallelization over multiple devices with a non-shared memory architecture. The global spatial domain is organized per device in three sections: the GUARD area contains copies of cells from neighboring devices (also known as “halo” or “ghost” cells). The BORDER area is the outermost layer of cells of a device; it is identical to what neighboring devices see as their GUARD area. The CORE area is the innermost area of a device. Together with the BORDER area, it defines the “active” spatial domain on a device.

GuardSize is defined in units of SuperCellSize per dimension.


constexpr size_t reservedGpuMemorySize = 350 * 1024 * 1024
constexpr uint32_t fieldTmpNumSlots = 1

number of scalar fields that are reserved as temporary fields

constexpr bool fieldTmpSupportGatherCommunication = true

Whether FieldTmp can gather neighbor information.

If true, the method asyncCommunicationGather() can be called to copy data from the border of neighboring GPUs into the local guard. This is also known as building up a “ghost” or “halo” region in domain decomposition and is only necessary for specific algorithms that extend the basic PIC cycle, e.g. algorithms that depend on derived density or energy fields.

struct DefaultExchangeMemCfg

bytes reserved for species exchange buffer

This is the default configuration for species exchange buffer sizes. The default exchange buffer sizes can be changed per species by adding the alias exchangeMemCfg, pointing to a struct with the same members as DefaultExchangeMemCfg, to the species' flag list.

Public Static Attributes

constexpr uint32_t BYTES_EXCHANGE_X = 1 * 1024 * 1024
constexpr uint32_t BYTES_EXCHANGE_Y = 3 * 1024 * 1024
constexpr uint32_t BYTES_EXCHANGE_Z = 1 * 1024 * 1024
constexpr uint32_t BYTES_EDGES = 32 * 1024
constexpr uint32_t BYTES_CORNER = 8 * 1024


Define the precision of typically used floating point types in the simulation.

PIConGPU normalizes input automatically, which allows single precision to be used by default for the core algorithms. Note that implementations of various algorithms (usually plugins or non-core components) might still hard-code a different (mixed) precision for some critical operations.


Fine-tuning of the particle heap for GPUs: When running on GPUs, we use a high-performance parallel “new” allocator (mallocMC) which can be parametrized here.

namespace picongpu


using DeviceHeap = mallocMC::Allocator<mallocMC::CreationPolicies::Scatter<DeviceHeapConfig>, mallocMC::DistributionPolicies::Noop, mallocMC::OOMPolicies::ReturnNull, mallocMC::ReservePoolPolicies::SimpleCudaMalloc, mallocMC::AlignmentPolicies::Shrink<>>

Define a new allocator.

This is an allocator resembling the behaviour of the ScatterAlloc algorithm.

struct DeviceHeapConfig

configure the CreationPolicy “Scatter”

Public Types

using pagesize = boost::mpl::int_<2 * 1024 * 1024>

A 2 MiB page can hold around 256 particle frames.

using accessblocks = boost::mpl::int_<4>

The values of accessblocks, regionsize and wastefactor have not been conclusively investigated and might be performance-sensitive for multiple particle species with heavily varying attributes (frame sizes).

using regionsize = boost::mpl::int_<8>
using wastefactor = boost::mpl::int_<2>
using resetfreedpages = boost::mpl::bool_<true>

resetfreedpages is used to minimize memory fragmentation with varying frame sizes