Memory
memory.param
Define low-level memory settings for compute devices.
Settings for the memory layout of supercells and particle frame lists, for data exchanges in multi-device domain decomposition, and for reserved fields that hold temporarily derived quantities are defined here.
-
namespace picongpu
Note
This file uses the same naming convention for the updated and incident fields as Solver.kernel.
Note
In this file we use camelCase “updatedField” in both code and comments to denote field E or B that is being updated (i.e. corrected) in the kernel. The other of the two fields is called “incidentField”. And for the incidentField source we explicitly use “functor” to not confuse it with the field itself. Please refer to https://picongpu.readthedocs.io/en/latest/models/total_field_scattered_field.html for theoretical background of this procedure.
Typedefs
-
using SuperCellSize = typename mCT::shrinkTo<mCT::Int<8, 8, 4>, simDim>::type
size of a superCell
volume of a superCell must be <= 1024
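The volume limit can be checked at compile time. The following is a minimal sketch in plain C++, not PIConGPU code: `Extent3` and `ExampleSuperCell` are hypothetical stand-ins for `mCT::Int` and `SuperCellSize`.

```cpp
#include <cstdint>

// Hypothetical stand-in for a compile-time 3D extent such as mCT::Int<8, 8, 4>.
template<std::uint32_t T_x, std::uint32_t T_y, std::uint32_t T_z>
struct Extent3
{
    // number of cells contained in the extent
    static constexpr std::uint32_t volume = T_x * T_y * T_z;
};

// Same extents as the default 3D SuperCellSize listed above.
using ExampleSuperCell = Extent3<8, 8, 4>;

// Mirrors the documented constraint: at most 1024 cells per supercell.
static_assert(ExampleSuperCell::volume <= 1024u, "volume of a superCell must be <= 1024");
```

With the default extents the volume is 8 * 8 * 4 = 256 cells, so the constraint holds with room to spare.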
-
using MappingDesc = MappingDescription<simDim, SuperCellSize>
define mapper which is used for kernel call mappings
-
using GuardSize = typename mCT::shrinkTo<mCT::Int<1, 1, 1>, simDim>::type
define the size of the core, border and guard area
PIConGPU uses spatial domain decomposition for parallelization over multiple devices with non-shared memory architecture. The global spatial domain is organized per device in three sections. The GUARD area contains copies of cells from neighboring devices (also known as “halo” or “ghost” cells). The BORDER area is the outermost layer of cells of a device, equal to what neighboring devices see as their GUARD area. The CORE area is the innermost area of a device. Together with the BORDER area it defines the “active” spatial domain on a device.
GuardSize is defined in units of SuperCellSize per dimension.
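Because GuardSize counts supercells, the guard thickness in cells is the per-dimension product of both extents. A minimal sketch of that arithmetic, assuming plain `std::array` values instead of the compile-time pmacc vectors (the names `Extent` and `guardSizeInCells` are illustrative only):

```cpp
#include <array>
#include <cstdint>

using Extent = std::array<std::uint32_t, 3>;

// Convert a guard size given in supercells into cells, per dimension.
constexpr Extent guardSizeInCells(Extent const superCell, Extent const guardSuperCells)
{
    return Extent{
        {superCell[0] * guardSuperCells[0],
         superCell[1] * guardSuperCells[1],
         superCell[2] * guardSuperCells[2]}};
}

// Values taken from the defaults listed above.
constexpr Extent exampleSuperCell{{8u, 8u, 4u}};
constexpr Extent exampleGuard{{1u, 1u, 1u}};
constexpr Extent exampleGuardCells = guardSizeInCells(exampleSuperCell, exampleGuard);

// One supercell of guard per dimension is 8 x 8 x 4 cells thick.
static_assert(exampleGuardCells[0] == 8u, "guard thickness in x");
static_assert(exampleGuardCells[1] == 8u, "guard thickness in y");
static_assert(exampleGuardCells[2] == 4u, "guard thickness in z");
```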
Variables
-
constexpr size_t reservedGpuMemorySize = 350 * 1024 * 1024
-
static constexpr uint32_t numFrameSlots = pmacc::math::CT::volume<SuperCellSize>::type::value
number of slots for particles within a frame
-
constexpr uint32_t fieldTmpNumSlots = 1
number of scalar fields that are reserved as temporary fields
-
constexpr bool fieldTmpSupportGatherCommunication = true
can FieldTmp gather neighbor information
If true, it is possible to call the method asyncCommunicationGather() to copy data from the border of neighboring GPU into the local guard. This is also known as building up a “ghost” or “halo” region in domain decomposition and only necessary for specific algorithms that extend the basic PIC cycle, e.g. with dependence on derived density or energy fields.
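The idea of a gather can be illustrated with a toy 1D model. This sketch does not use the PIConGPU API and does not call `asyncCommunicationGather()`. The type `LocalDomain1D`, the function `gatherFromNeighbors` and the six-cell layout are assumptions made purely for illustration.

```cpp
// Toy 1D slice of one device: [guard | border, core, core, border | guard].
struct LocalDomain1D
{
    int cells[6];
};

// Fill both guard cells of `local` with the adjacent border cells of its neighbors.
constexpr LocalDomain1D gatherFromNeighbors(LocalDomain1D local, LocalDomain1D left, LocalDomain1D right)
{
    local.cells[0] = left.cells[4]; // right border of the left neighbor
    local.cells[5] = right.cells[1]; // left border of the right neighbor
    return local;
}

// Three neighboring devices with empty (zero) guards.
constexpr LocalDomain1D exampleLeft{{0, 10, 11, 12, 13, 0}};
constexpr LocalDomain1D exampleMiddle{{0, 20, 21, 22, 23, 0}};
constexpr LocalDomain1D exampleRight{{0, 30, 31, 32, 33, 0}};

constexpr LocalDomain1D exampleGathered = gatherFromNeighbors(exampleMiddle, exampleLeft, exampleRight);

static_assert(exampleGathered.cells[0] == 13, "left guard mirrors the left neighbor's border");
static_assert(exampleGathered.cells[5] == 30, "right guard mirrors the right neighbor's border");
```

Only the guard cells change; border and core values of the local device stay as they were.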
-
struct DefaultExchangeMemCfg
bytes reserved for species exchange buffer
This is the default configuration for species exchange buffer sizes when performing a simulation with 32-bit precision (the default for PIConGPU). For double precision, the amount of memory used for exchanges is automatically doubled. The default exchange buffer sizes can be changed per species by adding the alias exchangeMemCfg, with members similar to those of DefaultExchangeMemCfg, to the species' flag list.
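The doubling for double precision follows from the size ratio of the floating point types. The sketch below shows only that arithmetic. The helper `exchangeBytesFor` is hypothetical, and how PIConGPU applies the factor internally is not shown in this reference.

```cpp
#include <cstdint>

// Scale a buffer size configured for 32-bit precision to the precision T_Float.
// Assumes sizeof(float) == 4 and sizeof(double) == 8, as on common platforms.
template<typename T_Float>
constexpr std::uint32_t exchangeBytesFor(std::uint32_t const bytesAt32Bit)
{
    return static_cast<std::uint32_t>(bytesAt32Bit * (sizeof(T_Float) / sizeof(float)));
}

// Same value as the default BYTES_EXCHANGE_Y of this struct.
constexpr std::uint32_t exampleBytesExchangeY = 3u * 1024u * 1024u;

static_assert(exchangeBytesFor<float>(exampleBytesExchangeY) == 3u * 1024u * 1024u, "unchanged for 32-bit");
static_assert(exchangeBytesFor<double>(exampleBytesExchangeY) == 6u * 1024u * 1024u, "doubled for 64-bit");
```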
Public Types
-
using REF_LOCAL_DOM_SIZE = mCT::Int<0, 0, 0>
Reference local domain size.
The size of the local domain for which the exchange sizes BYTES_* are configured. The required size of each exchange will be calculated at runtime based on the local domain size and the reference size. The exchange size will only be scaled up, never down. Zero means that there is no reference domain size, so exchanges will not be scaled.
Public Members
-
const std::array<float_X, 3> DIR_SCALING_FACTOR = {{0.0, 0.0, 0.0}}
Scaling rate per direction.
1.0 means it scales linearly with the ratio between the local domain size at runtime and the reference local domain size.
Public Static Attributes
-
static constexpr uint32_t BYTES_EXCHANGE_X = 1 * 1024 * 1024
-
static constexpr uint32_t BYTES_EXCHANGE_Y = 3 * 1024 * 1024
-
static constexpr uint32_t BYTES_EXCHANGE_Z = 1 * 1024 * 1024
-
static constexpr uint32_t BYTES_EDGES = 32 * 1024
-
static constexpr uint32_t BYTES_CORNER = 8 * 1024
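To make the interplay of REF_LOCAL_DOM_SIZE, DIR_SCALING_FACTOR and the BYTES_* values more concrete, here is one plausible reading of the description as a single-dimension sketch. The exact formula PIConGPU uses is not given in this reference, so `scaledExchangeBytes` and its parameters are assumptions. Only three properties are taken from the text: a zero reference disables scaling, a factor of 1.0 scales linearly with the size ratio, and sizes are never scaled down.

```cpp
#include <cstdint>

// Hypothetical helper: scale a configured exchange size by the ratio of the
// runtime local extent to the reference extent (single dimension only).
constexpr std::uint64_t scaledExchangeBytes(
    std::uint64_t const configuredBytes,
    std::uint32_t const refExtent,
    std::uint32_t const localExtent,
    double const scalingFactor)
{
    // A zero reference extent means there is no reference size: do not scale.
    if(refExtent == 0u)
        return configuredBytes;

    // A factor of 1.0 scales linearly with the size ratio (assumed formula).
    double const scale = scalingFactor * (static_cast<double>(localExtent) / static_cast<double>(refExtent));

    // Scale only up, never down.
    return scale > 1.0 ? static_cast<std::uint64_t>(configuredBytes * scale) : configuredBytes;
}

constexpr std::uint64_t exampleBytes = 1024u * 1024u;

static_assert(scaledExchangeBytes(exampleBytes, 0u, 512u, 1.0) == exampleBytes, "no reference size");
static_assert(scaledExchangeBytes(exampleBytes, 128u, 256u, 1.0) == 2u * exampleBytes, "twice the extent");
static_assert(scaledExchangeBytes(exampleBytes, 128u, 64u, 1.0) == exampleBytes, "never scaled down");
```

With the defaults shown in this struct (reference size zero, all factors 0.0), this reading leaves every BYTES_* value unchanged.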
precision.param
Define the precision of typically used floating point types in the simulation.
PIConGPU normalizes input automatically, which allows the core algorithms to use single precision by default. Note that implementations of various algorithms (usually plugins or non-core components) might still hard-code a different (mixed) precision for some critical operations.
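Why a component might choose a higher working precision for a critical operation can be seen in a small, generic example. It is not taken from PIConGPU. The helper `addWith` is hypothetical, and the values assume IEEE 754 arithmetic with round-to-nearest.

```cpp
// Add two single-precision values using T_Acc as the working precision.
template<typename T_Acc>
constexpr double addWith(float const a, float const b)
{
    return static_cast<double>(static_cast<T_Acc>(a) + static_cast<T_Acc>(b));
}

// 2^24: above this value float can no longer represent every integer.
constexpr float exampleLarge = 16777216.0f;

// In single precision the small increment is rounded away ...
static_assert(addWith<float>(exampleLarge, 1.0f) == 16777216.0, "increment lost in float");
// ... while a double-precision intermediate keeps it.
static_assert(addWith<double>(exampleLarge, 1.0f) == 16777217.0, "increment kept in double");
```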
-
namespace picongpu
mallocMC.param
Fine-tuning of the particle heap for GPUs: When running on GPUs, we use a high-performance parallel “new” allocator (mallocMC) which can be parametrized here.
-
namespace picongpu
Typedefs
-
using DeviceHeap = mallocMC::Allocator<pmacc::Acc<DIM1>, mallocMC::CreationPolicies::Scatter<DeviceHeapConfig>, mallocMC::DistributionPolicies::Noop, mallocMC::OOMPolicies::ReturnNull, mallocMC::ReservePoolPolicies::AlpakaBuf<pmacc::Acc<DIM1>>, mallocMC::AlignmentPolicies::Shrink<>>
Define a new allocator.
This is an allocator resembling the behaviour of the ScatterAlloc algorithm.
-
struct DeviceHeapConfig
configure the CreationPolicy “Scatter”
Public Static Attributes
-
static constexpr uint32_t pagesize = 2u * 1024u * 1024u
2MiB page can hold around 256 particle frames
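The stated capacity implies a rough frame size. The following back-of-the-envelope arithmetic takes the figure of 256 frames per page from the comment above and assumes 256 particle slots per frame (the volume of an 8 x 8 x 4 supercell). It ignores any per-frame bookkeeping, so the per-particle value is only an estimate.

```cpp
#include <cstdint>

// Values from this reference: 2 MiB pages, around 256 frames per page.
constexpr std::uint32_t examplePageSize = 2u * 1024u * 1024u;
constexpr std::uint32_t exampleFramesPerPage = 256u;

// Approximate size of one particle frame in bytes: 8 KiB.
constexpr std::uint32_t approxFrameBytes = examplePageSize / exampleFramesPerPage;

// Assumed number of particle slots per frame (8 * 8 * 4 cells).
constexpr std::uint32_t exampleSlotsPerFrame = 8u * 8u * 4u;

// Approximate attribute storage per particle in bytes.
constexpr std::uint32_t approxBytesPerParticle = approxFrameBytes / exampleSlotsPerFrame;

static_assert(approxFrameBytes == 8192u, "2 MiB / 256 frames");
static_assert(approxBytesPerParticle == 32u, "8 KiB / 256 slots");
```

Species with more or larger attributes have larger frames, so fewer of them fit into one page.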
-
static constexpr uint32_t accessblocksize = 2u * 1024u * 1024u * 1024u
accessblocksize, regionsize and wastefactor have not been conclusively investigated and might be performance sensitive for multiple particle species with heavily varying attributes (frame sizes)
-
static constexpr uint32_t regionsize = 16u
-
static constexpr uint32_t wastefactor = 2u
-
static constexpr bool resetfreedpages = true
resetfreedpages is used to minimize memory fragmentation with varying frame sizes