Memory
memory.param
Define low-level memory settings for compute devices.
This file defines the memory layout of supercells and particle frame lists, the buffers for data exchange in multi-device domain decomposition, and the fields reserved for temporarily derived quantities.
-
namespace
picongpu
Typedefs
-
using
SuperCellSize
= typename mCT::shrinkTo<mCT::Int<8, 8, 4>, simDim>::type
Size of a supercell.
The volume of a supercell must be <= 1024.
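The volume limit can be checked at compile time. The following is a minimal sketch, assuming that "volume" means the number of cells per supercell and that shrinking to a lower dimension keeps the leading extents. The names `superCellSize` and `superCellVolume` are hypothetical stand-ins and not part of PIConGPU.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for mCT::Int<8, 8, 4>; this is not the PIConGPU type.
constexpr std::array<std::uint32_t, 3> superCellSize{8u, 8u, 4u};

// Number of cells in a supercell that uses the first `dim` extents.
// Assumption: shrinking to a 2D simulation keeps the x and y extents.
constexpr std::uint32_t superCellVolume(std::array<std::uint32_t, 3> size, std::size_t dim)
{
    std::uint32_t volume = 1u;
    for(std::size_t d = 0; d < dim; ++d)
        volume *= size[d];
    return volume;
}

// The default 3D supercell holds 8 * 8 * 4 = 256 cells, well below the limit.
static_assert(superCellVolume(superCellSize, 3) <= 1024u, "supercell volume must be <= 1024");
```

With this check in place, a choice such as 16 x 16 x 8 (2048 cells) would fail to compile instead of failing later at runtime.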
-
using
MappingDesc
= MappingDescription<simDim, SuperCellSize>
Defines the mapper that is used for kernel call mappings.
-
using
GuardSize
= typename mCT::shrinkTo<mCT::Int<1, 1, 1>, simDim>::type
Defines the size of the core, border and guard area.
PIConGPU uses spatial domain decomposition for parallelization over multiple devices with a non-shared memory architecture. The global spatial domain is organized per device in three sections. The GUARD area contains copies of cells from neighboring devices (also known as “halo” or “ghost” cells). The BORDER area is the outermost layer of cells of a device and is equal to what neighboring devices see as their GUARD area. The CORE area is the innermost area of a device. Together with the BORDER area it defines the “active” spatial domain on a device.
GuardSize is defined in units of SuperCellSize per dimension.
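Because GuardSize is given in supercells, the guard width in cells follows from SuperCellSize. The sketch below works through the arithmetic for one dimension. The helper names are hypothetical, and it assumes that the BORDER is exactly as wide as the GUARD, as the description above suggests.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; none of these exist in PIConGPU.

// Width of one guard layer in cells along a single dimension.
constexpr std::uint32_t guardCells(std::uint32_t guardSuperCells, std::uint32_t superCellExtent)
{
    return guardSuperCells * superCellExtent;
}

// Cells allocated per dimension: the local domain plus a guard on both sides.
constexpr std::uint32_t allocatedCells(std::uint32_t localCells, std::uint32_t guard)
{
    return localCells + 2u * guard;
}

// Cells in the CORE per dimension, assuming a BORDER as wide as the GUARD on
// both sides. Requires localCells >= 2 * guard.
constexpr std::uint32_t coreCells(std::uint32_t localCells, std::uint32_t guard)
{
    return localCells - 2u * guard;
}
```

For a local domain of 128 cells in x with the defaults (GuardSize 1, supercell extent 8), this gives a guard of 8 cells, 144 allocated cells and a core of 112 cells.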
Variables
-
constexpr size_t
reservedGpuMemorySize
= 350 * 1024 * 1024
-
constexpr uint32_t
fieldTmpNumSlots
= 1
Number of scalar fields that are reserved as temporary fields.
-
constexpr bool
fieldTmpSupportGatherCommunication
= true
Whether FieldTmp can gather neighbor information.
If true, it is possible to call the method asyncCommunicationGather() to copy data from the border of neighboring GPUs into the local guard. This is also known as building up a “ghost” or “halo” region in domain decomposition and is only necessary for specific algorithms that extend the basic PIC cycle, e.g. with a dependence on derived density or energy fields.
-
struct
DefaultExchangeMemCfg
Bytes reserved for the species exchange buffers.
This is the default configuration for species exchange buffer sizes. The default sizes can be changed per species by adding the alias exchangeMemCfg, with members similar to those in DefaultExchangeMemCfg, to the flag list of that species.
Public Types
-
using
REF_LOCAL_DOM_SIZE
= mCT::Int<0, 0, 0>
Reference local domain size.
The size of the local domain for which the exchange sizes BYTES_* are configured. The required size of each exchange is calculated at runtime from the local domain size and the reference size. Exchange sizes are only scaled up, never down. Zero means that there is no reference domain size, so exchanges are not scaled.
Public Members
-
const std::array<float_X, 3> picongpu::DefaultExchangeMemCfg::DIR_SCALING_FACTOR = {{0.0, 0.0, 0.0}}
Scaling rate per direction.
1.0 means that the exchange size scales linearly with the ratio between the local domain size at runtime and the reference local domain size.
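The scaling rules can be illustrated with a small function. This is only a sketch of one formula that satisfies the documented behavior (no scaling without a reference size, linear growth for a factor of 1.0, never shrinking). The function `scaledExchangeBytes`, its parameters, and the formula itself are assumptions and are not taken from the PIConGPU sources.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not PIConGPU code. The formula is an assumption that
// merely reproduces the rules stated in the documentation.
constexpr std::uint32_t scaledExchangeBytes(
    std::uint32_t baseBytes, // configured size, e.g. BYTES_EXCHANGE_X
    std::uint32_t refCells, // reference local domain extent, 0 = no reference
    std::uint32_t localCells, // local domain extent at runtime
    double scalingFactor) // scaling rate for this direction
{
    if(refCells == 0u)
        return baseBytes; // no reference domain size: exchanges are not scaled

    double const ratio = static_cast<double>(localCells) / static_cast<double>(refCells);
    // Clamp at 1.0 so that the buffer is only ever scaled up, never down.
    double const scale = std::max(1.0, scalingFactor * ratio);
    return static_cast<std::uint32_t>(static_cast<double>(baseBytes) * scale);
}
```

Under these assumptions, a 1 MiB buffer configured for a reference extent of 128 cells grows to 2 MiB on a local domain of 256 cells with a factor of 1.0, and stays at 1 MiB on a local domain of 64 cells.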
Public Static Attributes
-
constexpr uint32_t
BYTES_EXCHANGE_X
= 1 * 1024 * 1024
-
constexpr uint32_t
BYTES_EXCHANGE_Y
= 3 * 1024 * 1024
-
constexpr uint32_t
BYTES_EXCHANGE_Z
= 1 * 1024 * 1024
-
constexpr uint32_t
BYTES_EDGES
= 32 * 1024
-
constexpr uint32_t
BYTES_CORNER
= 8 * 1024
precision.param
Define the precision of typically used floating point types in the simulation.
PIConGPU normalizes input automatically, which allows single precision to be used by default for the core algorithms. Note that implementations of various algorithms (usually plugins or non-core components) might still hard-code a different (mixed) precision for some critical operations.
mallocMC.param
Fine-tuning of the particle heap for GPUs: When running on GPUs, we use a high-performance parallel “new” allocator (mallocMC) which can be parametrized here.
-
namespace
picongpu
Typedefs
-
using
DeviceHeap
= mallocMC::Allocator<cupla::Acc, mallocMC::CreationPolicies::Scatter<DeviceHeapConfig>, mallocMC::DistributionPolicies::Noop, mallocMC::OOMPolicies::ReturnNull, mallocMC::ReservePoolPolicies::AlpakaBuf<cupla::Acc>, mallocMC::AlignmentPolicies::Shrink<>>
Define a new allocator.
This allocator resembles the behaviour of the ScatterAlloc algorithm.
-
struct
DeviceHeapConfig
configure the CreationPolicy “Scatter”
Public Static Attributes
-
constexpr uint32_t
pagesize
= 2u * 1024u * 1024u
A 2 MiB page can hold around 256 particle frames.
-
constexpr uint32_t
accessblocksize
= 2u * 1024u * 1024u * 1024u
accessblocksize, regionsize and wastefactor have not been conclusively investigated and might be performance sensitive for multiple particle species with heavily varying attributes (frame sizes).
-
constexpr uint32_t
regionsize
= 16u
-
constexpr uint32_t
wastefactor
= 2u
-
constexpr bool
resetfreedpages
= true
resetfreedpages is used to minimize memory fragmentation with varying frame sizes.
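The estimate of around 256 particle frames per 2 MiB page can be reproduced with simple arithmetic. The sketch below assumes that a frame holds one particle slot per cell of an 8 x 8 x 4 supercell and that each particle occupies 32 bytes of attributes. Both numbers are illustrative assumptions, since the real frame size depends on the attributes of each species.

```cpp
#include <cstdint>

// Page size as configured in DeviceHeapConfig: 2 MiB.
constexpr std::uint32_t pagesize = 2u * 1024u * 1024u;

// Assumptions for illustration only (not taken from PIConGPU):
// one particle slot per cell of an 8 x 8 x 4 supercell, 32 bytes per particle.
constexpr std::uint32_t particlesPerFrame = 8u * 8u * 4u;
constexpr std::uint32_t bytesPerParticle = 32u;
constexpr std::uint32_t frameBytes = particlesPerFrame * bytesPerParticle;

// Number of whole frames that fit into one page.
constexpr std::uint32_t framesPerPage(std::uint32_t pageBytes, std::uint32_t bytesPerFrame)
{
    return pageBytes / bytesPerFrame;
}
```

With these assumed values a frame takes 8 KiB, so one page holds 256 frames. Species with more or larger attributes fit fewer frames per page.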