Memory

memory.param

Define low-level memory settings for compute devices.

Settings for memory layout for supercells and particle frame-lists, data exchanges in multi-device domain-decomposition and reserved fields for temporarily derived quantities are defined here.

namespace picongpu

rate calculation from given atomic data, extracted from flylite, based on FLYCHK

References:

  • Axel Huebl flylite, not yet published

    • R. Mewe. “Interpolation formulae for the electron impact excitation of ions in the H-, He-, Li-, and Ne-sequences.” Astronomy and Astrophysics 20, 215 (1972)

    • H.-K. Chung, R.W. Lee, M.H. Chen. “A fast method to generate collisional excitation cross-sections of highly charged ions in a hot dense matter.” High Energy Density Physics 3, 342-352 (2007)

Note

This file uses the same naming convention for the updated and incident fields as Solver.kernel.

Note

In this file we use the camelCase name “updatedField” in both code and comments to denote the field (E or B) that is being updated (i.e. corrected) in the kernel. The other of the two fields is called “incidentField”. For the incidentField source we explicitly use the term “functor” to avoid confusing it with the field itself. Please refer to https://picongpu.readthedocs.io/en/latest/models/total_field_scattered_field.html for the theoretical background of this procedure.

Typedefs

using SuperCellSize = typename mCT::shrinkTo<mCT::Int<8, 8, 4>, simDim>::type

size of a superCell

volume of a superCell must be <= 1024
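The constraint above can be checked at compile time. The following is a minimal sketch using plain integers; superCellExtent, volume and maxSuperCellVolume are illustrative names and not part of PMacc.

```cpp
#include <cstdint>

// Illustrative stand-in for mCT::Int<8, 8, 4>; this is not the PMacc type.
constexpr std::uint32_t superCellExtent[3] = {8u, 8u, 4u};

// Number of cells in a supercell with the given per-dimension extent.
constexpr std::uint32_t volume(std::uint32_t const (&extent)[3])
{
    return extent[0] * extent[1] * extent[2];
}

// Upper bound quoted in the text above.
constexpr std::uint32_t maxSuperCellVolume = 1024u;

static_assert(volume(superCellExtent) == 256u, "8 * 8 * 4 cells");
static_assert(volume(superCellExtent) <= maxSuperCellVolume, "supercell too large");
```

A hypothetical extent of 16 x 16 x 8 (2048 cells) would fail the second assertion.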

using MappingDesc = MappingDescription<simDim, SuperCellSize>

define mapper which is used for kernel call mappings

using GuardSize = typename mCT::shrinkTo<mCT::Int<1, 1, 1>, simDim>::type

define the size of the core, border and guard area

PIConGPU uses spatial domain decomposition to parallelize over multiple devices with a non-shared memory architecture. On each device, the spatial domain is organized in three sections. The GUARD area contains copies of cells owned by neighboring devices (also known as “halo” or “ghost” cells). The BORDER area is the outermost layer of cells of a device, equal to what neighboring devices see as their GUARD area. The CORE area is the innermost area of a device. Together with the BORDER area it defines the “active” spatial domain on a device.

GuardSize is defined in units of SuperCellSize per dimension.
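The layout described above can be sketched in one dimension. This example assumes that the BORDER is as wide as the GUARD and that the active domain is at least two guard widths long; Area, guardCells and classify are illustrative names, not PMacc API.

```cpp
#include <cstdint>

// Region labels used in the text above.
enum class Area { Guard, Border, Core };

// Illustrative 1D values: GuardSize = 1 supercell, supercell extent = 8 cells.
constexpr std::uint32_t guardSuperCells = 1u;
constexpr std::uint32_t superCellCells = 8u;
// GuardSize is given in supercells, so the guard width in cells is the product.
constexpr std::uint32_t guardCells = guardSuperCells * superCellCells;

// Classify cell index i of a 1D device-local array laid out as
// [GUARD | BORDER | CORE | BORDER | GUARD].
// activeCells counts BORDER + CORE cells and must be >= 2 * guardCells.
// Assumption: the BORDER is exactly as wide as the GUARD.
constexpr Area classify(std::uint32_t i, std::uint32_t activeCells)
{
    return (i < guardCells || i >= guardCells + activeCells) ? Area::Guard
        : (i < 2u * guardCells || i >= activeCells)           ? Area::Border
                                                              : Area::Core;
}

static_assert(classify(0u, 64u) == Area::Guard, "first cell is a guard cell");
static_assert(classify(8u, 64u) == Area::Border, "first active cell is a border cell");
static_assert(classify(16u, 64u) == Area::Core, "core starts after the border");
```

With 64 active cells the array holds 80 cells in total: 8 guard, 8 border, 48 core, 8 border and 8 guard.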

Variables

constexpr size_t reservedGpuMemorySize = 350 * 1024 * 1024
static constexpr uint32_t numFrameSlots = pmacc::math::CT::volume<SuperCellSize>::type::value

number of slots for particles within a frame
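As a rough illustration of what this value implies for the frame lists: if the particles of one species in a supercell are stored in frames of numFrameSlots slots each, the number of frames follows from a rounded-up division. The helper framesNeeded is hypothetical and only illustrates the arithmetic.

```cpp
#include <cstdint>

// Default value: the volume of the 8 x 8 x 4 supercell.
constexpr std::uint32_t numFrameSlots = 8u * 8u * 4u;

// Hypothetical helper: frames required to hold numParticles particles,
// assuming all frames but the last one are completely filled.
constexpr std::uint32_t framesNeeded(std::uint32_t numParticles)
{
    return (numParticles + numFrameSlots - 1u) / numFrameSlots;
}

static_assert(numFrameSlots == 256u, "one slot per cell of the supercell");
static_assert(framesNeeded(256u) == 1u, "exactly one full frame");
static_assert(framesNeeded(257u) == 2u, "one extra particle needs a second frame");
```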

constexpr uint32_t fieldTmpNumSlots = 1

number of scalar fields that are reserved as temporary fields

constexpr bool fieldTmpSupportGatherCommunication = true

whether FieldTmp can gather neighbor information

If true, the method asyncCommunicationGather() can be called to copy data from the border of neighboring GPUs into the local guard. This is also known as building up a “ghost” or “halo” region in domain decomposition and is only necessary for specific algorithms that extend the basic PIC cycle, e.g. algorithms that depend on derived density or energy fields.

struct DefaultExchangeMemCfg

bytes reserved for species exchange buffer

This is the default configuration of the species exchange buffer sizes for a simulation with 32-bit precision (the default for PIConGPU). For double precision, the amount of memory used for exchanges is automatically doubled. The default exchange buffer sizes can be changed per species by adding the alias exchangeMemCfg, with members similar to those in DefaultExchangeMemCfg, to the flag list of the species.

Public Types

using REF_LOCAL_DOM_SIZE = mCT::Int<0, 0, 0>

Reference local domain size.

The size of the local domain for which the exchange sizes BYTES_* are configured. The required size of each exchange is calculated at runtime from the local domain size and the reference size. The exchange size is only scaled up, never down. Zero means that there is no reference domain size and exchanges are not scaled.

Public Members

const std::array<float_X, 3> DIR_SCALING_FACTOR = {{0.0, 0.0, 0.0}}

Scaling rate per direction.

1.0 means the exchange size scales linearly with the ratio between the local domain size at runtime and the reference local domain size.
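One possible reading of these rules is sketched below. The formula is an assumption made for illustration and is not taken from the PIConGPU sources; exchangeScale and scaledBytes are hypothetical helpers, and only a single direction is considered.

```cpp
#include <cstdint>

// Hypothetical helper: scaling applied to an exchange buffer in one direction.
// Assumed rule: scale = max(1, factor * local / reference), so that
//  - a reference size of zero disables scaling,
//  - a factor of 0.0 leaves the buffer unchanged,
//  - a factor of 1.0 follows the size ratio linearly,
//  - the result never drops below 1 (buffers are only scaled up).
constexpr double exchangeScale(std::uint32_t localCells, std::uint32_t refCells, double dirScalingFactor)
{
    if(refCells == 0u)
        return 1.0;
    double const scaled
        = dirScalingFactor * static_cast<double>(localCells) / static_cast<double>(refCells);
    return scaled > 1.0 ? scaled : 1.0;
}

// Hypothetical helper: apply the scale to a configured BYTES_* value.
constexpr std::uint64_t scaledBytes(std::uint32_t baseBytes, double scale)
{
    return static_cast<std::uint64_t>(baseBytes * scale);
}

static_assert(exchangeScale(256u, 128u, 1.0) == 2.0, "twice the reference size doubles the buffer");
static_assert(exchangeScale(64u, 128u, 1.0) == 1.0, "smaller domains are not scaled down");
```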

Public Static Attributes

static constexpr uint32_t BYTES_EXCHANGE_X = 1 * 1024 * 1024
static constexpr uint32_t BYTES_EXCHANGE_Y = 3 * 1024 * 1024
static constexpr uint32_t BYTES_EXCHANGE_Z = 1 * 1024 * 1024
static constexpr uint32_t BYTES_EDGES = 32 * 1024
static constexpr uint32_t BYTES_CORNER = 8 * 1024
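A per-species configuration could mirror these members with different values. The sketch below doubles the Y exchange as an example; float_X and Int3 are stand-ins so that the snippet compiles on its own, and LargeYExchangeMemCfg is a hypothetical name.

```cpp
#include <array>
#include <cstdint>

// Stand-ins for framework types, so the sketch is self-contained.
using float_X = float;
template<int T_x, int T_y, int T_z>
struct Int3
{
};

// Hypothetical configuration with a larger buffer in Y than the default.
struct LargeYExchangeMemCfg
{
    // No reference domain size: exchanges are not scaled.
    using REF_LOCAL_DOM_SIZE = Int3<0, 0, 0>;
    const std::array<float_X, 3> DIR_SCALING_FACTOR = {{0.0f, 0.0f, 0.0f}};

    static constexpr std::uint32_t BYTES_EXCHANGE_X = 1 * 1024 * 1024;
    static constexpr std::uint32_t BYTES_EXCHANGE_Y = 6 * 1024 * 1024; // default is 3 MiB
    static constexpr std::uint32_t BYTES_EXCHANGE_Z = 1 * 1024 * 1024;
    static constexpr std::uint32_t BYTES_EDGES = 32 * 1024;
    static constexpr std::uint32_t BYTES_CORNER = 8 * 1024;
};

static_assert(LargeYExchangeMemCfg::BYTES_EXCHANGE_Y == 6291456u, "6 MiB in bytes");
```

Such a struct would be attached to a species through the exchangeMemCfg alias in its flag list, as described above; the exact flag syntax is not shown here.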

precision.param

Define the precision of typically used floating point types in the simulation.

PIConGPU normalizes input automatically, which allows the core algorithms to use single precision by default. Note that implementations of various algorithms (usually plugins or non-core components) might still hard-code a different (mixed) precision for some critical operations.
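The idea of a single precision switch can be sketched as follows. This does not reproduce the actual contents of precision.param: useDoublePrecision and bytesFor are hypothetical, and only the alias name float_X is taken from this documentation. The size comparison assumes a platform with 4-byte float and 8-byte double.

```cpp
#include <type_traits>

// Hypothetical compile-time switch; not the actual precision.param contents.
constexpr bool useDoublePrecision = false;

// Floating point type used by the core algorithms in this sketch.
using float_X = std::conditional<useDoublePrecision, double, float>::type;

// Bytes needed to store n values of type T.
template<typename T>
constexpr unsigned long long bytesFor(unsigned long long n)
{
    return n * sizeof(T);
}

static_assert(std::is_same<float_X, float>::value, "single precision selected");
// Switching to double doubles the memory footprint of the same data.
static_assert(bytesFor<double>(1024u) == 2u * bytesFor<float>(1024u), "double needs twice the bytes");
```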

namespace picongpu

mallocMC.param

Fine-tuning of the particle heap for GPUs: When running on GPUs, we use a high-performance parallel “new” allocator (mallocMC) which can be parametrized here.

namespace picongpu

Typedefs

using DeviceHeap = mallocMC::Allocator<pmacc::Acc<DIM1>, mallocMC::CreationPolicies::Scatter<DeviceHeapConfig>, mallocMC::DistributionPolicies::Noop, mallocMC::OOMPolicies::ReturnNull, mallocMC::ReservePoolPolicies::AlpakaBuf<pmacc::Acc<DIM1>>, mallocMC::AlignmentPolicies::Shrink<>>

Define a new allocator.

This is an allocator resembling the behaviour of the ScatterAlloc algorithm.

struct DeviceHeapConfig

configure the CreationPolicy “Scatter”

Public Static Attributes

static constexpr uint32_t pagesize = 2u * 1024u * 1024u

2MiB page can hold around 256 particle frames
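The figure above implies a rough frame size, which can be worked out from the defaults. The values are estimates only: the real frame size depends on the attributes of each species, and bytesPerFrame and bytesPerSlot are illustrative names.

```cpp
#include <cstdint>

constexpr std::uint32_t pagesize = 2u * 1024u * 1024u; // 2 MiB, as configured above
constexpr std::uint32_t framesPerPage = 256u;          // approximate figure from the text
constexpr std::uint32_t numFrameSlots = 8u * 8u * 4u;  // default slots per frame

// Estimated size of one particle frame.
constexpr std::uint32_t bytesPerFrame = pagesize / framesPerPage;
// Estimated attribute storage per particle slot.
constexpr std::uint32_t bytesPerSlot = bytesPerFrame / numFrameSlots;

static_assert(bytesPerFrame == 8192u, "about 8 KiB per frame");
static_assert(bytesPerSlot == 32u, "about 32 bytes per particle");
```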

static constexpr uint32_t accessblocksize = 2u * 1024u * 1024u * 1024u

accessblocksize, regionsize and wastefactor are not conclusively investigated and might be performance sensitive for multiple particle species with heavily varying attributes (frame sizes)

static constexpr uint32_t regionsize = 16u
static constexpr uint32_t wastefactor = 2u
static constexpr bool resetfreedpages = true

resetfreedpages is used to minimize memory fragmentation with varying frame sizes
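To relate the defaults to each other, the following sketch computes how many pages fit into one access block and how many regions they form. It assumes that regionsize counts pages per region, as in the ScatterAlloc algorithm, and it ignores any bookkeeping overhead inside an access block, so the results are upper bounds. pagesPerAccessBlock and regionsPerAccessBlock are illustrative names.

```cpp
#include <cstdint>

constexpr std::uint32_t pagesize = 2u * 1024u * 1024u;                 // 2 MiB
constexpr std::uint32_t accessblocksize = 2u * 1024u * 1024u * 1024u;  // 2 GiB, fits into 32 bits
constexpr std::uint32_t regionsize = 16u;                              // assumed: pages per region

// Upper bound on the number of pages in one access block.
constexpr std::uint32_t pagesPerAccessBlock = accessblocksize / pagesize;
// Number of regions those pages are grouped into.
constexpr std::uint32_t regionsPerAccessBlock = pagesPerAccessBlock / regionsize;

static_assert(pagesPerAccessBlock == 1024u, "2 GiB / 2 MiB");
static_assert(regionsPerAccessBlock == 64u, "1024 pages / 16 pages per region");
```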