On x86_64, when crash triggered, an OOM can always be observed in kdump
kernel as below:
~~~~~~~~~
DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
Call Trace:
dump_stack+0x7f/0xa1
warn_alloc.cold+0x72/0xd6
? _raw_spin_unlock_irq+0x24/0x40
? __alloc_pages_direct_compact+0x90/0x1b0
__alloc_pages_slowpath.constprop.0+0xf29/0xf50
? __cond_resched+0x16/0x50
? prepare_alloc_pages.constprop.0+0x19d/0x1b0
__alloc_pages+0x24d/0x2c0
? __dma_atomic_pool_init+0x93/0x93
alloc_page_interleave+0x13/0xb0
atomic_pool_expand+0x118/0x210
? __dma_atomic_pool_init+0x93/0x93
__dma_atomic_pool_init+0x45/0x93
dma_atomic_pool_init+0xdb/0x176
do_one_initcall+0x67/0x320
? rcu_read_lock_sched_held+0x3f/0x80
kernel_init_freeable+0x290/0x2dc
? rest_init+0x24f/0x24f
kernel_init+0xa/0x111
ret_from_fork+0x22/0x30
Mem-Info:
......
DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
~~~~~~~~~~~
This OOM can be noticed since commit f1d4d47c5851 ("x86/setup: Always
reserve the first 1M of RAM") is merged. The root cause is there's no
available memory in DMA zone in kdump kernel after commit f1d4d47c5851.
In the current atomic pool implementation, there are three atomic mem
pools: atomic_pool_kernel, atomic_pool_dma, atomic_pool_dma32, initialized
with flag GFP_KERNEL, GFP_KERNEL|GFP_DMA, GFP_KERNEL|GFP_DMA32. On x86_64,
normal kernel can allocate all three of them. While, kdump kernel can't
satisfy atomic_pool_dma initialization because there's only low-1M present
for DMA zone, and locked in commit f1d4d47c5851 so that the low-1M won't be
put in buddy allocator.
The atomic pool is generic, and enabled always no matter if
coherent_pool is specify in kernel cmdline or not. Seems the always enabling
of atomic pool is required by AMD MEM ENCRYPTION if CONFIG_DMA_DIRECT_REMAP
is not set, even though the system is non-AMD cpu, or non-x86 ARCHes.
AFAIK, SME requires swiotlb by default. Not sure if atomic has to be
provided, can we disable it in some cases, e.g in kdump kernel?
In this RFC patch, I change the current coherent_pool kernel parameter
to make it allow user to disable atomic pool if not needed with
coherent_pool=0.
If enabling atomic pool is mandatory for SME, maybe we can adjust and
add kernel parameter like, coherent_pool= to specify which pool is
needed, coherent_pool_size= to specify the initialization size:
coherent_pool= (bit0:kernel, bit1: dma, bit2:dma32,
coherent_pool_size= size (range from 128K to 4M).
Any comment or suggestion is appreciated.
Baoquan He (2):
docs: kernel-parameters: Update to reflect the current default size of
atomic pool
dma-pool: allow user to disable atomic pool
Documentation/admin-guide/kernel-parameters.txt | 4 +++-
kernel/dma/pool.c | 7 +++++--
2 files changed, 8 insertions(+), 3 deletions(-)
--
2.17.2
Since commit 1d659236fb43("dma-pool: scale the default DMA coherent pool
size with memory capacity"), the default size of atomic pool has been
changed to take by scaling with system memory capacity. So update the
document in kerenl-parameter.txt accordingly.
Signed-off-by: Baoquan He <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cb89dbdedc46..2c5017db2621 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -623,7 +623,9 @@
coherent_pool=nn[KMG] [ARM,KNL]
Sets the size of memory pool for coherent, atomic dma
- allocations, by default set to 256K.
+ allocations. Otherwise the default size will be scaled
+ with memory capacity, while clamped between 128K and
+ 1 << (PAGE_SHIFT + MAX_ORDER-1).
com20020= [HW,NET] ARCnet - COM20020 chipset
Format:
--
2.17.2
In the current code, three atomic memory pools are provided,
atomic_pool_kernel, atomic_pool_dma, atomic_pool_dma32, initialized
with flag GFP_KERNEL, GFP_KERNEL|GFP_DMA, GFP_KERNEL|GFP_DMA32.
And they are always enabled, even though 'coherent_pool=0' is
specified in kernel command line.
In some cases, atomic pool may not be needed. And worse, it even will
cause problem. E.g in kdump kernel of x86_64, it will cause OOM for
atomic_pool_dma initialization. Because there isn't available memory
for buddy allocatory in DMA zone of kdump kernel since commit
f1d4d47c5851 ("x86/setup: Always reserve the first 1M of RAM").
The OOM will cause panic if panic_on_oom is added into kdump kernel.
So change code to adjust the existing coherent_pool to allow user
to disable atomic pool by specifying 'coherent_pool=0'.
Signed-off-by: Baoquan He <[email protected]>
---
kernel/dma/pool.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5f84e6cdb78e..5a85804b5beb 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -21,7 +21,7 @@ static struct gen_pool *atomic_pool_kernel __ro_after_init;
static unsigned long pool_size_kernel;
/* Size can be defined by the coherent_pool command line */
-static size_t atomic_pool_size;
+static unsigned long atomic_pool_size = -1;
/* Dynamic background expansion when the atomic pool is near capacity */
static struct work_struct atomic_pool_work;
@@ -188,11 +188,14 @@ static int __init dma_atomic_pool_init(void)
{
int ret = 0;
+ if (!atomic_pool_size)
+ return 0;
+
/*
* If coherent_pool was not used on the command line, default the pool
* sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
*/
- if (!atomic_pool_size) {
+ if (atomic_pool_size == -1) {
unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
--
2.17.2
So reduce the amount allocated. But the pool is needed for proper
operation on systems with memory encryption. And please add the right
maintainer or at least mailing list for the code you're touching next
time.
On 06/24/21 at 08:40am, Christoph Hellwig wrote:
> So reduce the amount allocated. But the pool is needed for proper
> operation on systems with memory encryption. And please add the right
> maintainer or at least mailing list for the code you're touching next
> time.
Oh, I thoutht it's memory issue only, should have run
./scripts/get_maintainer.pl. sorry.
About reducing the amount allocated, it may not help. Because on x86_64,
kdump kernel doesn't put any page of memory into buddy allocator of DMA
zone. Means it will defenitely OOM for atomic_pool_dma initialization.
Wondering in which case or on which device the atomic pool is needed on
AMD system with mem encrytion enabled. As we can see, the OOM will
happen too in kdump kernel on Intel system, even though it's not
necessary.
Thanks
Baoquan
On 2021-06-24 10:29, Baoquan He wrote:
> On 06/24/21 at 08:40am, Christoph Hellwig wrote:
>> So reduce the amount allocated. But the pool is needed for proper
>> operation on systems with memory encryption. And please add the right
>> maintainer or at least mailing list for the code you're touching next
>> time.
>
> Oh, I thoutht it's memory issue only, should have run
> ./scripts/get_maintainer.pl. sorry.
>
> About reducing the amount allocated, it may not help. Because on x86_64,
> kdump kernel doesn't put any page of memory into buddy allocator of DMA
> zone. Means it will defenitely OOM for atomic_pool_dma initialization.
>
> Wondering in which case or on which device the atomic pool is needed on
> AMD system with mem encrytion enabled. As we can see, the OOM will
> happen too in kdump kernel on Intel system, even though it's not
> necessary.
Hmm, I think the Kconfig reshuffle has actually left a slight wrinkle
here. For DMA_DIRECT_REMAP=y we can assume an atomic pool is always
needed, since that was the original behaviour anyway. However the
implications of AMD_MEM_ENCRYPT=y are different - even if support is
enabled, it still should only be relevant if mem_encrypt_active(), so it
probably does make sense to have an additional runtime gate on that.
From a quick scan, use of dma_alloc_from_pool() already depends on
force_dma_unencrypted() so that's probably fine already, but I think
we'd need a bit of extra protection around dma_free_from_pool() to
prevent gen_pool_has_addr() dereferencing NULL if the pools are
uninitialised, even with your proposed patch as it is. Presumably
nothing actually called dma_direct_free() when you tested this?
Robin.
On Thu, Jun 24, 2021 at 11:47:31AM +0100, Robin Murphy wrote:
> Hmm, I think the Kconfig reshuffle has actually left a slight wrinkle here.
> For DMA_DIRECT_REMAP=y we can assume an atomic pool is always needed, since
> that was the original behaviour anyway. However the implications of
> AMD_MEM_ENCRYPT=y are different - even if support is enabled, it still
> should only be relevant if mem_encrypt_active(), so it probably does make
> sense to have an additional runtime gate on that.
Yeah, a check for that would probably be useful.
On 06/24/21 at 11:47am, Robin Murphy wrote:
> On 2021-06-24 10:29, Baoquan He wrote:
> > On 06/24/21 at 08:40am, Christoph Hellwig wrote:
> > > So reduce the amount allocated. But the pool is needed for proper
> > > operation on systems with memory encryption. And please add the right
> > > maintainer or at least mailing list for the code you're touching next
> > > time.
> >
> > Oh, I thoutht it's memory issue only, should have run
> > ./scripts/get_maintainer.pl. sorry.
> >
> > About reducing the amount allocated, it may not help. Because on x86_64,
> > kdump kernel doesn't put any page of memory into buddy allocator of DMA
> > zone. Means it will defenitely OOM for atomic_pool_dma initialization.
> >
> > Wondering in which case or on which device the atomic pool is needed on
> > AMD system with mem encrytion enabled. As we can see, the OOM will
> > happen too in kdump kernel on Intel system, even though it's not
> > necessary.
Sorry for very late response, and thank both for your comments.
>
> Hmm, I think the Kconfig reshuffle has actually left a slight wrinkle here.
> For DMA_DIRECT_REMAP=y we can assume an atomic pool is always needed, since
> that was the original behaviour anyway. However the implications of
> AMD_MEM_ENCRYPT=y are different - even if support is enabled, it still
> should only be relevant if mem_encrypt_active(), so it probably does make
> sense to have an additional runtime gate on that.
>
> From a quick scan, use of dma_alloc_from_pool() already depends on
> force_dma_unencrypted() so that's probably fine already, but I think we'd
> need a bit of extra protection around dma_free_from_pool() to prevent
> gen_pool_has_addr() dereferencing NULL if the pools are uninitialised, even
> with your proposed patch as it is. Presumably nothing actually called
> dma_direct_free() when you tested this?
Yes, enforcing the conditional check of force_dma_unencrypted() around
dma_free_from_pool sounds reasonable, just as we have done in
dma_alloc_from_pool().
I have tested this patchset on normal x86_64 systems and one amd system
with SME support, disabling atomic pool can fix the issue that there's no
managed pages in dma zone then requesting page from dma zone will cause
allocation failure. And even disabling atomic pool in 1st kernel didn't
cause any problem on one AMD EPYC system which supports SME. I am not
expert of DMA area, wondering how atomic pool is supposed to do in
SME/SEV system.
Besides, even though atomic pool is disabled, slub page for allocation
of dma-kmalloc also triggers page allocation failure. So I change to
take another way to fix them, please check v2 post. The atomic pool
disabling an be a good to have change.
Thanks
Baoquan
On 8/5/21 1:54 AM, Baoquan He wrote:
> On 06/24/21 at 11:47am, Robin Murphy wrote:
>> On 2021-06-24 10:29, Baoquan He wrote:
>>> On 06/24/21 at 08:40am, Christoph Hellwig wrote:
>>>> So reduce the amount allocated. But the pool is needed for proper
>>>> operation on systems with memory encryption. And please add the right
>>>> maintainer or at least mailing list for the code you're touching next
>>>> time.
>>>
>>> Oh, I thoutht it's memory issue only, should have run
>>> ./scripts/get_maintainer.pl. sorry.
>>>
>>> About reducing the amount allocated, it may not help. Because on x86_64,
>>> kdump kernel doesn't put any page of memory into buddy allocator of DMA
>>> zone. Means it will defenitely OOM for atomic_pool_dma initialization.
>>>
>>> Wondering in which case or on which device the atomic pool is needed on
>>> AMD system with mem encrytion enabled. As we can see, the OOM will
>>> happen too in kdump kernel on Intel system, even though it's not
>>> necessary.
>
> Sorry for very late response, and thank both for your comments.
>
>>
>> Hmm, I think the Kconfig reshuffle has actually left a slight wrinkle here.
>> For DMA_DIRECT_REMAP=y we can assume an atomic pool is always needed, since
>> that was the original behaviour anyway. However the implications of
>> AMD_MEM_ENCRYPT=y are different - even if support is enabled, it still
>> should only be relevant if mem_encrypt_active(), so it probably does make
>> sense to have an additional runtime gate on that.
>
>>
>> From a quick scan, use of dma_alloc_from_pool() already depends on
>> force_dma_unencrypted() so that's probably fine already, but I think we'd
>> need a bit of extra protection around dma_free_from_pool() to prevent
>> gen_pool_has_addr() dereferencing NULL if the pools are uninitialised, even
>> with your proposed patch as it is. Presumably nothing actually called
>> dma_direct_free() when you tested this?
>
> Yes, enforcing the conditional check of force_dma_unencrypted() around
> dma_free_from_pool sounds reasonable, just as we have done in
> dma_alloc_from_pool().
>
> I have tested this patchset on normal x86_64 systems and one amd system
> with SME support, disabling atomic pool can fix the issue that there's no
> managed pages in dma zone then requesting page from dma zone will cause
> allocation failure. And even disabling atomic pool in 1st kernel didn't
> cause any problem on one AMD EPYC system which supports SME. I am not
> expert of DMA area, wondering how atomic pool is supposed to do in
> SME/SEV system.
I think the atomic pool is used by the NVMe driver. My understanding is
that driver will do a dma_alloc_coherent() from interrupt context, so it
needs to use GFP_ATOMIC. The pool was created because dma_alloc_coherent()
would perform a set_memory_decrypted() call, which can sleep. The pool
eliminates that issue (David can correct me if I got that wrong).
Thanks,
Tom
>
> Besides, even though atomic pool is disabled, slub page for allocation
> of dma-kmalloc also triggers page allocation failure. So I change to
> take another way to fix them, please check v2 post. The atomic pool
> disabling an be a good to have change.
>
> Thanks
> Baoquan
>
On 08/10/21 at 03:52pm, Tom Lendacky wrote:
> On 8/5/21 1:54 AM, Baoquan He wrote:
> > On 06/24/21 at 11:47am, Robin Murphy wrote:
> >> On 2021-06-24 10:29, Baoquan He wrote:
> >>> On 06/24/21 at 08:40am, Christoph Hellwig wrote:
> >>>> So reduce the amount allocated. But the pool is needed for proper
> >>>> operation on systems with memory encryption. And please add the right
> >>>> maintainer or at least mailing list for the code you're touching next
> >>>> time.
> >>>
> >>> Oh, I thoutht it's memory issue only, should have run
> >>> ./scripts/get_maintainer.pl. sorry.
> >>>
> >>> About reducing the amount allocated, it may not help. Because on x86_64,
> >>> kdump kernel doesn't put any page of memory into buddy allocator of DMA
> >>> zone. Means it will defenitely OOM for atomic_pool_dma initialization.
> >>>
> >>> Wondering in which case or on which device the atomic pool is needed on
> >>> AMD system with mem encrytion enabled. As we can see, the OOM will
> >>> happen too in kdump kernel on Intel system, even though it's not
> >>> necessary.
> >
> > Sorry for very late response, and thank both for your comments.
> >
> >>
> >> Hmm, I think the Kconfig reshuffle has actually left a slight wrinkle here.
> >> For DMA_DIRECT_REMAP=y we can assume an atomic pool is always needed, since
> >> that was the original behaviour anyway. However the implications of
> >> AMD_MEM_ENCRYPT=y are different - even if support is enabled, it still
> >> should only be relevant if mem_encrypt_active(), so it probably does make
> >> sense to have an additional runtime gate on that.
> >
> >>
> >> From a quick scan, use of dma_alloc_from_pool() already depends on
> >> force_dma_unencrypted() so that's probably fine already, but I think we'd
> >> need a bit of extra protection around dma_free_from_pool() to prevent
> >> gen_pool_has_addr() dereferencing NULL if the pools are uninitialised, even
> >> with your proposed patch as it is. Presumably nothing actually called
> >> dma_direct_free() when you tested this?
> >
> > Yes, enforcing the conditional check of force_dma_unencrypted() around
> > dma_free_from_pool sounds reasonable, just as we have done in
> > dma_alloc_from_pool().
> >
> > I have tested this patchset on normal x86_64 systems and one amd system
> > with SME support, disabling atomic pool can fix the issue that there's no
> > managed pages in dma zone then requesting page from dma zone will cause
> > allocation failure. And even disabling atomic pool in 1st kernel didn't
> > cause any problem on one AMD EPYC system which supports SME. I am not
> > expert of DMA area, wondering how atomic pool is supposed to do in
> > SME/SEV system.
>
> I think the atomic pool is used by the NVMe driver. My understanding is
> that driver will do a dma_alloc_coherent() from interrupt context, so it
> needs to use GFP_ATOMIC. The pool was created because dma_alloc_coherent()
> would perform a set_memory_decrypted() call, which can sleep. The pool
> eliminates that issue (David can correct me if I got that wrong).
Thanks for the information.
While from the code comment around which atomic pool is requested,
on amd system it's used to satisfy decrypting memory because that
can't block. Combined with your saying, it's because NVMe driver
need decrypted memory on AMD system?
void *dma_direct_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
{
......
/*
* Remapping or decrypting memory may block. If either is required and
* we can't block, allocate the memory from the atomic pools.
*/
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
!gfpflags_allow_blocking(gfp) &&
(force_dma_unencrypted(dev) ||
(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev))))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
......
}
Looking at the those related commits, the below one from David tells
that atomic dma pool is used when device require non-blocking and
unencrypted buffer. When I checked the system I borrowed, it's AMD EYPC
and SME is enabled. And it has many pci devices, as you can see, its 'ls
pci' outputs 113 lines. But disabling the three atomic pools didn't
trigger any error on that AMD system. Does it mean only specific devices
need this atomic pool in SME/SEV enabling case? Should we add more
details in document or code comment to make clear this?
commit 82fef0ad811fb5976cf36ccc3d2c3bc0195dfb72
Author: David Rientjes <[email protected]>
Date: Tue Apr 14 17:05:01 2020 -0700
x86/mm: unencrypted non-blocking DMA allocations use coherent pools
When CONFIG_AMD_MEM_ENCRYPT is enabled and a device requires unencrypted
DMA, all non-blocking allocations must originate from the atomic DMA
coherent pools.
Select CONFIG_DMA_COHERENT_POOL for CONFIG_AMD_MEM_ENCRYPT.
Signed-off-by: David Rientjes <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1d6104ea8af0..2bf2222819d3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS
config AMD_MEM_ENCRYPT
bool "AMD Secure Memory Encryption (SME) support"
depends on X86_64 && CPU_SUP_AMD
+ select DMA_COHERENT_POOL
select DYNAMIC_PHYSICAL_MASK
select ARCH_USE_MEMREMAP_PROT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
[~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 4
Vendor ID: AuthenticAMD
BIOS Vendor ID: AMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7351P 16-Core Processor
BIOS Model name: AMD EPYC 7351P 16-Core Processor
Stepping: 2
CPU MHz: 2395.439
BogoMIPS: 4790.87
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
......
[~]# lspci| wc -l
113
>
>
> >
> > Besides, even though atomic pool is disabled, slub page for allocation
> > of dma-kmalloc also triggers page allocation failure. So I change to
> > take another way to fix them, please check v2 post. The atomic pool
> > disabling an be a good to have change.
> >
> > Thanks
> > Baoquan
> >
>
On Tue, Aug 10, 2021 at 03:52:25PM -0500, Tom Lendacky via iommu wrote:
> I think the atomic pool is used by the NVMe driver. My understanding is
> that driver will do a dma_alloc_coherent() from interrupt context, so it
> needs to use GFP_ATOMIC. The pool was created because dma_alloc_coherent()
> would perform a set_memory_decrypted() call, which can sleep. The pool
> eliminates that issue (David can correct me if I got that wrong).
Not just the NVMe driver. We have plenty of drivers doing that, just
do a quick grep for dma_alloc_* dma_poll_alloc, dma_pool_zalloc with
GFP_ATOMIC (and that won't even find multi-line strings).
On 8/10/21 9:23 PM, Baoquan He wrote:
> On 08/10/21 at 03:52pm, Tom Lendacky wrote:
>> On 8/5/21 1:54 AM, Baoquan He wrote:
>>> On 06/24/21 at 11:47am, Robin Murphy wrote:
>>>> On 2021-06-24 10:29, Baoquan He wrote:
>>>>> On 06/24/21 at 08:40am, Christoph Hellwig wrote:
...
> Looking at the those related commits, the below one from David tells
> that atomic dma pool is used when device require non-blocking and
> unencrypted buffer. When I checked the system I borrowed, it's AMD EYPC
> and SME is enabled. And it has many pci devices, as you can see, its 'ls
> pci' outputs 113 lines. But disabling the three atomic pools didn't
> trigger any error on that AMD system. Does it mean only specific devices
> need this atomic pool in SME/SEV enabling case? Should we add more
> details in document or code comment to make clear this?
It very well could be just the devices being used. Under SME (bare metal),
if a device supports 64-bit DMA, then bounce buffers aren't used and the
DMA can be performed directly to encrypted memory, so there is no need to
issue a set_memory_decrypted() call, so I would assume it likely isn't
using the pool.
Under SEV, however, all DMA has to go through guest un-encrypted memory.
If you pass through a device that does dma_alloc_coherent() calls with
GFP_ATOMIC, then the pool will be needed.
Thanks,
Tom