2021-08-10 13:09:28

by Baoquan He

Subject: [RFC PATCH v2 0/5] Avoid requesting page from DMA zone when no managed pages

In several places, the current kernel assumes that the DMA zone must
have managed pages and requests pages from it if CONFIG_ZONE_DMA is
enabled. This is not always true. E.g. in the kdump kernel of x86_64,
only the low 1M is present and locked down at a very early stage of
boot, so there are no managed pages at all in the DMA zone. This
exception will always cause a page allocation failure whenever a page
is requested from the DMA zone.

E.g. in the kdump kernel of x86_64, creating atomic_pool_dma with
GFP_DMA triggers a page allocation failure, and dma-kmalloc
initialization hits the same failure.

In this v2 patchset:

* Patches 1 and 2 are cleanups of the atomic pool code, noticed while
reading it.
* Patch 3 introduces helper functions to check whether a DMA zone with
managed pages exists.
* Patch 4 creates atomic_pool_dma only if a managed DMA zone exists.
* Patch 5 creates the dma-kmalloc caches only if a managed DMA zone
exists.

The v1 post is here:
https://lore.kernel.org/lkml/[email protected]/

v1->v2:
In v1, I tried to adjust the code to let the user disable the atomic
pools completely with the "coherent_pool=0" kernel parameter, then
expected to add that to the kdump kernel to mute the page allocation
failure. However, I later found that the atomic pool is needed when
DMA_DIRECT_REMAP=y or mem_encrypt_active() is true, and that
dma-kmalloc still always caused a page allocation failure.

So in this v2, the approach is changed to check whether a managed DMA
zone exists. If the DMA zone has managed pages, we go ahead and
request pages from it for initialization. Otherwise, we skip
initializing anything that needs pages from the DMA zone.
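
As an illustration of that guard pattern (a sketch only, not the exact
hunks below; has_managed_dma() is introduced in patch 3):

if (has_managed_dma()) {
        /* ZONE_DMA has pages in the buddy allocator, safe to use */
        atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
                                                 GFP_KERNEL | GFP_DMA);
}
/* otherwise skip: nothing in ZONE_DMA could satisfy the allocation */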

Baoquan He (5):
docs: kernel-parameters: Update to reflect the current default size of
atomic pool
dma-pool: allow user to disable atomic pool
mm_zone: add function to check if managed dma zone exists
dma/pool: create dma atomic pool only if dma zone has managed pages
mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

.../admin-guide/kernel-parameters.txt | 5 ++++-
include/linux/mmzone.h | 21 +++++++++++++++++++
kernel/dma/pool.c | 11 ++++++----
mm/page_alloc.c | 11 ++++++++++
mm/slab_common.c | 6 ++++++
5 files changed, 49 insertions(+), 5 deletions(-)

--
2.17.2


2021-08-10 16:04:15

by Baoquan He

Subject: [RFC PATCH v2 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

Dma-kmalloc will be created as long as CONFIG_ZONE_DMA is enabled.
However, it will fail if DMA zone has no managed pages. The failure
can be seen in kdump kernel of x86_64 as below:

CPU: 0 PID: 65 Comm: kworker/u2:1 Not tainted 5.14.0-rc2+ #9
Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS RMLSDP.86I.R2.28.D690.1306271008 06/27/2013
Workqueue: events_unbound async_run_entry_fn
Call Trace:
dump_stack_lvl+0x57/0x72
warn_alloc.cold+0x72/0xd6
__alloc_pages_slowpath.constprop.0+0xf56/0xf70
__alloc_pages+0x23b/0x2b0
allocate_slab+0x406/0x630
___slab_alloc+0x4b1/0x7e0
? sr_probe+0x200/0x600
? lock_acquire+0xc4/0x2e0
? fs_reclaim_acquire+0x4d/0xe0
? lock_is_held_type+0xa7/0x120
? sr_probe+0x200/0x600
? __slab_alloc+0x67/0x90
__slab_alloc+0x67/0x90
? sr_probe+0x200/0x600
? sr_probe+0x200/0x600
kmem_cache_alloc_trace+0x259/0x270
sr_probe+0x200/0x600
......
bus_probe_device+0x9f/0xb0
device_add+0x3d2/0x970
......
__scsi_add_device+0xea/0x100
ata_scsi_scan_host+0x97/0x1d0
async_run_entry_fn+0x30/0x130
process_one_work+0x2b0/0x5c0
worker_thread+0x55/0x3c0
? process_one_work+0x5c0/0x5c0
kthread+0x149/0x170
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
Mem-Info:
......

The above failure happened when kmalloc() was called to allocate a
buffer with GFP_DMA. It requests a slab page from the DMA zone while
there are no managed pages there:

sr_probe()
--> get_capabilities()
--> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);

So check whether the DMA zone has managed pages before trying to
create the dma-kmalloc caches.
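
For illustration, the core of the approach taken by the hunk below (a
sketch using the diff's names, not additional code):

/*
 * No managed DMA zone: alias the DMA slot to the normal cache, so
 * GFP_DMA users like sr_probe() above share the normal caches instead
 * of a dma-kmalloc cache whose slabs would have to come from the
 * empty DMA zone.
 */
kmalloc_caches[KMALLOC_DMA][i] = kmalloc_caches[KMALLOC_NORMAL][i];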

Signed-off-by: Baoquan He <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>

---
mm/slab_common.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1c673c323baf..22350bef3bae 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -911,10 +911,16 @@ void __init create_kmalloc_caches(slab_flags_t flags)
slab_state = UP;

#ifdef CONFIG_ZONE_DMA
+ bool managed_dma = has_managed_dma();
+
for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i];

if (s) {
+ if (!managed_dma) {
+ kmalloc_caches[KMALLOC_DMA][i] = kmalloc_caches[KMALLOC_NORMAL][i];
+ continue;
+ }
kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
kmalloc_info[i].name[KMALLOC_DMA],
kmalloc_info[i].size,
--
2.17.2

2021-08-10 16:04:18

by Baoquan He

Subject: [RFC PATCH v2 2/5] dma-pool: allow user to disable atomic pool

In the current code, the three atomic memory pools,
atomic_pool_kernel|dma|dma32, are always created, even when
'coherent_pool=0' is specified on the kernel command line. In fact,
the atomic pools are only necessary when CONFIG_DMA_DIRECT_REMAP=y or
mem_encrypt_active() is true, which is the case on only a few ARCHes.

So change the code to allow the user to disable the atomic pools by
specifying 'coherent_pool=0'.

Meanwhile, update the relevant documentation in kernel-parameters.txt.
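
For illustration, the resulting command line usage is:

coherent_pool=0       - disable atomic_pool_kernel|dma|dma32 entirely
coherent_pool=256K    - set a fixed pool size, as before
(option not given)    - default, scaled with memory capacity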

Signed-off-by: Baoquan He <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 3 ++-
kernel/dma/pool.c | 7 +++++--
2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 970ed65db89f..620d38b5ce2d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -653,7 +653,8 @@

coherent_pool=nn[KMG] [ARM,KNL]
Sets the size of memory pool for coherent, atomic dma
- allocations. Otherwise the default size will be scaled
+ allocations. A value of 0 disables the three atomic
+ memory pools. Otherwise the default size will be scaled
with memory capacity, while clamped between 128K and
1 << (PAGE_SHIFT + MAX_ORDER-1).

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5f84e6cdb78e..5a85804b5beb 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -21,7 +21,7 @@ static struct gen_pool *atomic_pool_kernel __ro_after_init;
static unsigned long pool_size_kernel;

/* Size can be defined by the coherent_pool command line */
-static size_t atomic_pool_size;
+static unsigned long atomic_pool_size = -1;

/* Dynamic background expansion when the atomic pool is near capacity */
static struct work_struct atomic_pool_work;
@@ -188,11 +188,14 @@ static int __init dma_atomic_pool_init(void)
{
int ret = 0;

+ if (!atomic_pool_size)
+ return 0;
+
/*
* If coherent_pool was not used on the command line, default the pool
* sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
*/
- if (!atomic_pool_size) {
+ if (atomic_pool_size == -1) {
unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
--
2.17.2

2021-08-10 16:04:24

by Baoquan He

Subject: [RFC PATCH v2 1/5] docs: kernel-parameters: Update to reflect the current default size of atomic pool

Since commit 1d659236fb43 ("dma-pool: scale the default DMA coherent
pool size with memory capacity"), the default size of the atomic pool
has been scaled with the system memory capacity. So update the
documentation in kernel-parameters.txt accordingly.
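
As a worked example of the new wording (based on the scaling code in
kernel/dma/pool.c; assuming 4K pages and MAX_ORDER = 11, so the upper
clamp is 1 << (12 + 10) = 4M):

pages = totalram_pages() / (SZ_1G / SZ_128K); /* 128K per 1G of RAM */
pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
size  = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);

E.g. on a 4G machine: 1048576 / 8192 = 128 pages, 128 << 12 = 512K.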

Signed-off-by: Baoquan He <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bdb22006f713..970ed65db89f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -653,7 +653,9 @@

coherent_pool=nn[KMG] [ARM,KNL]
Sets the size of memory pool for coherent, atomic dma
- allocations, by default set to 256K.
+ allocations. Otherwise the default size will be scaled
+ with memory capacity, while clamped between 128K and
+ 1 << (PAGE_SHIFT + MAX_ORDER-1).

com20020= [HW,NET] ARCnet - COM20020 chipset
Format:
--
2.17.2

2021-08-10 17:21:27

by Baoquan He

Subject: [RFC PATCH v2 4/5] dma/pool: create dma atomic pool only if dma zone has managed pages

Currently, the three DMA atomic pools are initialized as long as the
relevant kernel code is built in. In the kdump kernel of x86_64,
however, this goes wrong when trying to create atomic_pool_dma,
because there are no managed pages in the DMA zone. In that case, the
DMA zone only has the low 1M of memory present, locked down by the
memblock allocator, so no pages are added into the buddy allocator for
the DMA zone. Please check commit f1d4d47c5851 ("x86/setup: Always
reserve the first 1M of RAM").

As a result, the kdump kernel of x86_64 always prints the failure
message below:

DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
Call Trace:
dump_stack+0x7f/0xa1
warn_alloc.cold+0x72/0xd6
? _raw_spin_unlock_irq+0x24/0x40
? __alloc_pages_direct_compact+0x90/0x1b0
__alloc_pages_slowpath.constprop.0+0xf29/0xf50
? __cond_resched+0x16/0x50
? prepare_alloc_pages.constprop.0+0x19d/0x1b0
__alloc_pages+0x24d/0x2c0
? __dma_atomic_pool_init+0x93/0x93
alloc_page_interleave+0x13/0xb0
atomic_pool_expand+0x118/0x210
? __dma_atomic_pool_init+0x93/0x93
__dma_atomic_pool_init+0x45/0x93
dma_atomic_pool_init+0xdb/0x176
do_one_initcall+0x67/0x320
? rcu_read_lock_sched_held+0x3f/0x80
kernel_init_freeable+0x290/0x2dc
? rest_init+0x24f/0x24f
kernel_init+0xa/0x111
ret_from_fork+0x22/0x30
Mem-Info:
......
DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations

Here, check whether the DMA zone has managed pages and create
atomic_pool_dma only if it does. Otherwise just skip it.
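
Note the dma_guess_pool() hunk as well: testing the atomic_pool_dma
pointer instead of IS_ENABLED(CONFIG_ZONE_DMA) means a GFP_DMA request
quietly falls back when the pool was never created, roughly:

if (atomic_pool_dma && (gfp & GFP_DMA))
        return atomic_pool_dma;         /* pool exists, use it */
return atomic_pool_kernel;              /* no managed DMA zone */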

Signed-off-by: Baoquan He <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: [email protected]
---
kernel/dma/pool.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5a85804b5beb..00df3edd6c5d 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -206,7 +206,7 @@ static int __init dma_atomic_pool_init(void)
GFP_KERNEL);
if (!atomic_pool_kernel)
ret = -ENOMEM;
- if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+ if (has_managed_dma()) {
atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
GFP_KERNEL | GFP_DMA);
if (!atomic_pool_dma)
@@ -229,7 +229,7 @@ static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
if (prev == NULL) {
if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
return atomic_pool_dma32;
- if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
+ if (atomic_pool_dma && (gfp & GFP_DMA))
return atomic_pool_dma;
return atomic_pool_kernel;
}
--
2.17.2

2021-08-10 17:21:42

by Baoquan He

Subject: [RFC PATCH v2 3/5] mm_zone: add function to check if managed dma zone exists

Some places in the current kernel assume that the DMA zone must have
managed pages if CONFIG_ZONE_DMA is enabled. This is not always true.
E.g. in the kdump kernel of x86_64, only the low 1M is present and
locked down at a very early stage of boot, so there are no managed
pages at all in the DMA zone. This exception will always cause a page
allocation failure whenever a page is requested from the DMA zone.

Here, add the function has_managed_dma() and the relevant helpers to
check whether a DMA zone with managed pages exists. They will be used
in later patches.

Signed-off-by: Baoquan He <[email protected]>
---
include/linux/mmzone.h | 21 +++++++++++++++++++++
mm/page_alloc.c | 11 +++++++++++
2 files changed, 32 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fcb535560028..e3cd23fc5f64 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -975,6 +975,18 @@ static inline bool zone_is_zone_device(struct zone *zone)
}
#endif

+#ifdef CONFIG_ZONE_DMA
+static inline bool zone_is_dma(struct zone *zone)
+{
+ return zone_idx(zone) == ZONE_DMA;
+}
+#else
+static inline bool zone_is_dma(struct zone *zone)
+{
+ return false;
+}
+#endif
+
/*
* Returns true if a zone has pages managed by the buddy allocator.
* All the reclaim decisions have to use this function rather than
@@ -1023,6 +1035,7 @@ static inline int is_highmem_idx(enum zone_type idx)
#endif
}

+bool has_managed_dma(void);
/**
* is_highmem - helper function to quickly check if a struct zone is a
* highmem zone or not. This is an attempt to keep references
@@ -1108,6 +1121,14 @@ extern struct zone *next_zone(struct zone *zone);
; /* do nothing */ \
else

+#define for_each_managed_zone(zone) \
+ for (zone = (first_online_pgdat())->node_zones; \
+ zone; \
+ zone = next_zone(zone)) \
+ if (!managed_zone(zone)) \
+ ; /* do nothing */ \
+ else
+
static inline struct zone *zonelist_zone(struct zoneref *zoneref)
{
return zoneref->zone;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e97e68aef7a..45dd1295416a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9441,4 +9441,15 @@ bool take_page_off_buddy(struct page *page)
spin_unlock_irqrestore(&zone->lock, flags);
return ret;
}
+
+bool has_managed_dma(void)
+{
+ struct zone *zone;
+
+ for_each_managed_zone(zone) {
+ if (zone_is_dma(zone))
+ return true;
+ }
+ return false;
+}
#endif
--
2.17.2