2020-01-17 03:59:00

by Qian Cai

Subject: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

It is not that hard to trigger lockdep splats by calling printk from
under zone->lock. Most of them are false positives caused by lock chains
introduced early in the boot process and they do not cause any real
problems (although some of the early boot lock dependencies could
happen after boot as well). There are some console drivers which do
allocate from the printk context as well and those should be fixed. In
any case false positives are not that trivial to work around and it is
far from optimal to lose lockdep functionality for something that is a
non-issue.
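
For illustration, the problematic pattern looks roughly like the sketch
below. This is not a literal excerpt from the mm code, only a minimal
example of dump_page()/printk() being issued while zone->lock is held,
which is what closes the lock cycle lockdep complains about:

/*
 * Illustrative sketch only -- the flow is simplified, but
 * spin_lock_irqsave(), dump_page() and zone->lock are the real kernel
 * primitives involved.
 */
static void report_unmovable_under_lock(struct zone *zone, struct page *page)
{
        unsigned long flags;

        spin_lock_irqsave(&zone->lock, flags);
        /* ... walk the pageblock, find an unmovable page ... */

        /*
         * dump_page() ends up in printk() and therefore in the console
         * code, which takes console/port locks that already have a
         * dependency back to zone->lock (console drivers may allocate
         * memory). Lockdep records zone->lock -> console_owner here and
         * reports the circular dependency.
         */
        dump_page(page, "unmovable page");

        spin_unlock_irqrestore(&zone->lock, flags);
}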

So change has_unmovable_pages() so that it no longer calls dump_page()
itself - instead it returns a "struct page *" of the unmovable page back
to the caller so that in the case of a has_unmovable_pages() failure,
the caller can call dump_page() after releasing zone->lock. Also, make
dump_page() able to report a CMA page as well, so the reason string
from has_unmovable_pages() can be removed.
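
As a condensed sketch, the caller side then looks as follows (the real
change is in set_migratetype_isolate() in the diff below; migratetype
accounting and error handling are omitted here):

        struct page *unmovable;
        unsigned long flags;

        spin_lock_irqsave(&zone->lock, flags);
        unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
        if (!unmovable)
                set_pageblock_migratetype(page, MIGRATE_ISOLATE);
        spin_unlock_irqrestore(&zone->lock, flags);

        /* Reporting is deferred until after zone->lock has been dropped. */
        if (unmovable && (isol_flags & REPORT_FAILURE))
                dump_page(unmovable, "unmovable page");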

Even though has_unmovable_pages doesn't hold any reference to the
returned page this should be reasonably safe for the purpose of
reporting the page (dump_page) because it cannot be hotremoved. The
state of the page might change but that is the case even with the
existing code as zone->lock only plays a role for free pages.

While at it, remove a similar but unnecessary debug-only printk() as
well.

WARNING: possible circular locking dependency detected
------------------------------------------------------
test.sh/8653 is trying to acquire lock:
ffffffff865a4460 (console_owner){-.-.}, at:
console_unlock+0x207/0x750

but task is already holding lock:
ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
__offline_isolated_pages+0x179/0x3e0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&(&zone->lock)->rlock){-.-.}:
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
_raw_spin_lock+0x2f/0x40
rmqueue_bulk.constprop.21+0xb6/0x1160
get_page_from_freelist+0x898/0x22c0
__alloc_pages_nodemask+0x2f3/0x1cd0
alloc_pages_current+0x9c/0x110
allocate_slab+0x4c6/0x19c0
new_slab+0x46/0x70
___slab_alloc+0x58b/0x960
__slab_alloc+0x43/0x70
__kmalloc+0x3ad/0x4b0
__tty_buffer_request_room+0x100/0x250
tty_insert_flip_string_fixed_flag+0x67/0x110
pty_write+0xa2/0xf0
n_tty_write+0x36b/0x7b0
tty_write+0x284/0x4c0
__vfs_write+0x50/0xa0
vfs_write+0x105/0x290
redirected_tty_write+0x6a/0xc0
do_iter_write+0x248/0x2a0
vfs_writev+0x106/0x1e0
do_writev+0xd4/0x180
__x64_sys_writev+0x45/0x50
do_syscall_64+0xcc/0x76c
entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #2 (&(&port->lock)->rlock){-.-.}:
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
_raw_spin_lock_irqsave+0x3a/0x50
tty_port_tty_get+0x20/0x60
tty_port_default_wakeup+0xf/0x30
tty_port_tty_wakeup+0x39/0x40
uart_write_wakeup+0x2a/0x40
serial8250_tx_chars+0x22e/0x440
serial8250_handle_irq.part.8+0x14a/0x170
serial8250_default_handle_irq+0x5c/0x90
serial8250_interrupt+0xa6/0x130
__handle_irq_event_percpu+0x78/0x4f0
handle_irq_event_percpu+0x70/0x100
handle_irq_event+0x5a/0x8b
handle_edge_irq+0x117/0x370
do_IRQ+0x9e/0x1e0
ret_from_intr+0x0/0x2a
cpuidle_enter_state+0x156/0x8e0
cpuidle_enter+0x41/0x70
call_cpuidle+0x5e/0x90
do_idle+0x333/0x370
cpu_startup_entry+0x1d/0x1f
start_secondary+0x290/0x330
secondary_startup_64+0xb6/0xc0

-> #1 (&port_lock_key){-.-.}:
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
_raw_spin_lock_irqsave+0x3a/0x50
serial8250_console_write+0x3e4/0x450
univ8250_console_write+0x4b/0x60
console_unlock+0x501/0x750
vprintk_emit+0x10d/0x340
vprintk_default+0x1f/0x30
vprintk_func+0x44/0xd4
printk+0x9f/0xc5

-> #0 (console_owner){-.-.}:
check_prev_add+0x107/0xea0
validate_chain+0x8fc/0x1200
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
console_unlock+0x269/0x750
vprintk_emit+0x10d/0x340
vprintk_default+0x1f/0x30
vprintk_func+0x44/0xd4
printk+0x9f/0xc5
__offline_isolated_pages.cold.52+0x2f/0x30a
offline_isolated_pages_cb+0x17/0x30
walk_system_ram_range+0xda/0x160
__offline_pages+0x79c/0xa10
offline_pages+0x11/0x20
memory_subsys_offline+0x7e/0xc0
device_offline+0xd5/0x110
state_store+0xc6/0xe0
dev_attr_store+0x3f/0x60
sysfs_kf_write+0x89/0xb0
kernfs_fop_write+0x188/0x240
__vfs_write+0x50/0xa0
vfs_write+0x105/0x290
ksys_write+0xc6/0x160
__x64_sys_write+0x43/0x50
do_syscall_64+0xcc/0x76c
entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

Chain exists of:
console_owner --> &(&port->lock)->rlock --> &(&zone->lock)->rlock

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&(&zone->lock)->rlock);
lock(&(&port->lock)->rlock);
lock(&(&zone->lock)->rlock);
lock(console_owner);

*** DEADLOCK ***

9 locks held by test.sh/8653:
#0: ffff88839ba7d408 (sb_writers#4){.+.+}, at:
vfs_write+0x25f/0x290
#1: ffff888277618880 (&of->mutex){+.+.}, at:
kernfs_fop_write+0x128/0x240
#2: ffff8898131fc218 (kn->count#115){.+.+}, at:
kernfs_fop_write+0x138/0x240
#3: ffffffff86962a80 (device_hotplug_lock){+.+.}, at:
lock_device_hotplug_sysfs+0x16/0x50
#4: ffff8884374f4990 (&dev->mutex){....}, at:
device_offline+0x70/0x110
#5: ffffffff86515250 (cpu_hotplug_lock.rw_sem){++++}, at:
__offline_pages+0xbf/0xa10
#6: ffffffff867405f0 (mem_hotplug_lock.rw_sem){++++}, at:
percpu_down_write+0x87/0x2f0
#7: ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
__offline_isolated_pages+0x179/0x3e0
#8: ffffffff865a4920 (console_lock){+.+.}, at:
vprintk_emit+0x100/0x340

stack backtrace:
Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10,
BIOS U34 05/21/2019
Call Trace:
dump_stack+0x86/0xca
print_circular_bug.cold.31+0x243/0x26e
check_noncircular+0x29e/0x2e0
check_prev_add+0x107/0xea0
validate_chain+0x8fc/0x1200
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
console_unlock+0x269/0x750
vprintk_emit+0x10d/0x340
vprintk_default+0x1f/0x30
vprintk_func+0x44/0xd4
printk+0x9f/0xc5
__offline_isolated_pages.cold.52+0x2f/0x30a
offline_isolated_pages_cb+0x17/0x30
walk_system_ram_range+0xda/0x160
__offline_pages+0x79c/0xa10
offline_pages+0x11/0x20
memory_subsys_offline+0x7e/0xc0
device_offline+0xd5/0x110
state_store+0xc6/0xe0
dev_attr_store+0x3f/0x60
sysfs_kf_write+0x89/0xb0
kernfs_fop_write+0x188/0x240
__vfs_write+0x50/0xa0
vfs_write+0x105/0x290
ksys_write+0xc6/0x160
__x64_sys_write+0x43/0x50
do_syscall_64+0xcc/0x76c
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Qian Cai <[email protected]>
---

v4: Update the commit log again thanks to Michal.
v3: Rebase to next-20200115 for the mm/debug change and update some
comments thanks to Michal.
v2: Improve the commit log and report CMA in dump_page() per Andrew.
has_unmovable_pages() returns a "struct page *" to the caller.

include/linux/page-isolation.h | 4 ++--
mm/debug.c | 4 +++-
mm/memory_hotplug.c | 6 ++++--
mm/page_alloc.c | 22 +++++++++-------------
mm/page_isolation.c | 11 ++++++++++-
5 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 148e65a9c606..da043ae86488 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,8 +33,8 @@ static inline bool is_migrate_isolate(int migratetype)
#define MEMORY_OFFLINE 0x1
#define REPORT_FAILURE 0x2

-bool has_unmovable_pages(struct zone *zone, struct page *page, int migratetype,
- int flags);
+struct page *has_unmovable_pages(struct zone *zone, struct page *page, int
+ migratetype, int flags);
void set_pageblock_migratetype(struct page *page, int migratetype);
int move_freepages_block(struct zone *zone, struct page *page,
int migratetype, int *num_movable);
diff --git a/mm/debug.c b/mm/debug.c
index 6a52316af839..784f9da711b0 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -46,6 +46,7 @@ void __dump_page(struct page *page, const char *reason)
{
struct address_space *mapping;
bool page_poisoned = PagePoisoned(page);
+ bool page_cma = is_migrate_cma_page(page);
int mapcount;
char *type = "";

@@ -92,7 +93,8 @@ void __dump_page(struct page *page, const char *reason)
}
BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);

- pr_warn("%sflags: %#lx(%pGp)\n", type, page->flags, &page->flags);
+ pr_warn("%sflags: %#lx(%pGp)%s", type, page->flags, &page->flags,
+ page_cma ? " CMA\n" : "\n");

hex_only:
print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7a6de9b0dcab..06e7dd3eb9a9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1148,8 +1148,10 @@ static bool is_pageblock_removable_nolock(unsigned long pfn)
if (!zone_spans_pfn(zone, pfn))
return false;

- return !has_unmovable_pages(zone, page, MIGRATE_MOVABLE,
- MEMORY_OFFLINE);
+ if (has_unmovable_pages(zone, page, MIGRATE_MOVABLE, MEMORY_OFFLINE))
+ return false;
+
+ return true;
}

/* Checks if this range of memory is likely to be hot-removable. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e56cd1f33242..e90140e879e6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8202,13 +8202,16 @@ void *__init alloc_large_system_hash(const char *tablename,
* MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
* check without lock_page also may miss some movable non-lru pages at
* race condition. So you can't expect this function should be exact.
+ *
+ * It returns a page without holding a reference. It should be safe here
+ * because the page cannot go away because it is unmovable, but it must not to
+ * be used for anything else rather than dumping its state.
*/
-bool has_unmovable_pages(struct zone *zone, struct page *page, int migratetype,
- int flags)
+struct page *has_unmovable_pages(struct zone *zone, struct page *page,
+ int migratetype, int flags)
{
unsigned long iter = 0;
unsigned long pfn = page_to_pfn(page);
- const char *reason = "unmovable page";

/*
* TODO we could make this much more efficient by not checking every
@@ -8225,9 +8228,8 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int migratetype,
* so consider them movable here.
*/
if (is_migrate_cma(migratetype))
- return false;
+ return NULL;

- reason = "CMA page";
goto unmovable;
}

@@ -8302,12 +8304,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int migratetype,
*/
goto unmovable;
}
- return false;
+ return NULL;
unmovable:
WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
- if (flags & REPORT_FAILURE)
- dump_page(pfn_to_page(pfn + iter), reason);
- return true;
+ return pfn_to_page(pfn + iter);
}

#ifdef CONFIG_CONTIG_ALLOC
@@ -8711,10 +8711,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
BUG_ON(!PageBuddy(page));
order = page_order(page);
offlined_pages += 1 << order;
-#ifdef CONFIG_DEBUG_VM
- pr_info("remove from free list %lx %d %lx\n",
- pfn, 1 << order, end_pfn);
-#endif
del_page_from_free_area(page, &zone->free_area[order]);
pfn += (1 << order);
}
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 1f8b9dfecbe8..f3af65bac1e0 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -20,6 +20,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
struct zone *zone;
unsigned long flags;
int ret = -EBUSY;
+ struct page *unmovable = NULL;

zone = page_zone(page);

@@ -37,7 +38,8 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
* FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
* We just check MOVABLE pages.
*/
- if (!has_unmovable_pages(zone, page, migratetype, isol_flags)) {
+ unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
+ if (!unmovable) {
unsigned long nr_pages;
int mt = get_pageblock_migratetype(page);

@@ -54,6 +56,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
spin_unlock_irqrestore(&zone->lock, flags);
if (!ret)
drain_all_pages(zone);
+ else if (isol_flags & REPORT_FAILURE && unmovable)
+ /*
+ * printk() with zone->lock held will guarantee to trigger a
+ * lockdep splat, so defer it here.
+ */
+ dump_page(unmovable, "unmovable page");
+
return ret;
}

--
2.21.0 (Apple Git-122.2)


2020-01-17 04:53:07

by Sergey Senozhatsky

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On (20/01/16 21:21), Qian Cai wrote:
> It is not that hard to trigger lockdep splats by calling printk from
> under zone->lock. Most of them are false positives caused by lock chains
> introduced early in the boot process and they do not cause any real
> problems (although some of the early boot lock dependencies could
> happenn after boot as well). There are some console drivers which do
> allocate from the printk context as well and those should be fixed. In
> any case false positives are not that trivial to workaround and it is
> far from optimal to lose lockdep functionality for something that is a
> non-issue.
[..]
>
> Acked-by: Michal Hocko <[email protected]>
> Signed-off-by: Qian Cai <[email protected]>

FWIW,
Reviewed-by: Sergey Senozhatsky <[email protected]>

-ss

2020-01-17 09:01:07

by Michal Hocko

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On Fri 17-01-20 09:51:05, David Hildenbrand wrote:
> On 17.01.20 03:21, Qian Cai wrote:
[...]
> > Even though has_unmovable_pages doesn't hold any reference to the
> > returned page this should be reasonably safe for the purpose of
> > reporting the page (dump_page) because it cannot be hotremoved. The
>
> This is only true in the context of memory unplug, but not in the
> context of is_mem_section_removable()-> is_pageblock_removable_nolock().

Well, the above should hold for that path as well AFAICS. If the page is
unmovable then a racing hotplug cannot remove it, right? Or do you
consider a temporary unmovability to be a problem?
--
Michal Hocko
SUSE Labs

2020-01-17 09:07:02

by Michal Hocko

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

Thanks for the updated changelog. Btw. do you plan to send a patch to
move WARN_ON_ONCE as well, or should I do it?
--
Michal Hocko
SUSE Labs

2020-01-17 09:23:23

by David Hildenbrand

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On 17.01.20 03:21, Qian Cai wrote:
> It is not that hard to trigger lockdep splats by calling printk from
> under zone->lock. Most of them are false positives caused by lock chains
> introduced early in the boot process and they do not cause any real
> problems (although some of the early boot lock dependencies could
> happenn after boot as well). There are some console drivers which do

s/happenn/happen/

> allocate from the printk context as well and those should be fixed. In
> any case false positives are not that trivial to workaround and it is
> far from optimal to lose lockdep functionality for something that is a
> non-issue.
>
> So change has_unmovable_pages() so that it no longer calls dump_page()
> itself - instead it returns a "struct page *" of the unmovable page back
> to the caller so that in the case of a has_unmovable_pages() failure,
> the caller can call dump_page() after releasing zone->lock. Also, make
> dump_page() is able to report a CMA page as well, so the reason string
> from has_unmovable_pages() can be removed.
>
> Even though has_unmovable_pages doesn't hold any reference to the
> returned page this should be reasonably safe for the purpose of
> reporting the page (dump_page) because it cannot be hotremoved. The

This is only true in the context of memory unplug, but not in the
context of is_mem_section_removable()-> is_pageblock_removable_nolock().
But well, that function is already racy as hell (and I dislike it very
much) :)

> state of the page might change but that is the case even with the
> existing code as zone->lock only plays role for free pages.
>
> While at it, remove a similar but unnecessary debug-only printk() as
> well.
>
> WARNING: possible circular locking dependency detected
> ------------------------------------------------------
> test.sh/8653 is trying to acquire lock:
> ffffffff865a4460 (console_owner){-.-.}, at:
> console_unlock+0x207/0x750
>
> but task is already holding lock:
> ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
> __offline_isolated_pages+0x179/0x3e0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&(&zone->lock)->rlock){-.-.}:
> __lock_acquire+0x5b3/0xb40
> lock_acquire+0x126/0x280
> _raw_spin_lock+0x2f/0x40
> rmqueue_bulk.constprop.21+0xb6/0x1160
> get_page_from_freelist+0x898/0x22c0
> __alloc_pages_nodemask+0x2f3/0x1cd0
> alloc_pages_current+0x9c/0x110
> allocate_slab+0x4c6/0x19c0
> new_slab+0x46/0x70
> ___slab_alloc+0x58b/0x960
> __slab_alloc+0x43/0x70
> __kmalloc+0x3ad/0x4b0
> __tty_buffer_request_room+0x100/0x250
> tty_insert_flip_string_fixed_flag+0x67/0x110
> pty_write+0xa2/0xf0
> n_tty_write+0x36b/0x7b0
> tty_write+0x284/0x4c0
> __vfs_write+0x50/0xa0
> vfs_write+0x105/0x290
> redirected_tty_write+0x6a/0xc0
> do_iter_write+0x248/0x2a0
> vfs_writev+0x106/0x1e0
> do_writev+0xd4/0x180
> __x64_sys_writev+0x45/0x50
> do_syscall_64+0xcc/0x76c
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #2 (&(&port->lock)->rlock){-.-.}:
> __lock_acquire+0x5b3/0xb40
> lock_acquire+0x126/0x280
> _raw_spin_lock_irqsave+0x3a/0x50
> tty_port_tty_get+0x20/0x60
> tty_port_default_wakeup+0xf/0x30
> tty_port_tty_wakeup+0x39/0x40
> uart_write_wakeup+0x2a/0x40
> serial8250_tx_chars+0x22e/0x440
> serial8250_handle_irq.part.8+0x14a/0x170
> serial8250_default_handle_irq+0x5c/0x90
> serial8250_interrupt+0xa6/0x130
> __handle_irq_event_percpu+0x78/0x4f0
> handle_irq_event_percpu+0x70/0x100
> handle_irq_event+0x5a/0x8b
> handle_edge_irq+0x117/0x370
> do_IRQ+0x9e/0x1e0
> ret_from_intr+0x0/0x2a
> cpuidle_enter_state+0x156/0x8e0
> cpuidle_enter+0x41/0x70
> call_cpuidle+0x5e/0x90
> do_idle+0x333/0x370
> cpu_startup_entry+0x1d/0x1f
> start_secondary+0x290/0x330
> secondary_startup_64+0xb6/0xc0
>
> -> #1 (&port_lock_key){-.-.}:
> __lock_acquire+0x5b3/0xb40
> lock_acquire+0x126/0x280
> _raw_spin_lock_irqsave+0x3a/0x50
> serial8250_console_write+0x3e4/0x450
> univ8250_console_write+0x4b/0x60
> console_unlock+0x501/0x750
> vprintk_emit+0x10d/0x340
> vprintk_default+0x1f/0x30
> vprintk_func+0x44/0xd4
> printk+0x9f/0xc5
>
> -> #0 (console_owner){-.-.}:
> check_prev_add+0x107/0xea0
> validate_chain+0x8fc/0x1200
> __lock_acquire+0x5b3/0xb40
> lock_acquire+0x126/0x280
> console_unlock+0x269/0x750
> vprintk_emit+0x10d/0x340
> vprintk_default+0x1f/0x30
> vprintk_func+0x44/0xd4
> printk+0x9f/0xc5
> __offline_isolated_pages.cold.52+0x2f/0x30a
> offline_isolated_pages_cb+0x17/0x30
> walk_system_ram_range+0xda/0x160
> __offline_pages+0x79c/0xa10
> offline_pages+0x11/0x20
> memory_subsys_offline+0x7e/0xc0
> device_offline+0xd5/0x110
> state_store+0xc6/0xe0
> dev_attr_store+0x3f/0x60
> sysfs_kf_write+0x89/0xb0
> kernfs_fop_write+0x188/0x240
> __vfs_write+0x50/0xa0
> vfs_write+0x105/0x290
> ksys_write+0xc6/0x160
> __x64_sys_write+0x43/0x50
> do_syscall_64+0xcc/0x76c
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> other info that might help us debug this:
>
> Chain exists of:
> console_owner --> &(&port->lock)->rlock --> &(&zone->lock)->rlock
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&(&zone->lock)->rlock);
> lock(&(&port->lock)->rlock);
> lock(&(&zone->lock)->rlock);
> lock(console_owner);
>
> *** DEADLOCK ***
>
> 9 locks held by test.sh/8653:
> #0: ffff88839ba7d408 (sb_writers#4){.+.+}, at:
> vfs_write+0x25f/0x290
> #1: ffff888277618880 (&of->mutex){+.+.}, at:
> kernfs_fop_write+0x128/0x240
> #2: ffff8898131fc218 (kn->count#115){.+.+}, at:
> kernfs_fop_write+0x138/0x240
> #3: ffffffff86962a80 (device_hotplug_lock){+.+.}, at:
> lock_device_hotplug_sysfs+0x16/0x50
> #4: ffff8884374f4990 (&dev->mutex){....}, at:
> device_offline+0x70/0x110
> #5: ffffffff86515250 (cpu_hotplug_lock.rw_sem){++++}, at:
> __offline_pages+0xbf/0xa10
> #6: ffffffff867405f0 (mem_hotplug_lock.rw_sem){++++}, at:
> percpu_down_write+0x87/0x2f0
> #7: ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
> __offline_isolated_pages+0x179/0x3e0
> #8: ffffffff865a4920 (console_lock){+.+.}, at:
> vprintk_emit+0x100/0x340
>
> stack backtrace:
> Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10,
> BIOS U34 05/21/2019
> Call Trace:
> dump_stack+0x86/0xca
> print_circular_bug.cold.31+0x243/0x26e
> check_noncircular+0x29e/0x2e0
> check_prev_add+0x107/0xea0
> validate_chain+0x8fc/0x1200
> __lock_acquire+0x5b3/0xb40
> lock_acquire+0x126/0x280
> console_unlock+0x269/0x750
> vprintk_emit+0x10d/0x340
> vprintk_default+0x1f/0x30
> vprintk_func+0x44/0xd4
> printk+0x9f/0xc5
> __offline_isolated_pages.cold.52+0x2f/0x30a
> offline_isolated_pages_cb+0x17/0x30
> walk_system_ram_range+0xda/0x160
> __offline_pages+0x79c/0xa10
> offline_pages+0x11/0x20
> memory_subsys_offline+0x7e/0xc0
> device_offline+0xd5/0x110
> state_store+0xc6/0xe0
> dev_attr_store+0x3f/0x60
> sysfs_kf_write+0x89/0xb0
> kernfs_fop_write+0x188/0x240
> __vfs_write+0x50/0xa0
> vfs_write+0x105/0x290
> ksys_write+0xc6/0x160
> __x64_sys_write+0x43/0x50
> do_syscall_64+0xcc/0x76c
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Acked-by: Michal Hocko <[email protected]>
> Signed-off-by: Qian Cai <[email protected]>
> ---
>
> v4: Update the commit log again thanks to Michal.
> v3: Rebase to next-20200115 for the mm/debug change and update some
> comments thanks to Michal.
> v2: Improve the commit log and report CMA in dump_page() per Andrew.
> has_unmovable_pages() returns a "struct page *" to the caller.
>
> include/linux/page-isolation.h | 4 ++--
> mm/debug.c | 4 +++-
> mm/memory_hotplug.c | 6 ++++--
> mm/page_alloc.c | 22 +++++++++-------------
> mm/page_isolation.c | 11 ++++++++++-
> 5 files changed, 28 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 148e65a9c606..da043ae86488 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -33,8 +33,8 @@ static inline bool is_migrate_isolate(int migratetype)
> #define MEMORY_OFFLINE 0x1
> #define REPORT_FAILURE 0x2
>
> -bool has_unmovable_pages(struct zone *zone, struct page *page, int migratetype,
> - int flags);
> +struct page *has_unmovable_pages(struct zone *zone, struct page *page, int
> + migratetype, int flags);
> void set_pageblock_migratetype(struct page *page, int migratetype);
> int move_freepages_block(struct zone *zone, struct page *page,
> int migratetype, int *num_movable);
> diff --git a/mm/debug.c b/mm/debug.c
> index 6a52316af839..784f9da711b0 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -46,6 +46,7 @@ void __dump_page(struct page *page, const char *reason)
> {
> struct address_space *mapping;
> bool page_poisoned = PagePoisoned(page);
> + bool page_cma = is_migrate_cma_page(page);

-> you are accessing the pageblock without the zone lock. It could
change to "isolate" again in the meantime if I am not wrong!

> int mapcount;
> char *type = "";
>
> @@ -92,7 +93,8 @@ void __dump_page(struct page *page, const char *reason)
> }
> BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
>
> - pr_warn("%sflags: %#lx(%pGp)\n", type, page->flags, &page->flags);
> + pr_warn("%sflags: %#lx(%pGp)%s", type, page->flags, &page->flags,
> + page_cma ? " CMA\n" : "\n");

I'd do a

pr_warn("%sflags: %#lx(%pGp)%s\n", type, page->flags, &page->flags,
page_cma ? " CMA" : "");

>
> hex_only:
> print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 7a6de9b0dcab..06e7dd3eb9a9 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1148,8 +1148,10 @@ static bool is_pageblock_removable_nolock(unsigned long pfn)
> if (!zone_spans_pfn(zone, pfn))
> return false;
>
> - return !has_unmovable_pages(zone, page, MIGRATE_MOVABLE,
> - MEMORY_OFFLINE);
> + if (has_unmovable_pages(zone, page, MIGRATE_MOVABLE, MEMORY_OFFLINE))
> + return false;
> +
> + return true;

if it returns NULL, !NULL converts it to "true"
if it returns PTR, !PTR converts it to "false"

Is this change really necessary?


> }
>
> /* Checks if this range of memory is likely to be hot-removable. */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e56cd1f33242..e90140e879e6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8202,13 +8202,16 @@ void *__init alloc_large_system_hash(const char *tablename,
> * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
> * check without lock_page also may miss some movable non-lru pages at
> * race condition. So you can't expect this function should be exact.
> + *
> + * It returns a page without holding a reference. It should be safe here
> + * because the page cannot go away because it is unmovable, but it must not to
> + * be used for anything else rather than dumping its state.

I think something like this would be better:

"Returns a page without holding a reference. If the caller wants to
dereference that page (e.g., dumping), it has to make sure that that it
cannot get removed (e.g., via memory unplug) concurrently."


> */
> -bool has_unmovable_pages(struct zone *zone, struct page *page, int migratetype,
> - int flags)
> +struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> + int migratetype, int flags)
> {
> unsigned long iter = 0;
> unsigned long pfn = page_to_pfn(page);
> - const char *reason = "unmovable page";
>
> /*
> * TODO we could make this much more efficient by not checking every
> @@ -8225,9 +8228,8 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int migratetype,
> * so consider them movable here.
> */
> if (is_migrate_cma(migratetype))
> - return false;
> + return NULL;
>
> - reason = "CMA page";
> goto unmovable;
> }
>
> @@ -8302,12 +8304,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int migratetype,
> */
> goto unmovable;
> }
> - return false;
> + return NULL;
> unmovable:
> WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
> - if (flags & REPORT_FAILURE)
> - dump_page(pfn_to_page(pfn + iter), reason);
> - return true;
> + return pfn_to_page(pfn + iter);
> }
>
> #ifdef CONFIG_CONTIG_ALLOC
> @@ -8711,10 +8711,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> BUG_ON(!PageBuddy(page));
> order = page_order(page);
> offlined_pages += 1 << order;
> -#ifdef CONFIG_DEBUG_VM
> - pr_info("remove from free list %lx %d %lx\n",
> - pfn, 1 << order, end_pfn);
> -#endif
> del_page_from_free_area(page, &zone->free_area[order]);
> pfn += (1 << order);
> }
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 1f8b9dfecbe8..f3af65bac1e0 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -20,6 +20,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
> struct zone *zone;
> unsigned long flags;
> int ret = -EBUSY;
> + struct page *unmovable = NULL;

nit: reverse christmas tree please :) (move it to the top)

>
> zone = page_zone(page);
>
> @@ -37,7 +38,8 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
> * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
> * We just check MOVABLE pages.
> */
> - if (!has_unmovable_pages(zone, page, migratetype, isol_flags)) {
> + unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
> + if (!unmovable) {
> unsigned long nr_pages;
> int mt = get_pageblock_migratetype(page);
>
> @@ -54,6 +56,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
> spin_unlock_irqrestore(&zone->lock, flags);
> if (!ret)
> drain_all_pages(zone);
> + else if (isol_flags & REPORT_FAILURE && unmovable)

(isol_flags & REPORT_FAILURE) please for readability

> + /*
> + * printk() with zone->lock held will guarantee to trigger a
> + * lockdep splat, so defer it here.
> + */
> + dump_page(unmovable, "unmovable page");
> +
> return ret;
> }


--
Thanks,

David / dhildenb

2020-01-17 09:34:43

by David Hildenbrand

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On 17.01.20 09:59, Michal Hocko wrote:
> On Fri 17-01-20 09:51:05, David Hildenbrand wrote:
>> On 17.01.20 03:21, Qian Cai wrote:
> [...]
>>> Even though has_unmovable_pages doesn't hold any reference to the
>>> returned page this should be reasonably safe for the purpose of
>>> reporting the page (dump_page) because it cannot be hotremoved. The
>>
>> This is only true in the context of memory unplug, but not in the
>> context of is_mem_section_removable()-> is_pageblock_removable_nolock().
>
> Well, the above should hold for that path as well AFAICS. If the page is
> unmovable then a racing hotplug cannot remove it, right? Or do you
> consider a temporary unmovability to be a problem?

Somebody could test /sys/devices/system/memory/memoryX/removable. While
returning the unmovable page, it could become movable and
offlining+removing could succeed. Just a matter of where you put a
longer sleep after is_mem_section_removable().

Very unlikely, and it's racy as hell already (e.g., offlining
concurrently while testing for removability etc.).

--
Thanks,

David / dhildenb

2020-01-17 09:41:21

by Michal Hocko

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On Fri 17-01-20 10:25:06, David Hildenbrand wrote:
> On 17.01.20 09:59, Michal Hocko wrote:
> > On Fri 17-01-20 09:51:05, David Hildenbrand wrote:
> >> On 17.01.20 03:21, Qian Cai wrote:
> > [...]
> >>> Even though has_unmovable_pages doesn't hold any reference to the
> >>> returned page this should be reasonably safe for the purpose of
> >>> reporting the page (dump_page) because it cannot be hotremoved. The
> >>
> >> This is only true in the context of memory unplug, but not in the
> >> context of is_mem_section_removable()-> is_pageblock_removable_nolock().
> >
> > Well, the above should hold for that path as well AFAICS. If the page is
> > unmovable then a racing hotplug cannot remove it, right? Or do you
> > consider a temporary unmovability to be a problem?
>
> Somebody could test /sys/devices/system/memory/memoryX/removable. While
> returning the unmovable page, it could become movable and
> offlining+removing could succeed.

Doesn't this path use device lock or something? If not then the new code
is not more racy than the existing one. Just look at
is_pageblock_removable_nolock and how it dereferences struct page
(page_zonenum in page_zone.)
--
Michal Hocko
SUSE Labs

2020-01-17 09:43:23

by David Hildenbrand

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On 17.01.20 10:40, Michal Hocko wrote:
> On Fri 17-01-20 10:25:06, David Hildenbrand wrote:
>> On 17.01.20 09:59, Michal Hocko wrote:
>>> On Fri 17-01-20 09:51:05, David Hildenbrand wrote:
>>>> On 17.01.20 03:21, Qian Cai wrote:
>>> [...]
>>>>> Even though has_unmovable_pages doesn't hold any reference to the
>>>>> returned page this should be reasonably safe for the purpose of
>>>>> reporting the page (dump_page) because it cannot be hotremoved. The
>>>>
>>>> This is only true in the context of memory unplug, but not in the
>>>> context of is_mem_section_removable()-> is_pageblock_removable_nolock().
>>>
>>> Well, the above should hold for that path as well AFAICS. If the page is
>>> unmovable then a racing hotplug cannot remove it, right? Or do you
>>> consider a temporary unmovability to be a problem?
>>
>> Somebody could test /sys/devices/system/memory/memoryX/removable. While
>> returning the unmovable page, it could become movable and
>> offlining+removing could succeed.
>
> Doesn't this path use device lock or something? If not than the new code
> is not more racy then the existing one. Just look at
> is_pageblock_removable_nolock and how it dereferences struct page
> (page_zonenum in page_zone.)
>

AFAIK no device lock, no device hotplug lock, no memory hotplug lock. I
think it holds a reference to the device and to the kernfs node. But
AFAIK that does not block offlining/removal of memory, just when the
objects get freed.

--
Thanks,

David / dhildenb

2020-01-17 10:18:58

by Michal Hocko

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On Fri 17-01-20 10:42:10, David Hildenbrand wrote:
> On 17.01.20 10:40, Michal Hocko wrote:
> > On Fri 17-01-20 10:25:06, David Hildenbrand wrote:
> >> On 17.01.20 09:59, Michal Hocko wrote:
> >>> On Fri 17-01-20 09:51:05, David Hildenbrand wrote:
> >>>> On 17.01.20 03:21, Qian Cai wrote:
> >>> [...]
> >>>>> Even though has_unmovable_pages doesn't hold any reference to the
> >>>>> returned page this should be reasonably safe for the purpose of
> >>>>> reporting the page (dump_page) because it cannot be hotremoved. The
> >>>>
> >>>> This is only true in the context of memory unplug, but not in the
> >>>> context of is_mem_section_removable()-> is_pageblock_removable_nolock().
> >>>
> >>> Well, the above should hold for that path as well AFAICS. If the page is
> >>> unmovable then a racing hotplug cannot remove it, right? Or do you
> >>> consider a temporary unmovability to be a problem?
> >>
> >> Somebody could test /sys/devices/system/memory/memoryX/removable. While
> >> returning the unmovable page, it could become movable and
> >> offlining+removing could succeed.
> >
> > Doesn't this path use device lock or something? If not than the new code
> > is not more racy then the existing one. Just look at
> > is_pageblock_removable_nolock and how it dereferences struct page
> > (page_zonenum in page_zone.)
> >
>
> AFAIK no device lock, no device hotplug lock, no memory hotplug lock. I
> think it holds a reference to the device and to the kernelfs node. But
> AFAIK that does not block removal of offlining/memory, just when the
> objects get freed.

OK, so we are bug compatible after this patch ;)
--
Michal Hocko
SUSE Labs

2020-01-17 10:19:58

by David Hildenbrand

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On 17.01.20 11:17, Michal Hocko wrote:
> On Fri 17-01-20 10:42:10, David Hildenbrand wrote:
>> On 17.01.20 10:40, Michal Hocko wrote:
>>> On Fri 17-01-20 10:25:06, David Hildenbrand wrote:
>>>> On 17.01.20 09:59, Michal Hocko wrote:
>>>>> On Fri 17-01-20 09:51:05, David Hildenbrand wrote:
>>>>>> On 17.01.20 03:21, Qian Cai wrote:
>>>>> [...]
>>>>>>> Even though has_unmovable_pages doesn't hold any reference to the
>>>>>>> returned page this should be reasonably safe for the purpose of
>>>>>>> reporting the page (dump_page) because it cannot be hotremoved. The
>>>>>>
>>>>>> This is only true in the context of memory unplug, but not in the
>>>>>> context of is_mem_section_removable()-> is_pageblock_removable_nolock().
>>>>>
>>>>> Well, the above should hold for that path as well AFAICS. If the page is
>>>>> unmovable then a racing hotplug cannot remove it, right? Or do you
>>>>> consider a temporary unmovability to be a problem?
>>>>
>>>> Somebody could test /sys/devices/system/memory/memoryX/removable. While
>>>> returning the unmovable page, it could become movable and
>>>> offlining+removing could succeed.
>>>
>>> Doesn't this path use device lock or something? If not than the new code
>>> is not more racy then the existing one. Just look at
>>> is_pageblock_removable_nolock and how it dereferences struct page
>>> (page_zonenum in page_zone.)
>>>
>>
>> AFAIK no device lock, no device hotplug lock, no memory hotplug lock. I
>> think it holds a reference to the device and to the kernelfs node. But
>> AFAIK that does not block removal of offlining/memory, just when the
>> objects get freed.
>
> OK, so we are bug compatible after this patch ;)
>

:D I'm cooking something to refactor that ... nice code :)

--
Thanks,

David / dhildenb

2020-01-17 12:34:33

by Qian Cai

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()



> On Jan 17, 2020, at 3:59 AM, Michal Hocko <[email protected]> wrote:
>
> Thanks for the updated changelog. Btw. do you plan to send a patch to
> move WARN_ON_ONCE as well, or should I do it?

I’ll send a v5 anyway, so I could remove that for you.

2020-01-17 12:41:41

by Qian Cai

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()



> On Jan 17, 2020, at 3:51 AM, David Hildenbrand <[email protected]> wrote:
>
> -> you are accessing the pageblock without the zone lock. It could
> change to "isolate" again in the meantime if I am not wrong!

Since we are just dumping the state for debugging, it should be fine to accept a bit of inaccuracy here due to racing. I could put a brief comment over there.

2020-01-17 12:54:43

by David Hildenbrand

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On 17.01.20 13:40, Qian Cai wrote:
>
>
>> On Jan 17, 2020, at 3:51 AM, David Hildenbrand <[email protected]> wrote:
>>
>> -> you are accessing the pageblock without the zone lock. It could
>> change to "isolate" again in the meantime if I am not wrong!
>
> Since we are just dumping the state for debugging, it should be fine to accept a bit inaccuracy here due to racing. I could put a bit comments over there.
>

Can't we just use the zone_idx() stored in the memmap instead?

--
Thanks,

David / dhildenb

2020-01-17 13:32:17

by Qian Cai

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()



> On Jan 17, 2020, at 7:53 AM, David Hildenbrand <[email protected]> wrote:
>
> Can't we just use the zone_idx() stored in the memmap instead?

I am having a hard time guessing what that means. Care to elaborate?

2020-01-17 13:43:55

by David Hildenbrand

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On 17.01.20 14:30, Qian Cai wrote:
>
>
>> On Jan 17, 2020, at 7:53 AM, David Hildenbrand <[email protected]> wrote:
>>
>> Can't we just use the zone_idx() stored in the memmap instead?
>
> I am having a hard time to guess what that means. Care to elaborate?
>

I was messing something up and thought for a second there would be a
ZONE_CMA :) It's really just the migratetype.

So yeah, a comment would be great.

--
Thanks,

David / dhildenb

2020-01-17 14:40:49

by Michal Hocko

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On Fri 17-01-20 07:32:15, Qian Cai wrote:
>
>
> > On Jan 17, 2020, at 3:59 AM, Michal Hocko <[email protected]> wrote:
> >
> > Thanks for the updated changelog. Btw. do you plan to send a patch to
> > move WARN_ON_ONCE as well, or should I do it?
>
> I’ll send a v5 anyway, so I’ll could remove that for you.

Thanks a lot. Having it in a separate patch would be great.

--
Michal Hocko
SUSE Labs

2020-01-17 14:44:43

by Michal Hocko

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On Fri 17-01-20 07:40:15, Qian Cai wrote:
>
>
> > On Jan 17, 2020, at 3:51 AM, David Hildenbrand <[email protected]> wrote:
> >
> > -> you are accessing the pageblock without the zone lock. It could
> > change to "isolate" again in the meantime if I am not wrong!
>
> Since we are just dumping the state for debugging, it should be fine
> to accept a bit inaccuracy here due to racing. I could put a bit
> comments over there.

Sorry, I could have been more specific. The race I was talking about is
not about accuracy. The current code is racy in that sense already
because you are looking at a struct page you do not own so its state can
change at any time. Please note that the zone->lock doesn't really
prevent from the state transition because that applies only to free
pages and those are obviously OK. So this is not really different.

The race I've had in mind is when a parallel hotplug would simply
hotremove the section along with the memmap, so the struct page would be
complete garbage.

--
Michal Hocko
SUSE Labs

2020-01-17 14:45:25

by David Hildenbrand

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On 17.01.20 15:42, Michal Hocko wrote:
> On Fri 17-01-20 07:40:15, Qian Cai wrote:
>>
>>
>>> On Jan 17, 2020, at 3:51 AM, David Hildenbrand <[email protected]> wrote:
>>>
>>> -> you are accessing the pageblock without the zone lock. It could
>>> change to "isolate" again in the meantime if I am not wrong!
>>
>> Since we are just dumping the state for debugging, it should be fine
>> to accept a bit inaccuracy here due to racing. I could put a bit
>> comments over there.
>
> Sorry, I could have been more specific. The race I was talking about is
> not about accuracy. The current code is racy in that sense already
> because you are looking at a struct page you do not own so its state can
> change at any time. Please note that the zone->lock doesn't really

The pageblock state cannot change with the zone->lock. That's what I was
referring to here. (this specific check)

> prevent from the state transition because that applies only to free
> pages and those are obviously OK. So this is not really different.


--
Thanks,

David / dhildenb

2020-01-17 15:08:47

by Qian Cai

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()



> On Jan 17, 2020, at 9:39 AM, Michal Hocko <[email protected]> wrote:
>
> Thanks a lot. Having it in a separate patch would be great.

I was thinking about removing that WARN together in this v5 patch, so there is less churn to touch the same function again. However, I am fine either way, so just shout out if you feel strongly towards a separate patch.

2020-01-17 15:28:55

by Michal Hocko

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On Fri 17-01-20 15:43:58, David Hildenbrand wrote:
> On 17.01.20 15:42, Michal Hocko wrote:
> > On Fri 17-01-20 07:40:15, Qian Cai wrote:
> >>
> >>
> >>> On Jan 17, 2020, at 3:51 AM, David Hildenbrand <[email protected]> wrote:
> >>>
> >>> -> you are accessing the pageblock without the zone lock. It could
> >>> change to "isolate" again in the meantime if I am not wrong!
> >>
> >> Since we are just dumping the state for debugging, it should be fine
> >> to accept a bit inaccuracy here due to racing. I could put a bit
> >> comments over there.
> >
> > Sorry, I could have been more specific. The race I was talking about is
> > not about accuracy. The current code is racy in that sense already
> > because you are looking at a struct page you do not own so its state can
> > change at any time. Please note that the zone->lock doesn't really
>
> The pageblock state cannot change with the zone->lock. That's what I was
> referring to here. (this specific check)

CMA pages tend to have a stable pageblock state. More importantly this
is not the most important information from my experience. It is usually
reference count and page flags that are of interest. Or details about
what kind of filesystem is owning the page and it doesn't want to
release it.
--
Michal Hocko
SUSE Labs

2020-01-17 16:18:14

by Michal Hocko

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()

On Fri 17-01-20 10:05:12, Qian Cai wrote:
>
>
> > On Jan 17, 2020, at 9:39 AM, Michal Hocko <[email protected]> wrote:
> >
> > Thanks a lot. Having it in a separate patch would be great.
>
> I was thinking about removing that WARN together in this v5 patch,
> so there is less churn to touch the same function again. However, I
> am fine either way, so just shout out if you feel strongly towards a
> separate patch.

I hope you meant moving rather than removing ;). The warning is useful
because we shouldn't see unmovable pages in the movable zone. And a
separate patch makes more sense because the justification is slightly
different. We do not want to have a way for userspace to trigger the
warning from userspace - even though it shouldn't be possible, but
still. Only the offlining path should complain.

--
Michal Hocko
SUSE Labs

2020-01-17 18:51:21

by Qian Cai

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()



> On Jan 17, 2020, at 10:46 AM, Michal Hocko <[email protected]> wrote:
>
> On Fri 17-01-20 10:05:12, Qian Cai wrote:
>>
>>
>>> On Jan 17, 2020, at 9:39 AM, Michal Hocko <[email protected]> wrote:
>>>
>>> Thanks a lot. Having it in a separate patch would be great.
>>
>> I was thinking about removing that WARN together in this v5 patch,
>> so there is less churn to touch the same function again. However, I
>> am fine either way, so just shout out if you feel strongly towards a
>> separate patch.
>
> I hope you meant moving rather than removing ;). The warning is useful
> because we shouldn't see unmovable pages in the movable zone. And a
> separate patch makes more sense because the justification is slightly
> different. We do not want to have a way for userspace to trigger the
> warning from userspace - even though it shouldn't be possible, but
> still. Only the offlining path should complain.

Something like this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 621716a25639..32c854851e1f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8307,7 +8307,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
}
return NULL;
unmovable:
- WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
return pfn_to_page(pfn + iter);
}

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index e70586523ca3..08571b515d9f 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -54,9 +54,11 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_

out:
spin_unlock_irqrestore(&zone->lock, flags);
+
if (!ret)
drain_all_pages(zone);
else if ((isol_flags & REPORT_FAILURE) && unmovable)
+ WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
/*
* printk() with zone->lock held will guarantee to trigger a
* lockdep splat, so defer it here.

2020-01-17 19:17:20

by David Hildenbrand

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()



> On Jan 17, 2020, at 7:49 PM, Qian Cai <[email protected]> wrote:
>
> 
>
>> On Jan 17, 2020, at 10:46 AM, Michal Hocko <[email protected]> wrote:
>>
>>> On Fri 17-01-20 10:05:12, Qian Cai wrote:
>>>
>>>
>>>> On Jan 17, 2020, at 9:39 AM, Michal Hocko <[email protected]> wrote:
>>>>
>>>> Thanks a lot. Having it in a separate patch would be great.
>>>
>>> I was thinking about removing that WARN together in this v5 patch,
>>> so there is less churn to touch the same function again. However, I
>>> am fine either way, so just shout out if you feel strongly towards a
>>> separate patch.
>>
>> I hope you meant moving rather than removing ;). The warning is useful
>> because we shouldn't see unmovable pages in the movable zone. And a
>> separate patch makes more sense because the justification is slightly
>> different. We do not want to have a way for userspace to trigger the
>> warning from userspace - even though it shouldn't be possible, but
>> still. Only the offlining path should complain.
>
> Something like this?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 621716a25639..32c854851e1f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8307,7 +8307,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> }
> return NULL;
> unmovable:
> - WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
> return pfn_to_page(pfn + iter);
> }
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index e70586523ca3..08571b515d9f 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -54,9 +54,11 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>
> out:
> spin_unlock_irqrestore(&zone->lock, flags);
> +
> if (!ret)
> drain_all_pages(zone);
> else if ((isol_flags & REPORT_FAILURE) && unmovable)

We have a dedicated flag for the offlining part.

> + WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
> /*
> * printk() with zone->lock held will guarantee to trigger a
> * lockdep splat, so defer it here.
>

So, are we fine with unmovable data ending up in ZONE_MOVABLE as long as we can offline it?

This might make my life in virtio-mem a little easier (I can unplug chunks falling into ZONE_MOVABLE).

2020-01-17 19:43:58

by Qian Cai

Subject: Re: [PATCH -next v4] mm/hotplug: silence a lockdep splat with printk()



> On Jan 17, 2020, at 2:15 PM, David Hildenbrand <[email protected]> wrote:
>
>
>
>> On Jan 17, 2020, at 7:49 PM, Qian Cai <[email protected]> wrote:
>>
>> 
>>
>>> On Jan 17, 2020, at 10:46 AM, Michal Hocko <[email protected]> wrote:
>>>
>>>> On Fri 17-01-20 10:05:12, Qian Cai wrote:
>>>>
>>>>
>>>>> On Jan 17, 2020, at 9:39 AM, Michal Hocko <[email protected]> wrote:
>>>>>
>>>>> Thanks a lot. Having it in a separate patch would be great.
>>>>
>>>> I was thinking about removing that WARN together in this v5 patch,
>>>> so there is less churn to touch the same function again. However, I
>>>> am fine either way, so just shout out if you feel strongly towards a
>>>> separate patch.
>>>
>>> I hope you meant moving rather than removing ;). The warning is useful
>>> because we shouldn't see unmovable pages in the movable zone. And a
>>> separate patch makes more sense because the justification is slightly
>>> different. We do not want to have a way for userspace to trigger the
>>> warning from userspace - even though it shouldn't be possible, but
>>> still. Only the offlining path should complain.
>>
>> Something like this?
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 621716a25639..32c854851e1f 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8307,7 +8307,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>> }
>> return NULL;
>> unmovable:
>> - WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
>> return pfn_to_page(pfn + iter);
>> }
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index e70586523ca3..08571b515d9f 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -54,9 +54,11 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>>
>> out:
>> spin_unlock_irqrestore(&zone->lock, flags);
>> +
>> if (!ret)
>> drain_all_pages(zone);
>> else if ((isol_flags & REPORT_FAILURE) && unmovable)
>
> We have a dedicated flag for the offlining part.

This should do the trick then,

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 621716a25639..4bb3e503cb9e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8231,7 +8231,7 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
if (is_migrate_cma(migratetype))
return NULL;

- goto unmovable;
+ return page;
}

for (; iter < pageblock_nr_pages; iter++) {
@@ -8241,7 +8241,7 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
page = pfn_to_page(pfn + iter);

if (PageReserved(page))
- goto unmovable;
+ return page;

/*
* If the zone is movable and we have ruled out all reserved
@@ -8303,12 +8303,9 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
* is set to both of a memory hole page and a _used_ kernel
* page at boot.
*/
- goto unmovable;
+ return pfn_to_page(pfn + iter);
}
return NULL;
-unmovable:
- WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
- return pfn_to_page(pfn + iter);
}

#ifdef CONFIG_CONTIG_ALLOC
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index e70586523ca3..e140eaa901b2 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -54,14 +54,20 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_

out:
spin_unlock_irqrestore(&zone->lock, flags);
- if (!ret)
+
+ if (!ret) {
drain_all_pages(zone);
- else if ((isol_flags & REPORT_FAILURE) && unmovable)
- /*
- * printk() with zone->lock held will guarantee to trigger a
- * lockdep splat, so defer it here.
- */
- dump_page(unmovable, "unmovable page");
+ } else {
+ if (isol_flags & MEMORY_OFFLINE)
+ WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
+
+ if ((isol_flags & REPORT_FAILURE) && unmovable)
+ /*
+ * printk() with zone->lock held will likely trigger a
+ * lockdep splat, so defer it here.
+ */
+ dump_page(unmovable, "unmovable page");
+ }

return ret;
}

>
>> + WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
>> /*
>> * printk() with zone->lock held will guarantee to trigger a
>> * lockdep splat, so defer it here.
>>
>
> So, are we fine with unmovable data ending up in ZONE_MOVABLE as long as we can offline it?
>
> This might make my life in virtio-mem a little easier (I can unplug chunks falling into ZONE_MOVABLE).