2018-03-06 19:22:15

by Pavel Tatashin

Subject: [PATCH] mm: might_sleep warning

Robot reported this issue:
https://lkml.org/lkml/2018/2/27/851

It was introduced by:
mm: initialize pages on demand during boot

The problem is caused by changing a static branch value while holding a
spinlock. spin_lock() disables preemption, but changing a static branch
value takes a mutex on its path and thus may sleep.

The fix is to add another boolean variable so that the static branch no
longer needs to be changed within the spinlock.
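To make the ordering problem concrete, here is a small userspace model of the two orderings. It is not kernel code: model_spin_lock(), model_might_sleep() and model_static_branch_disable() are hypothetical stand-ins. The model assumes that "spin_lock() disables preemption" can be represented by a counter, and that the mutex on the static key slow path can be represented by a might_sleep()-style check that counts a violation whenever that counter is raised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only; every name here is a hypothetical stand-in. */
static int preempt_count;               /* > 0 means atomic context */
static int sleep_in_atomic_bugs;        /* might_sleep() violations seen */
static bool lock_held;                  /* models deferred_zone_grow_lock */
static bool deferred_zone_grow = true;  /* the new plain bool */
static bool deferred_pages_key = true;  /* models the static key */

static void model_spin_lock(void)
{
	assert(!lock_held);
	lock_held = true;
	preempt_count++;        /* spin_lock() disables preemption */
}

static void model_spin_unlock(void)
{
	assert(lock_held);
	lock_held = false;
	preempt_count--;
}

static void model_might_sleep(void)
{
	if (preempt_count > 0) {
		sleep_in_atomic_bugs++;
		fprintf(stderr, "BUG: sleeping function called from invalid context\n");
	}
}

/* Assumed: flipping a key goes through a mutex, i.e. it may sleep. */
static void model_static_branch_disable(bool *key)
{
	model_might_sleep();
	*key = false;
}

/* Before the patch: the key is flipped with the spinlock held. */
static void init_late_before_patch(void)
{
	model_spin_lock();
	model_static_branch_disable(&deferred_pages_key);
	model_spin_unlock();
}

/* After the patch: only the bool changes under the lock; key flipped after. */
static void init_late_after_patch(void)
{
	model_spin_lock();
	deferred_zone_grow = false;
	model_spin_unlock();
	model_static_branch_disable(&deferred_pages_key);
}

static void reset_model(void)
{
	preempt_count = 0;
	sleep_in_atomic_bugs = 0;
	lock_held = false;
	deferred_zone_grow = true;
	deferred_pages_key = true;
}
```

In this model, init_late_before_patch() records one sleep-in-atomic violation, while init_late_after_patch() records none and still ends with both the bool and the key cleared.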

Signed-off-by: Pavel Tatashin <[email protected]>
---
mm/page_alloc.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b337a026007c..52edc6695b2b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1579,6 +1579,7 @@ static int __init deferred_init_memmap(void *data)
* page_alloc_init_late() soon after smp_init() is complete.
*/
static __initdata DEFINE_SPINLOCK(deferred_zone_grow_lock);
+static bool deferred_zone_grow __initdata = true;
static DEFINE_STATIC_KEY_TRUE(deferred_pages);

/*
@@ -1616,7 +1617,7 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
* Bail if we raced with another thread that disabled on demand
* initialization.
*/
- if (!static_branch_unlikely(&deferred_pages)) {
+ if (!static_branch_unlikely(&deferred_pages) || !deferred_zone_grow) {
spin_unlock_irqrestore(&deferred_zone_grow_lock, flags);
return false;
}
@@ -1683,10 +1684,15 @@ void __init page_alloc_init_late(void)
/*
* We are about to initialize the rest of deferred pages, permanently
* disable on-demand struct page initialization.
+ *
+ * Note: it is prohibited to modify static branches in non-preemptible
+ * context. Since, spin_lock() disables preemption, we must use an
+ * extra boolean deferred_zone_grow.
*/
spin_lock(&deferred_zone_grow_lock);
- static_branch_disable(&deferred_pages);
+ deferred_zone_grow = false;
spin_unlock(&deferred_zone_grow_lock);
+ static_branch_disable(&deferred_pages);

/* There will be num_node_state(N_MEMORY) threads */
atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY));
--
2.16.2



2018-03-06 20:38:09

by Andrew Morton

Subject: Re: [PATCH] mm: might_sleep warning

On Tue, 6 Mar 2018 14:20:22 -0500 Pavel Tatashin <[email protected]> wrote:

> Robot reported this issue:
> https://lkml.org/lkml/2018/2/27/851
>
> That is introduced by:
> mm: initialize pages on demand during boot
>
> The problem is caused by changing static branch value within spin lock.
> Spin lock disables preemption, and changing static branch value takes
> mutex lock in its path, and thus may sleep.
>
> The fix is to add another boolean variable to avoid the need to change
> static branch within spinlock.
>
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1579,6 +1579,7 @@ static int __init deferred_init_memmap(void *data)
> * page_alloc_init_late() soon after smp_init() is complete.
> */
> static __initdata DEFINE_SPINLOCK(deferred_zone_grow_lock);
> +static bool deferred_zone_grow __initdata = true;
> static DEFINE_STATIC_KEY_TRUE(deferred_pages);
>
> /*
> @@ -1616,7 +1617,7 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
> * Bail if we raced with another thread that disabled on demand
> * initialization.
> */
> - if (!static_branch_unlikely(&deferred_pages)) {
> + if (!static_branch_unlikely(&deferred_pages) || !deferred_zone_grow) {
> spin_unlock_irqrestore(&deferred_zone_grow_lock, flags);
> return false;
> }
> @@ -1683,10 +1684,15 @@ void __init page_alloc_init_late(void)
> /*
> * We are about to initialize the rest of deferred pages, permanently
> * disable on-demand struct page initialization.
> + *
> + * Note: it is prohibited to modify static branches in non-preemptible
> + * context. Since, spin_lock() disables preemption, we must use an
> + * extra boolean deferred_zone_grow.
> */
> spin_lock(&deferred_zone_grow_lock);
> - static_branch_disable(&deferred_pages);
> + deferred_zone_grow = false;
> spin_unlock(&deferred_zone_grow_lock);
> + static_branch_disable(&deferred_pages);
>
> /* There will be num_node_state(N_MEMORY) threads */
> atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY));

Kinda ugly, but I can see the logic behind the decisions.

Can we instead turn deferred_zone_grow_lock into a mutex?

2018-03-06 20:57:20

by Andrew Morton

Subject: Re: [PATCH] mm: might_sleep warning

On Tue, 6 Mar 2018 15:48:26 -0500 Pavel Tatashin <[email protected]> wrote:

> On Tue, Mar 6, 2018 at 3:36 PM, Andrew Morton <[email protected]> wrote:
>
> > On Tue, 6 Mar 2018 14:20:22 -0500 Pavel Tatashin <[email protected]> wrote:
> >
> > > spin_lock(&deferred_zone_grow_lock);
> > > - static_branch_disable(&deferred_pages);
> > > + deferred_zone_grow = false;
> > > spin_unlock(&deferred_zone_grow_lock);
> > > + static_branch_disable(&deferred_pages);
> > >
> > > /* There will be num_node_state(N_MEMORY) threads */
> > > atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY));
> >
> > Kinda ugly, but I can see the logic behind the decisions.
> >
> > Can we instead turn deferred_zone_grow_lock into a mutex?

(top-posting repaired. Please don't top-post).

> [CCed everyone]
>
> Hi Andrew,
>
> I am afraid we cannot change this spinlock to a mutex
> because deferred_grow_zone() might be called from an interrupt context if
> an interrupt thread needs to allocate memory.
>

OK. But if deferred_grow_zone() can be called from interrupt then
page_alloc_init_late() should be using spin_lock_irq(), shouldn't it?
I'm surprised that lockdep didn't detect that.


--- a/mm/page_alloc.c~mm-initialize-pages-on-demand-during-boot-fix-4-fix
+++ a/mm/page_alloc.c
@@ -1689,9 +1689,9 @@ void __init page_alloc_init_late(void)
* context. Since, spin_lock() disables preemption, we must use an
* extra boolean deferred_zone_grow.
*/
- spin_lock(&deferred_zone_grow_lock);
+ spin_lock_irq(&deferred_zone_grow_lock);
deferred_zone_grow = false;
- spin_unlock(&deferred_zone_grow_lock);
+ spin_unlock_irq(&deferred_zone_grow_lock);
static_branch_disable(&deferred_pages);

/* There will be num_node_state(N_MEMORY) threads */
_


2018-03-06 21:10:34

by Pavel Tatashin

Subject: Re: [PATCH] mm: might_sleep warning

> > > > spin_lock(&deferred_zone_grow_lock);
> > > > - static_branch_disable(&deferred_pages);
> > > > + deferred_zone_grow = false;
> > > > spin_unlock(&deferred_zone_grow_lock);
> > > > + static_branch_disable(&deferred_pages);
> > > >
> > > > /* There will be num_node_state(N_MEMORY) threads */
> > > > atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY));
> > >
> > > Kinda ugly, but I can see the logic behind the decisions.
> > >
> > > Can we instead turn deferred_zone_grow_lock into a mutex?
>
> (top-posting repaired. Please don't top-post).
>
> > [CCed everyone]
> >
> > Hi Andrew,
> >
> > I am afraid we cannot change this spinlock to a mutex
> > because deferred_grow_zone() might be called from an interrupt context if
> > an interrupt thread needs to allocate memory.
> >
>
> OK. But if deferred_grow_zone() can be called from interrupt then
> page_alloc_init_late() should be using spin_lock_irq(), shouldn't it?
> I'm surprised that lockdep didn't detect that.

No, page_alloc_init_late() cannot be called from an interrupt; it is
called directly from kernel_init_freeable(). But I believe
deferred_grow_zone() can be called via:

get_page_from_freelist()
_deferred_grow_zone()
deferred_grow_zone()
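The last function in that chain is where the patch adds the `|| !deferred_zone_grow` re-check. Below is a hypothetical single-threaded model of it (grow_lock(), model_deferred_grow_zone() and model_alloc_slow_path() are illustrative names, not kernel functions). It assumes that the caller tests the static key without the lock before entering the grow path. Because the patch disables the key only after dropping the lock, a caller can pass that lock-free test in the window between the two steps; the re-check of the bool under the lock is what makes it bail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; not the kernel implementation. */
static bool deferred_pages_key = true;  /* stands in for the static key */
static bool deferred_zone_grow = true;  /* the new __initdata bool */
static bool grow_lock_held;             /* stands in for deferred_zone_grow_lock */

static void grow_lock(void)
{
	assert(!grow_lock_held);
	grow_lock_held = true;   /* kernel: spin_lock_irqsave() */
}

static void grow_unlock(void)
{
	assert(grow_lock_held);
	grow_lock_held = false;  /* kernel: spin_unlock_irqrestore() */
}

/* Bail-out check of deferred_grow_zone() as changed by the patch. */
static bool model_deferred_grow_zone(void)
{
	grow_lock();
	if (!deferred_pages_key || !deferred_zone_grow) {
		grow_unlock();
		return false;   /* raced with page_alloc_init_late(): give up */
	}
	/* ... initialize more struct pages for the zone ... */
	grow_unlock();
	return true;
}

/* Assumed caller side: a lock-free key test before growing the zone. */
static bool model_alloc_slow_path(void)
{
	if (!deferred_pages_key)
		return false;
	return model_deferred_grow_zone();
}
```

The interesting state is the window where deferred_zone_grow is already false but the key is still enabled. The caller gets past the key test, yet model_deferred_grow_zone() still returns false.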


>
>
>
> --- a/mm/page_alloc.c~mm-initialize-pages-on-demand-during-boot-fix-4-fix
> +++ a/mm/page_alloc.c
> @@ -1689,9 +1689,9 @@ void __init page_alloc_init_late(void)
> * context. Since, spin_lock() disables preemption, we must use an
> * extra boolean deferred_zone_grow.
> */
> - spin_lock(&deferred_zone_grow_lock);
> + spin_lock_irq(&deferred_zone_grow_lock);
> deferred_zone_grow = false;
> - spin_unlock(&deferred_zone_grow_lock);
> + spin_unlock_irq(&deferred_zone_grow_lock);
> static_branch_disable(&deferred_pages);
>
> /* There will be num_node_state(N_MEMORY) threads */
> _

2018-03-06 21:23:56

by Andrew Morton

Subject: Re: [PATCH] mm: might_sleep warning

On Tue, 6 Mar 2018 16:04:06 -0500 Pavel Tatashin <[email protected]> wrote:

> > > > > spin_lock(&deferred_zone_grow_lock);
> > > > > - static_branch_disable(&deferred_pages);
> > > > > + deferred_zone_grow = false;
> > > > > spin_unlock(&deferred_zone_grow_lock);
> > > > > + static_branch_disable(&deferred_pages);
> > > > >
> > > > > /* There will be num_node_state(N_MEMORY) threads */
> > > > > atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY));
> > > >
> > > > Kinda ugly, but I can see the logic behind the decisions.
> > > >
> > > > Can we instead turn deferred_zone_grow_lock into a mutex?
> >
> > (top-posting repaired. Please don't top-post).
> >
> > > [CCed everyone]
> > >
> > > Hi Andrew,
> > >
> > > I am afraid we cannot change this spinlock to a mutex
> > > because deferred_grow_zone() might be called from an interrupt context if
> > > an interrupt thread needs to allocate memory.
> > >
> >
> > OK. But if deferred_grow_zone() can be called from interrupt then
> > page_alloc_init_late() should be using spin_lock_irq(), shouldn't it?
> > I'm surprised that lockdep didn't detect that.
>
> No, page_alloc_init_late() cannot be called from an interrupt; it is
> called directly from kernel_init_freeable(). But I believe
> deferred_grow_zone() can be called via:
>
> get_page_from_freelist()
> _deferred_grow_zone()
> deferred_grow_zone()

That's why page_alloc_init_late() needs spin_lock_irq(). If a CPU is
holding deferred_zone_grow_lock with enabled interrupts and an
interrupt comes in on that CPU and the CPU runs deferred_grow_zone() in
its interrupt handler, we deadlock.
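The deadlock described above can be reproduced in a toy single-CPU model. Everything in it is hypothetical: raise_irq(), irq_handler() and the model_* lock helpers are not kernel APIs. The model delivers an interrupt immediately when interrupts are enabled and defers it until the unlock otherwise. The handler is assumed to take the same lock, as deferred_grow_zone() would when an interrupt-time allocation reaches it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; none of these are kernel APIs. */
static bool irqs_enabled = true;
static bool irq_pending;   /* interrupt raised while interrupts were off */
static bool lock_held;     /* models deferred_zone_grow_lock */
static bool deadlocked;    /* set when the handler would spin forever */
static int irq_handled;    /* handlers that completed */

static void model_spin_lock(void)
{
	assert(!lock_held);
	lock_held = true;
}

static void model_spin_unlock(void)
{
	assert(lock_held);
	lock_held = false;
}

/* An interrupt handler that allocates memory and ends up taking the lock. */
static void irq_handler(void)
{
	if (lock_held) {
		/* The owner is the code this handler interrupted on the same
		 * CPU; it cannot run until we return, so a real spinlock would
		 * spin forever. The model records this instead of hanging. */
		deadlocked = true;
		return;
	}
	model_spin_lock();
	/* ... grow the zone ... */
	model_spin_unlock();
	irq_handled++;
}

static void raise_irq(void)
{
	if (irqs_enabled)
		irq_handler();        /* delivered right away */
	else
		irq_pending = true;   /* held off until interrupts are re-enabled */
}

static void model_spin_lock_irq(void)
{
	irqs_enabled = false;
	model_spin_lock();
}

static void model_spin_unlock_irq(void)
{
	model_spin_unlock();
	irqs_enabled = true;
	if (irq_pending) {
		irq_pending = false;
		irq_handler();        /* the deferred interrupt runs now */
	}
}

/* page_alloc_init_late()'s critical section, interrupted midway. */
static void init_late_spin_lock(void)
{
	model_spin_lock();
	raise_irq();
	model_spin_unlock();
}

static void init_late_spin_lock_irq(void)
{
	model_spin_lock_irq();
	raise_irq();
	model_spin_unlock_irq();
}

static void reset_model(void)
{
	irqs_enabled = true;
	irq_pending = false;
	lock_held = false;
	deadlocked = false;
	irq_handled = 0;
}
```

With the plain lock, the handler finds the lock already held by the very code it interrupted. With the _irq variants, the interrupt is held off until the unlock and then completes normally.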

lockdep knows about this bug and should have reported it.

2018-03-06 21:51:35

by Pavel Tatashin

Subject: Re: [PATCH] mm: might_sleep warning

> That's why page_alloc_init_late() needs spin_lock_irq(). If a CPU is
> holding deferred_zone_grow_lock with enabled interrupts and an
> interrupt comes in on that CPU and the CPU runs deferred_grow_zone() in
> its interrupt handler, we deadlock.
>
> lockdep knows about this bug and should have reported it.
>

I see what you are saying. Yes, you are correct: we need spin_lock_irq()
in page_alloc_init_late(). I will update the patch. I am not sure why
lockdep has not reported it. Maybe it is initialized after this code is
executed?

Thank you,
Pavel