Subject: [PATCH] Revert mm/vmstat.c: fix vmstat_update() preemption BUG

This patch reverts commit c7f26ccfb2c3 ("mm/vmstat.c: fix
vmstat_update() preemption BUG").
Steven saw a "using smp_processor_id() in preemptible" message and
added a preempt_disable() section around it to keep it quiet. This is
not the right thing to do it does not fix the real problem.

vmstat_update() is invoked by a kworker on a specific CPU. This worker
it bound to this CPU. The name of the worker was "kworker/1:1" so it
should have been a worker which was bound to CPU1. A worker which can
run on any CPU would have a `u' before the first digit.

smp_processor_id() can be used in a preempt-enabled region as long as
the task is bound to a single CPU which is the case here. If it could
run on an arbitrary CPU then this is the problem we have an should seek
to resolve.
Not only this smp_processor_id() must not be migrated to another CPU but
also refresh_cpu_vm_stats() which might access wrong per-CPU variables.
Not to mention that other code relies on the fact that such a worker
runs on one specific CPU only.

Therefore I revert that commit and we should look instead what broke the
affinity mask of the kworker.

Cc: Steven J. Hill <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
mm/vmstat.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 33581be705f0..40b2db6db6b1 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1839,11 +1839,9 @@ static void vmstat_update(struct work_struct *w)
* to occur in the future. Keep on running the
* update worker thread.
*/
- preempt_disable();
queue_delayed_work_on(smp_processor_id(), mm_percpu_wq,
this_cpu_ptr(&vmstat_work),
round_jiffies_relative(sysctl_stat_interval));
- preempt_enable();
}
}

--
2.17.0



2018-04-11 14:00:27

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH] Revert mm/vmstat.c: fix vmstat_update() preemption BUG

On 04/11/2018 11:57 AM, Sebastian Andrzej Siewior wrote:
> This patch reverts commit c7f26ccfb2c3 ("mm/vmstat.c: fix
> vmstat_update() preemption BUG").
> Steven saw a "using smp_processor_id() in preemptible" message and
> added a preempt_disable() section around it to keep it quiet. This is
> not the right thing to do it does not fix the real problem.
>
> vmstat_update() is invoked by a kworker on a specific CPU. This worker
> it bound to this CPU. The name of the worker was "kworker/1:1" so it
> should have been a worker which was bound to CPU1. A worker which can
> run on any CPU would have a `u' before the first digit.

Oh my, and I have just been assured by Tejun that his cannot happen :)
And yet, in the original report [1] I see:

CPU: 0 PID: 269 Comm: kworker/1:1 Not tainted

So is this perhaps related to the cpu hotplug that [1] mentions? e.g. is
the cpu being hotplugged cpu 1, the worker started too early before
stuff can be scheduled on the CPU, so it has to run on different than
designated CPU?

[1] https://marc.info/?l=linux-mm&m=152088260625433&w=2

> smp_processor_id() can be used in a preempt-enabled region as long as
> the task is bound to a single CPU which is the case here. If it could
> run on an arbitrary CPU then this is the problem we have an should seek
> to resolve.
> Not only this smp_processor_id() must not be migrated to another CPU but
> also refresh_cpu_vm_stats() which might access wrong per-CPU variables.
> Not to mention that other code relies on the fact that such a worker
> runs on one specific CPU only.
>
> Therefore I revert that commit and we should look instead what broke the
> affinity mask of the kworker.
>
> Cc: Steven J. Hill <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> ---
> mm/vmstat.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 33581be705f0..40b2db6db6b1 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1839,11 +1839,9 @@ static void vmstat_update(struct work_struct *w)
> * to occur in the future. Keep on running the
> * update worker thread.
> */
> - preempt_disable();
> queue_delayed_work_on(smp_processor_id(), mm_percpu_wq,
> this_cpu_ptr(&vmstat_work),
> round_jiffies_relative(sysctl_stat_interval));
> - preempt_enable();
> }
> }
>
>


2018-04-11 14:13:02

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] Revert mm/vmstat.c: fix vmstat_update() preemption BUG

Hello,

On Wed, Apr 11, 2018 at 03:56:43PM +0200, Vlastimil Babka wrote:
> > vmstat_update() is invoked by a kworker on a specific CPU. This worker
> > it bound to this CPU. The name of the worker was "kworker/1:1" so it
> > should have been a worker which was bound to CPU1. A worker which can
> > run on any CPU would have a `u' before the first digit.
>
> Oh my, and I have just been assured by Tejun that his cannot happen :)
> And yet, in the original report [1] I see:
>
> CPU: 0 PID: 269 Comm: kworker/1:1 Not tainted
>
> So is this perhaps related to the cpu hotplug that [1] mentions? e.g. is
> the cpu being hotplugged cpu 1, the worker started too early before
> stuff can be scheduled on the CPU, so it has to run on different than
> designated CPU?
>
> [1] https://marc.info/?l=linux-mm&m=152088260625433&w=2

The report says that it happens when hotplug is attempted. Per-cpu
doesn't pin the cpu alive, so if the cpu goes down while a work item
is in flight or a work item is queued while a cpu is offline it'll end
up executing on some other cpu. So, if a piece of code doesn't want
that happening, it gotta interlock itself - ie. start queueing when
the cpu comes online and flush and prevent further queueing when its
cpu goes down.

Thanks.

--
tejun

Subject: Re: [PATCH] Revert mm/vmstat.c: fix vmstat_update() preemption BUG

On 2018-04-11 07:09:13 [-0700], Tejun Heo wrote:
> Hello,
>
> On Wed, Apr 11, 2018 at 03:56:43PM +0200, Vlastimil Babka wrote:
> > > vmstat_update() is invoked by a kworker on a specific CPU. This worker
> > > it bound to this CPU. The name of the worker was "kworker/1:1" so it
> > > should have been a worker which was bound to CPU1. A worker which can
> > > run on any CPU would have a `u' before the first digit.
> >
> > Oh my, and I have just been assured by Tejun that his cannot happen :)
> > And yet, in the original report [1] I see:
> >
> > CPU: 0 PID: 269 Comm: kworker/1:1 Not tainted
> >
> > So is this perhaps related to the cpu hotplug that [1] mentions? e.g. is
> > the cpu being hotplugged cpu 1, the worker started too early before
> > stuff can be scheduled on the CPU, so it has to run on different than
> > designated CPU?
> >
> > [1] https://marc.info/?l=linux-mm&m=152088260625433&w=2
>
> The report says that it happens when hotplug is attempted. Per-cpu
> doesn't pin the cpu alive, so if the cpu goes down while a work item
> is in flight or a work item is queued while a cpu is offline it'll end
> up executing on some other cpu. So, if a piece of code doesn't want
> that happening, it gotta interlock itself - ie. start queueing when
> the cpu comes online and flush and prevent further queueing when its
> cpu goes down.

I missed that cpuhotplug part while reading it. So in that case, let me
add a CPU-hotplug notifier which cancels that work. After all it is not
need once the CPU is gone.

> Thanks.
>

Sebastian

Subject: Re: [PATCH] Revert mm/vmstat.c: fix vmstat_update() preemption BUG

On 2018-04-11 16:42:21 [+0200], To Tejun Heo wrote:
> > > So is this perhaps related to the cpu hotplug that [1] mentions? e.g. is
> > > the cpu being hotplugged cpu 1, the worker started too early before
> > > stuff can be scheduled on the CPU, so it has to run on different than
> > > designated CPU?
> > >
> > > [1] https://marc.info/?l=linux-mm&m=152088260625433&w=2
> >
> > The report says that it happens when hotplug is attempted. Per-cpu
> > doesn't pin the cpu alive, so if the cpu goes down while a work item
> > is in flight or a work item is queued while a cpu is offline it'll end
> > up executing on some other cpu. So, if a piece of code doesn't want
> > that happening, it gotta interlock itself - ie. start queueing when
> > the cpu comes online and flush and prevent further queueing when its
> > cpu goes down.
>
> I missed that cpuhotplug part while reading it. So in that case, let me
> add a CPU-hotplug notifier which cancels that work. After all it is not
> need once the CPU is gone.

This already happens:
- vmstat_shepherd() does get_online_cpus() and within this block it does
queue_delayed_work_on(). So this has to wait until cpuhotplug
completed before it can schedule something and then it won't schedule
anything on the "off" CPU.

- The work item itself (vmstat_update()) schedules itself
(conditionally) again.

- vmstat_cpu_down_prep() is the down event and does
cancel_delayed_work_sync(). So it waits for the work-item to complete
and cancels it.

This looks all good to me.

> > Thanks.

Sebastian

Subject: Re: [PATCH] Revert mm/vmstat.c: fix vmstat_update() preemption BUG

ping.
any reason not to accept the revert?

On 2018-04-11 21:07:29 [+0200], To Tejun Heo wrote:
> On 2018-04-11 16:42:21 [+0200], To Tejun Heo wrote:
> > > > So is this perhaps related to the cpu hotplug that [1] mentions? e.g. is
> > > > the cpu being hotplugged cpu 1, the worker started too early before
> > > > stuff can be scheduled on the CPU, so it has to run on different than
> > > > designated CPU?
> > > >
> > > > [1] https://marc.info/?l=linux-mm&m=152088260625433&w=2
> > >
> > > The report says that it happens when hotplug is attempted. Per-cpu
> > > doesn't pin the cpu alive, so if the cpu goes down while a work item
> > > is in flight or a work item is queued while a cpu is offline it'll end
> > > up executing on some other cpu. So, if a piece of code doesn't want
> > > that happening, it gotta interlock itself - ie. start queueing when
> > > the cpu comes online and flush and prevent further queueing when its
> > > cpu goes down.
> >
> > I missed that cpuhotplug part while reading it. So in that case, let me
> > add a CPU-hotplug notifier which cancels that work. After all it is not
> > need once the CPU is gone.
>
> This already happens:
> - vmstat_shepherd() does get_online_cpus() and within this block it does
> queue_delayed_work_on(). So this has to wait until cpuhotplug
> completed before it can schedule something and then it won't schedule
> anything on the "off" CPU.
>
> - The work item itself (vmstat_update()) schedules itself
> (conditionally) again.
>
> - vmstat_cpu_down_prep() is the down event and does
> cancel_delayed_work_sync(). So it waits for the work-item to complete
> and cancels it.
>
> This looks all good to me.
>
> > > Thanks.

Sebastian

2018-04-18 19:55:58

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Revert mm/vmstat.c: fix vmstat_update() preemption BUG

On Wed, 18 Apr 2018 17:44:36 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:

> On 2018-04-11 21:07:29 [+0200], To Tejun Heo wrote:
> > On 2018-04-11 16:42:21 [+0200], To Tejun Heo wrote:
> > > > > So is this perhaps related to the cpu hotplug that [1] mentions? e.g. is
> > > > > the cpu being hotplugged cpu 1, the worker started too early before
> > > > > stuff can be scheduled on the CPU, so it has to run on different than
> > > > > designated CPU?
> > > > >
> > > > > [1] https://marc.info/?l=linux-mm&m=152088260625433&w=2
> > > >
> > > > The report says that it happens when hotplug is attempted. Per-cpu
> > > > doesn't pin the cpu alive, so if the cpu goes down while a work item
> > > > is in flight or a work item is queued while a cpu is offline it'll end
> > > > up executing on some other cpu. So, if a piece of code doesn't want
> > > > that happening, it gotta interlock itself - ie. start queueing when
> > > > the cpu comes online and flush and prevent further queueing when its
> > > > cpu goes down.
> > >
> > > I missed that cpuhotplug part while reading it. So in that case, let me
> > > add a CPU-hotplug notifier which cancels that work. After all it is not
> > > need once the CPU is gone.
> >
> > This already happens:
> > - vmstat_shepherd() does get_online_cpus() and within this block it does
> > queue_delayed_work_on(). So this has to wait until cpuhotplug
> > completed before it can schedule something and then it won't schedule
> > anything on the "off" CPU.
> >
> > - The work item itself (vmstat_update()) schedules itself
> > (conditionally) again.
> >
> > - vmstat_cpu_down_prep() is the down event and does
> > cancel_delayed_work_sync(). So it waits for the work-item to complete
> > and cancels it.
> >
> > This looks all good to me.
> >
> > > > Thanks.

(top-posting repaired, Please don't do that - how am I supposed to
reply to you while maintaining appropriate context?)

> ping.
> any reason not to accept the revert?
>

That will make the warnings come back. Or was the hotplug issue
addressed by other means? If so, that fix should be referred to in
the changelog.



2018-06-27 12:48:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] Revert mm/vmstat.c: fix vmstat_update() preemption BUG

On Wed, Apr 11, 2018 at 09:07:30PM +0200, Sebastian Andrzej Siewior wrote:
>
> This already happens:
> - vmstat_shepherd() does get_online_cpus() and within this block it does
> queue_delayed_work_on(). So this has to wait until cpuhotplug
> completed before it can schedule something and then it won't schedule
> anything on the "off" CPU.

But can't we have something like this happen: ?

CPU0 CPU1 CPU2
---- ---- ----
get_online_cpus()
queue_work(vmstat_update, cpu1)
wakeup(kworker/1)
High prio task running
put_online_cpus()
Shutdown CPU 1
migrate kworker/1
schedule kworker/1
(smp_processor_id() != 1)

-- Steve


Subject: Re: [PATCH] Revert mm/vmstat.c: fix vmstat_update() preemption BUG

On 2018-06-27 08:36:57 [-0400], Steven Rostedt wrote:
> On Wed, Apr 11, 2018 at 09:07:30PM +0200, Sebastian Andrzej Siewior wrote:
> >
> > This already happens:
> > - vmstat_shepherd() does get_online_cpus() and within this block it does
> > queue_delayed_work_on(). So this has to wait until cpuhotplug
> > completed before it can schedule something and then it won't schedule
> > anything on the "off" CPU.
>
> But can't we have something like this happen: ?
>
> CPU0 CPU1 CPU2
> ---- ---- ----
> get_online_cpus()
> queue_work(vmstat_update, cpu1)
> wakeup(kworker/1)
> High prio task running
> put_online_cpus()
> Shutdown CPU 1
> migrate kworker/1
> schedule kworker/1
> (smp_processor_id() != 1)

I don't get CPU1 doing in here. kworker/1 should not be migrated to
begin with. The work item should be done before CPU2 shutdowns CPU1 (or
0) because vmstat_cpu_down_prep() cancels the work while the CPUs goes
down so it should not be migrated. Also, the work is enqueued while
holding the CPU HP lock.

Sebastian