by Sasha Levin

[permalink] [raw]

Subject: Re: vmstat: On demand vmstat workers V8

On 07/28/2014 05:54 PM, Andrew Morton wrote:
> Also, Sasha's report showed this:
>
> [ 490.464613] kernel BUG at mm/vmstat.c:1278!
>
> That's your VM_BUG_ON() in vmstat_update(). That ain't no false
> positive!
>
>
>
> Is this code expecting that schedule_delayed_work() will schedule the
> work on the current CPU? I don't think it will do that. Maybe you
> should be looking at schedule_delayed_work_on().

I suspected that the re-queue might be wrong (schedule_delayed_work vs
schedule_delayed_work_on) and tried fixing that, but that didn't solve
the issue.

I could give it another go unless someone sees the issue, I might have
messed something up.

Thanks,
Sasha

2014-07-29 07:56:53

On Wed, 30 Jul 2014, Lai Jiangshan wrote:

> I think the bug is here, it re-queues the per_cpu(vmstat_work, cpu) which is offline
> (after vmstat_cpuup_callback(CPU_DOWN_PREPARE). And cpu_stat_off is accessed without
> proper lock.

Ok. I guess we need to make the preemption check output more information
so that it tells us that an operation was performed on a processor that is
down?

> I suggest to use get_cpu_online() or a new cpu_stat_off_mutex to protect it.

If a processor is downed then cpu_stat_off bit should be cleared but also
the worker thread should not run.

> > case CPU_DOWN_PREPARE:
> > case CPU_DOWN_PREPARE_FROZEN:
> > - cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > - per_cpu(vmstat_work, cpu).work.func = NULL;
> > + if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off))
> > + cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>
> It is suggest that cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)) should
> be called unconditionally. And the cpu should be cleared from cpu_stat_off.
> (you set it, it is BUG according to vmstat_shepherd() and the semantics of the
> cpu_stat_off).

True.

Subject: vmstat ondemand: Fix online/offline races

Do not allow onlining/offlining while the shepherd task is checking
for vmstat threads.

On offlining a processor do the right thing cancelling the vmstat
worker thread if it exista and also exclude it from the shepherd
process checks.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c 2014-07-30 09:35:54.602662306 -0500
+++ linux/mm/vmstat.c 2014-07-30 09:43:07.109037043 -0500
@@ -1317,6 +1317,7 @@ static void vmstat_shepherd(struct work_
{
int cpu;

+ get_online_cpus();
/* Check processors whose vmstat worker threads have been disabled */
for_each_cpu(cpu, cpu_stat_off)
if (need_update(cpu) &&
@@ -1325,6 +1326,7 @@ static void vmstat_shepherd(struct work_
schedule_delayed_work_on(cpu, &per_cpu(vmstat_work, cpu),
__round_jiffies_relative(sysctl_stat_interval, cpu));

+ put_online_cpus();

schedule_delayed_work(&shepherd,
round_jiffies_relative(sysctl_stat_interval));
@@ -1380,8 +1382,8 @@ static int vmstat_cpuup_callback(struct
break;
case CPU_DOWN_PREPARE:
case CPU_DOWN_PREPARE_FROZEN:
- if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off))
- cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+ cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+ cpumask_clear_cpu(cpu, cpu_stat_off);
break;
case CPU_DOWN_FAILED:
case CPU_DOWN_FAILED_FROZEN:

2014-07-31 00:51:05

by Lai Jiangshan

[permalink] [raw]

Subject: Re: vmstat: On demand vmstat workers V8

On 07/30/2014 10:45 PM, Christoph Lameter wrote:
> On Wed, 30 Jul 2014, Lai Jiangshan wrote:
>
>> I think the bug is here, it re-queues the per_cpu(vmstat_work, cpu) which is offline
>> (after vmstat_cpuup_callback(CPU_DOWN_PREPARE). And cpu_stat_off is accessed without
>> proper lock.
>
> Ok. I guess we need to make the preemption check output more information
> so that it tells us that an operation was performed on a processor that is
> down?

If the cpu_allows of the percpu-kworker is changed, the specific processor of the kworker
should have been down if workqueue is implemented correctly.
(the preemption check checks the cpu_allows)
>
>> I suggest to use get_cpu_online() or a new cpu_stat_off_mutex to protect it.
>
> If a processor is downed then cpu_stat_off bit should be cleared but also
> the worker thread should not run.

The kworker need to run for some reasons after the processor is down.
Peter and TJ were just discussing it.

The root cause (TO ME only) here is vmstat queues work to offline (or offlining) CPU,
so the kworker has to run it. We may add some check for queuing on offline CPU,
but we can't check for higher level user guarantees. (Example, vmstat can't queue
work to a CPU which is still online but after vmstat_cpuup_callback(CPU_DOWN_PREPARE)).

>
>>> case CPU_DOWN_PREPARE:
>>> case CPU_DOWN_PREPARE_FROZEN:
>>> - cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>>> - per_cpu(vmstat_work, cpu).work.func = NULL;
>>> + if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off))
>>> + cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>>
>> It is suggest that cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)) should
>> be called unconditionally. And the cpu should be cleared from cpu_stat_off.
>> (you set it, it is BUG according to vmstat_shepherd() and the semantics of the
>> cpu_stat_off).
>
> True.
>
> Subject: vmstat ondemand: Fix online/offline races
>
> Do not allow onlining/offlining while the shepherd task is checking
> for vmstat threads.
>
> On offlining a processor do the right thing cancelling the vmstat
> worker thread if it exista and also exclude it from the shepherd
> process checks.
>
> Signed-off-by: Christoph Lameter <[email protected]>
>
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c 2014-07-30 09:35:54.602662306 -0500
> +++ linux/mm/vmstat.c 2014-07-30 09:43:07.109037043 -0500
> @@ -1317,6 +1317,7 @@ static void vmstat_shepherd(struct work_
> {
> int cpu;
>
> + get_online_cpus();
> /* Check processors whose vmstat worker threads have been disabled */
> for_each_cpu(cpu, cpu_stat_off)
> if (need_update(cpu) &&
> @@ -1325,6 +1326,7 @@ static void vmstat_shepherd(struct work_
> schedule_delayed_work_on(cpu, &per_cpu(vmstat_work, cpu),
> __round_jiffies_relative(sysctl_stat_interval, cpu));
>
> + put_online_cpus();
>
> schedule_delayed_work(&shepherd,
> round_jiffies_relative(sysctl_stat_interval));
> @@ -1380,8 +1382,8 @@ static int vmstat_cpuup_callback(struct
> break;
> case CPU_DOWN_PREPARE:
> case CPU_DOWN_PREPARE_FROZEN:
> - if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off))
> - cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> + cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> + cpumask_clear_cpu(cpu, cpu_stat_off);

Sasha Levin's test result?

> break;
> case CPU_DOWN_FAILED:
> case CPU_DOWN_FAILED_FROZEN:
> .
>