2013-04-27 09:59:18

by Li Zhong

Subject: [RFC PATCH]nohz: Use raw_smp_processor_id() in tick_nohz_task_switch()

I saw the following error when testing the latest nohz code on Power:

[ 85.295384] BUG: using smp_processor_id() in preemptible [00000000] code: rsyslogd/3493
[ 85.295396] caller is .tick_nohz_task_switch+0x1c/0xb8
[ 85.295402] Call Trace:
[ 85.295408] [c0000001fababab0] [c000000000012dc4] .show_stack+0x110/0x25c (unreliable)
[ 85.295420] [c0000001fababba0] [c0000000007c4b54] .dump_stack+0x20/0x30
[ 85.295430] [c0000001fababc10] [c00000000044eb74] .debug_smp_processor_id+0xf4/0x124
[ 85.295438] [c0000001fababca0] [c0000000000d7594] .tick_nohz_task_switch+0x1c/0xb8
[ 85.295447] [c0000001fababd20] [c0000000000b9748] .finish_task_switch+0x13c/0x160
[ 85.295455] [c0000001fababdb0] [c0000000000bbe50] .schedule_tail+0x50/0x124
[ 85.295463] [c0000001fababe30] [c000000000009dc8] .ret_from_fork+0x4/0x54

It seems to me that we could just use raw_smp_processor_id() here. Even
if the tick_nohz_full_cpu() check runs on a !nohz_full cpu and the task
is then moved to a nohz_full cpu, it seems the context switch caused by
that migration would call tick_nohz_task_switch() again to re-evaluate
the need for the tick.

I don't know whether I missed something here.

Signed-off-by: Li Zhong <[email protected]>
---
kernel/time/tick-sched.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index da53c8f..0aa575b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -251,7 +251,7 @@ void tick_nohz_task_switch(struct task_struct *tsk)
 {
 	unsigned long flags;
 
-	if (!tick_nohz_full_cpu(smp_processor_id()))
+	if (!tick_nohz_full_cpu(raw_smp_processor_id()))
 		return;
 
 	local_irq_save(flags);
--
1.7.1
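
For readers unfamiliar with the warning in the trace above: roughly, the
debug variant of smp_processor_id() complains when the caller could be
migrated to another CPU while it is still using the returned id, that is,
when neither preemption nor interrupts are disabled. The following is a
minimal user-space sketch of that rule, not kernel code. struct cpu_ctx,
the warnings counter and the model_* helpers are hypothetical names made
up for illustration, and the real check has further exemptions (for
example for threads bound to a single CPU) that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of a task's execution context. */
struct cpu_ctx {
	int  cpu;            /* CPU the task is currently running on */
	int  preempt_count;  /* > 0 means preemption is disabled */
	bool irqs_disabled;  /* true between an irq save and its restore */
};

/* Number of times the modeled debug check has complained. */
static int warnings;

/*
 * The CPU id can only be trusted while the task cannot be moved to
 * another CPU: preemption disabled or interrupts disabled.
 */
static bool cpu_id_is_stable(const struct cpu_ctx *c)
{
	assert(c->preempt_count >= 0);
	return c->preempt_count > 0 || c->irqs_disabled;
}

/* Checked variant: counts a warning when the id could go stale. */
static int model_smp_processor_id(const struct cpu_ctx *c)
{
	if (!cpu_id_is_stable(c))
		warnings++;
	return c->cpu;
}

/* Raw variant: same value, no check; the caller accepts staleness. */
static int model_raw_smp_processor_id(const struct cpu_ctx *c)
{
	return c->cpu;
}
```

In this model, a context with preempt_count == 0 and irqs_disabled ==
false makes model_smp_processor_id() bump the warnings counter, while
model_raw_smp_processor_id() returns the same id silently. Setting
irqs_disabled before the call keeps the checked variant quiet as well.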



2013-04-27 13:40:30

by Frederic Weisbecker

Subject: Re: [RFC PATCH]nohz: Use raw_smp_processor_id() in tick_nohz_task_switch()

2013/4/27 Li Zhong <[email protected]>:
> I saw the following error when testing the latest nohz code on Power:
>
> [ 85.295384] BUG: using smp_processor_id() in preemptible [00000000] code: rsyslogd/3493
> [ 85.295396] caller is .tick_nohz_task_switch+0x1c/0xb8
> [ 85.295402] Call Trace:
> [ 85.295408] [c0000001fababab0] [c000000000012dc4] .show_stack+0x110/0x25c (unreliable)
> [ 85.295420] [c0000001fababba0] [c0000000007c4b54] .dump_stack+0x20/0x30
> [ 85.295430] [c0000001fababc10] [c00000000044eb74] .debug_smp_processor_id+0xf4/0x124
> [ 85.295438] [c0000001fababca0] [c0000000000d7594] .tick_nohz_task_switch+0x1c/0xb8
> [ 85.295447] [c0000001fababd20] [c0000000000b9748] .finish_task_switch+0x13c/0x160
> [ 85.295455] [c0000001fababdb0] [c0000000000bbe50] .schedule_tail+0x50/0x124
> [ 85.295463] [c0000001fababe30] [c000000000009dc8] .ret_from_fork+0x4/0x54
>
> It seems to me that we could just use raw_smp_processor_id() here. Even
> if the tick_nohz_full_cpu() check runs on a !nohz_full cpu and the task
> is then moved to a nohz_full cpu, it seems the context switch caused by
> that migration would call tick_nohz_task_switch() again to re-evaluate
> the need for the tick.
>
> I don't know whether I missed something here.

You're right, it looks safe to do so. But I suggest we instead move the
test inside the local_irq_save()/restore section, to avoid confusing
reviewers.

Thanks!

2013-04-28 01:57:00

by Li Zhong

Subject: Re: [RFC PATCH]nohz: Use raw_smp_processor_id() in tick_nohz_task_switch()

On Sat, 2013-04-27 at 15:40 +0200, Frederic Weisbecker wrote:
> 2013/4/27 Li Zhong <[email protected]>:
> > I saw the following error when testing the latest nohz code on Power:
> >
> > [ 85.295384] BUG: using smp_processor_id() in preemptible [00000000] code: rsyslogd/3493
> > [ 85.295396] caller is .tick_nohz_task_switch+0x1c/0xb8
> > [ 85.295402] Call Trace:
> > [ 85.295408] [c0000001fababab0] [c000000000012dc4] .show_stack+0x110/0x25c (unreliable)
> > [ 85.295420] [c0000001fababba0] [c0000000007c4b54] .dump_stack+0x20/0x30
> > [ 85.295430] [c0000001fababc10] [c00000000044eb74] .debug_smp_processor_id+0xf4/0x124
> > [ 85.295438] [c0000001fababca0] [c0000000000d7594] .tick_nohz_task_switch+0x1c/0xb8
> > [ 85.295447] [c0000001fababd20] [c0000000000b9748] .finish_task_switch+0x13c/0x160
> > [ 85.295455] [c0000001fababdb0] [c0000000000bbe50] .schedule_tail+0x50/0x124
> > [ 85.295463] [c0000001fababe30] [c000000000009dc8] .ret_from_fork+0x4/0x54
> >
> > It seems to me that we could just use raw_smp_processor_id() here. Even
> > if the tick_nohz_full_cpu() check runs on a !nohz_full cpu and the task
> > is then moved to a nohz_full cpu, it seems the context switch caused by
> > that migration would call tick_nohz_task_switch() again to re-evaluate
> > the need for the tick.
> >
> > I don't know whether I missed something here.
>
> You're right, it looks safe to do so. But I suggest we instead move the
> test inside the local_irq_save()/restore section, to avoid confusing
> reviewers.

OK, I'll send an updated version that uses local_irq_save() to protect it.
I tried raw_* because it seemed it could avoid some unnecessary irq
disabling...

Thanks, Zhong

> Thanks!
>

2013-04-28 03:26:12

by Li Zhong

Subject: [RFC PATCH v2]nohz: protect smp_processor_id() in tick_nohz_task_switch()

I saw the following error when testing the latest nohz code on Power:

[ 85.295384] BUG: using smp_processor_id() in preemptible [00000000] code: rsyslogd/3493
[ 85.295396] caller is .tick_nohz_task_switch+0x1c/0xb8
[ 85.295402] Call Trace:
[ 85.295408] [c0000001fababab0] [c000000000012dc4] .show_stack+0x110/0x25c (unreliable)
[ 85.295420] [c0000001fababba0] [c0000000007c4b54] .dump_stack+0x20/0x30
[ 85.295430] [c0000001fababc10] [c00000000044eb74] .debug_smp_processor_id+0xf4/0x124
[ 85.295438] [c0000001fababca0] [c0000000000d7594] .tick_nohz_task_switch+0x1c/0xb8
[ 85.295447] [c0000001fababd20] [c0000000000b9748] .finish_task_switch+0x13c/0x160
[ 85.295455] [c0000001fababdb0] [c0000000000bbe50] .schedule_tail+0x50/0x124
[ 85.295463] [c0000001fababe30] [c000000000009dc8] .ret_from_fork+0x4/0x54

The patch below moves the test into the local_irq_save()/restore section
to avoid the above complaint.

Signed-off-by: Li Zhong <[email protected]>
---
kernel/time/tick-sched.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index da53c8f..1c9f53b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -251,14 +251,15 @@ void tick_nohz_task_switch(struct task_struct *tsk)
 {
 	unsigned long flags;
 
-	if (!tick_nohz_full_cpu(smp_processor_id()))
-		return;
-
 	local_irq_save(flags);
 
+	if (!tick_nohz_full_cpu(smp_processor_id()))
+		goto out;
+
 	if (tick_nohz_tick_stopped() && !can_stop_full_tick())
 		tick_nohz_full_kick();
 
+out:
 	local_irq_restore(flags);
 }

--
1.7.1
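
The control flow of the v2 change can be sketched in user space as
follows. This is only a model under stated assumptions: struct cpu_model,
nohz_full_cpu[] and the model_* functions are hypothetical stand-ins, not
the kernel's tick_nohz_* implementation, and "interrupts disabled" is
just a boolean here. It illustrates that once the CPU id is read after
the save, the early exit has to become a goto so that both paths still
reach the restore.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Assumed layout: CPU 0 keeps its tick, CPUs 1-3 are nohz_full. */
static const bool nohz_full_cpu[MODEL_NR_CPUS] = { false, true, true, true };

struct cpu_model {
	int  cpu;              /* CPU the task currently runs on */
	bool irqs_disabled;    /* models the local interrupt state */
	bool tick_stopped;     /* stand-in for tick_nohz_tick_stopped() */
	bool can_stop_tick;    /* stand-in for can_stop_full_tick() */
	int  kicks;            /* counts stand-in tick_nohz_full_kick() calls */
	int  unsafe_id_reads;  /* CPU id reads made with interrupts enabled */
};

/* Disable modeled interrupts and return the previous state. */
static bool model_irq_save(struct cpu_model *m)
{
	bool flags = m->irqs_disabled;

	m->irqs_disabled = true;
	return flags;
}

/* Put the modeled interrupt state back to what the save returned. */
static void model_irq_restore(struct cpu_model *m, bool flags)
{
	m->irqs_disabled = flags;
}

/* Read the CPU id, recording reads that a debug check would flag. */
static int model_cpu_id(struct cpu_model *m)
{
	assert(m->cpu >= 0 && m->cpu < MODEL_NR_CPUS);
	if (!m->irqs_disabled)
		m->unsafe_id_reads++;
	return m->cpu;
}

/* Original ordering: the id is read before interrupts are disabled. */
static void model_task_switch_old(struct cpu_model *m)
{
	bool flags;

	if (!nohz_full_cpu[model_cpu_id(m)])
		return;

	flags = model_irq_save(m);
	if (m->tick_stopped && !m->can_stop_tick)
		m->kicks++;
	model_irq_restore(m, flags);
}

/* v2 ordering: the id is read inside the save/restore section. */
static void model_task_switch_v2(struct cpu_model *m)
{
	bool flags = model_irq_save(m);

	if (!nohz_full_cpu[model_cpu_id(m)])
		goto out;

	if (m->tick_stopped && !m->can_stop_tick)
		m->kicks++;
out:
	model_irq_restore(m, flags);
}
```

In this model both orderings make the same kick decision; the only
difference is that model_task_switch_old() records an unsafe id read.
The cost of the v2 ordering is that the save/restore pair now also runs
on CPUs that are not nohz_full, which is the extra irq disabling
mentioned earlier in the thread.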

Subject: [tip:timers/nohz] nohz: Protect smp_processor_id() in tick_nohz_task_switch()

Commit-ID: 6296ace467c8640317414ba589b124323806f7ce
Gitweb: http://git.kernel.org/tip/6296ace467c8640317414ba589b124323806f7ce
Author: Li Zhong <[email protected]>
AuthorDate: Sun, 28 Apr 2013 11:25:58 +0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 29 Apr 2013 13:17:33 +0200

nohz: Protect smp_processor_id() in tick_nohz_task_switch()

I saw following error when testing the latest nohz code on
Power:

[ 85.295384] BUG: using smp_processor_id() in preemptible [00000000] code: rsyslogd/3493
[ 85.295396] caller is .tick_nohz_task_switch+0x1c/0xb8
[ 85.295402] Call Trace:
[ 85.295408] [c0000001fababab0] [c000000000012dc4] .show_stack+0x110/0x25c (unreliable)
[ 85.295420] [c0000001fababba0] [c0000000007c4b54] .dump_stack+0x20/0x30
[ 85.295430] [c0000001fababc10] [c00000000044eb74] .debug_smp_processor_id+0xf4/0x124
[ 85.295438] [c0000001fababca0] [c0000000000d7594] .tick_nohz_task_switch+0x1c/0xb8
[ 85.295447] [c0000001fababd20] [c0000000000b9748] .finish_task_switch+0x13c/0x160
[ 85.295455] [c0000001fababdb0] [c0000000000bbe50] .schedule_tail+0x50/0x124
[ 85.295463] [c0000001fababe30] [c000000000009dc8] .ret_from_fork+0x4/0x54

The code below moves the test into local_irq_save/restore
section to avoid the above complaint.

Signed-off-by: Li Zhong <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul McKenney <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/time/tick-sched.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index da53c8f..1c9f53b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -251,14 +251,15 @@ void tick_nohz_task_switch(struct task_struct *tsk)
 {
 	unsigned long flags;
 
-	if (!tick_nohz_full_cpu(smp_processor_id()))
-		return;
-
 	local_irq_save(flags);
 
+	if (!tick_nohz_full_cpu(smp_processor_id()))
+		goto out;
+
 	if (tick_nohz_tick_stopped() && !can_stop_full_tick())
 		tick_nohz_full_kick();
 
+out:
 	local_irq_restore(flags);
 }