2006-02-16 07:18:21

by Heiko Carstens

[permalink] [raw]
Subject: [patch 2/4] s390: fix preempt_count of idle thread with cpu hotplug

From: Heiko Carstens <[email protected]>

Set preempt_count of idle_thread to zero before switching off cpu.
Otherwise the preempt_count will be wrong if the cpu is switched on again
since the thread will be reused.

Signed-off-by: Heiko Carstens <[email protected]>
---

Looks to me like at least powerpc seems to have the same problem.

arch/s390/kernel/process.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletion(-)

diff -urpN linux-2.6/arch/s390/kernel/process.c linux-2.6-patched/arch/s390/kernel/process.c
--- linux-2.6/arch/s390/kernel/process.c 2006-02-16 07:29:46.000000000 +0100
+++ linux-2.6-patched/arch/s390/kernel/process.c 2006-02-16 07:30:06.000000000 +0100
@@ -128,8 +128,10 @@ void default_idle(void)
__ctl_set_bit(8, 15);

#ifdef CONFIG_HOTPLUG_CPU
- if (cpu_is_offline(cpu))
+ if (cpu_is_offline(cpu)) {
+ preempt_enable_no_resched();
cpu_die();
+ }
#endif

local_mcck_disable();


2006-02-16 18:11:38

by Joel Schopp

[permalink] [raw]
Subject: Re: [patch 2/4] s390: fix preempt_count of idle thread with cpu hotplug

> Set preempt_count of idle_thread to zero before switching off cpu.
> Otherwise the preempt_count will be wrong if the cpu is switched on again
> since the thread will be reused.
>

I had a similar discussion back in November, that one about
/proc/interrupts stats. Rather than do that all over again below is a
cut and paste of my reply to that discussion. The executive summary is
I rather like the current behavior as is.

-------------------------------------------------------------

> When CPU2 is off-lined, the statistics for CPU2 do not appear
>(expected). However when you look at the before picture (all CPUs
>present) and after picture (all cpus present after CPU2 re-added), you
>see that the original data was returned and has incremented:

<snip>

> Results are similar for other interrupt levels.
>
> This does not seem like the correct approach. Shouldn't an
> added CPU be considered new and stats begin again from zero?
> Can you please give us guidance on this issue? Is it
> a bug or expected behavior?

There are several legitimate options here, including the current
behavior. I'm not clear other than kernel developers doing debugging
who consumes this data. If it is just kernel developers I contend we are
capable of understanding whatever data is presented.

Current Behavior:
Stats carry over. The advantage of this is that the stats history is
not lost. Consider the scenario where a cpu is moved regularly between
two partitions in response to workload. It would probably be best in
that situation to keep the full history of that cpu during all the times
it has been in the partition. This view would be consistent with the cpu
always being there, just sometimes being off. I haven't fully thought
through the impact of a different physical cpu being added in the same
logical slot, or how virtualized cpus fits into this paradigm. But I
suspect they shouldn't change the concept.

Reset every time:
The advantage of this is the cpus start with a clean slate. There is
logic to avoiding the situation where some hotplug added cpus have state
already and some hotplug added cpus start with nothing. Consistency is
not a bad thing. On the other hand this is a bit of selective amnesia,
we would be forgetting all past statistics only for cpus that had taken
some time off. It's like taking a vacation from work and having your
old email thrown away while you are gone.

Reset all every time:
It could well be argued that the old cpu statistics from every active
cpu serve only to confuse things because the cpus were in a different
state. If you didn't reset all cpus statistics, then after adding a new
cpu you couldn't say for instance that cpu 5 is taking 45% of the
interrupts. The statistics are meaningless without their context, and
by adding a cpu we have destroyed their context. Therefore we should
reset all statistics so they all have the same frame of reference. The
downside of course is that we lose all history on all cpus.

Personally, I'm fine with the current behavior. It makes the most sense
to me and is the least complex to implement (already done). We maybe
should document it (send a patch in reply to the one Ashok Raj just
posted to lkml). Unless somebody has a real user of cpu statistics that
this behavior breaks.

2006-02-17 07:27:09

by Heiko Carstens

[permalink] [raw]
Subject: Re: [patch 2/4] s390: fix preempt_count of idle thread with cpu hotplug

> >Set preempt_count of idle_thread to zero before switching off cpu.
> >Otherwise the preempt_count will be wrong if the cpu is switched on again
> >since the thread will be reused.
>
> I had a similar discussion back in November, that one about
> /proc/interrupts stats. Rather than do that all over again below is a cut
> and paste of my reply to that discussion. The executive summary is I
> rather like the current behavior as is.
>
> -------------------------------------------------------------
>
> > When CPU2 is off-lined, the statistics for CPU2 do not appear
> >(expected). However when you look at the before picture (all CPUs
> >present) and after picture (all cpus present after CPU2 re-added), you
> >see that the original data was returned and has incremented:

Think we're talking about different things. preempt_count is not statistical
stuff. Actually if you have preempt_count != 0 preemption is disabled.
At least on s390 we didn't set this counter to 0 in case of cpu hotplug thus
ending with a preempt_count > 0 if a cpu gets reenabled. Which gives us a lot
of warnings ala "scheduling while atomic: swapper/0x00000001/0".
So this is not a restriction that should be documented somewhere but a bug.

No idea how powerpc deals with this and if there is the same issue. Just
tried to give a hint since cpu hotplug code for powerpc and s390 looks quite
the same.

Heiko