2008-06-03 22:19:50

by Cliff Wickman

Subject: Re: [BUG] hotplug cpus on ia64


On Fri, May 30, 2008 at 03:36:54PM +0200, Peter Zijlstra wrote:
> On Thu, 2008-05-29 at 11:32 -0500, Cliff Wickman wrote:
> > >> I built an ia64 kernel from Andrew's tree (2.6.26-rc2-mm1)
> > >> and get a very predictable hotplug cpu problem.
> > >> billberry1:/tmp/cpw # ./dis
> > >> disabled cpu 17
> > >> enabled cpu 17
> > >> billberry1:/tmp/cpw # ./dis
> > >> disabled cpu 17
> > >> enabled cpu 17
> > >> billberry1:/tmp/cpw # ./dis
> > >>
> > >> The script that disables the cpu always hangs (unkillable)
> > >> on the 3rd attempt.
> >
> > > And a bit further:
> > > The kstopmachine thread always sits on the run queue (real time) for about
> > > 30 minutes before running.
> >
> > And a bit further:
> >
> > The kstopmachine thread is queued as real-time on the downed cpu:
> > >> rq -f 17
> > CPU#  runq address        size  Lock  current task        time  name
> > ==========================================================================
> >   17  0xe000046003059540    3     U    0xe0000360f06f8000    0    swapper
> > Total of 3 queued:
> > 3 real time tasks: px *(rt_rq *)0xe000046003059608
> > exclusive queue:
> > slot 0
> > 0xe0000760f4628000 0 migration/17
> > 0xe0000760f4708000 0 kstopmachine
> > 0xe0000760f6678000 0 watchdog/17
> >
> > I put in counters and see that schedule() is never again entered by cpu 17
> > after it is downed the 3rd time.
> > (it is entered after being up'd the first two times)
> >
> > The kstopmachine thread is bound to cpu 17 by __stop_machine_run()'s call
> > to kthread_bind().
> >
> > A cpu does not schedule after being downed, of course. But it does again
> > after being up'd.
> > Why would the second up be different? Following it, if the cpu is
> > downed it never schedules again.
> >
> > If I always bind kstopmachine to cpu 0 the problem disappears.
>
> does:
>
> echo -1 > /proc/sys/kernel/sched_rt_runtime_us
>
> fix the problem?

Yes! It does.

Dimitri Sivanich has run into what looks like a similar problem.
Hope the above workaround is a good clue to its solution.

--
Cliff Wickman
Silicon Graphics, Inc.
[email protected]
(651) 683-3824
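
A note on the workaround: writing -1 to sched_rt_runtime_us switches off RT
bandwidth throttling altogether, because the scheduler maps a negative sysctl
value to RUNTIME_INF. A minimal sketch of how the sysctl is interpreted,
reconstructed from memory of the 2.6.26-era kernel/sched.c (treat the exact
form as an assumption):

/*
 * sysctl_sched_rt_runtime < 0 means "no RT bandwidth limit", so RT tasks
 * such as migration/N and kstopmachine can never be throttled.
 */
static inline u64 global_rt_runtime(void)
{
	if (sysctl_sched_rt_runtime < 0)
		return RUNTIME_INF;

	return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC;
}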


2008-06-04 13:50:33

by Dimitri Sivanich

Subject: Re: [BUG] hotplug cpus on ia64

On Tue, Jun 03, 2008 at 05:17:59PM -0500, Cliff Wickman wrote:
>
> On Fri, May 30, 2008 at 03:36:54PM +0200, Peter Zijlstra wrote:
> > On Thu, 2008-05-29 at 11:32 -0500, Cliff Wickman wrote:
> > > I put in counters and see that schedule() is never again entered by cpu 17
> > > after it is downed the 3rd time.
> > > (it is entered after being up'd the first two times)
> > >
> > > The kstopmachine thread is bound to cpu 17 by __stop_machine_run()'s call
> > > to kthread_bind().
> > >
> > > A cpu does not schedule after being downed, of course. But it does again
> > > after being up'd.
> > > Why would the second up be different? Following it, if the cpu is
> > > downed it never schedules again.
> > >
> > > If I always bind kstopmachine to cpu 0 the problem disappears.
> >
> > does:
> >
> > echo -1 > /proc/sys/kernel/sched_rt_runtime_us
> >
> > fix the problem?
>
> Yes! It does.
>
> Dimitri Sivanich has run into what looks like a similar problem.
> Hope the above workaround is a good clue to its solution.

This fixes the problem I was seeing as well.

2008-06-05 12:50:36

by Peter Zijlstra

Subject: Re: [BUG] hotplug cpus on ia64

On Tue, 2008-06-03 at 17:17 -0500, Cliff Wickman wrote:
> On Fri, May 30, 2008 at 03:36:54PM +0200, Peter Zijlstra wrote:
> > On Thu, 2008-05-29 at 11:32 -0500, Cliff Wickman wrote:
> > > >> I built an ia64 kernel from Andrew's tree (2.6.26-rc2-mm1)
> > > >> and get a very predictable hotplug cpu problem.
> > > >> billberry1:/tmp/cpw # ./dis
> > > >> disabled cpu 17
> > > >> enabled cpu 17
> > > >> billberry1:/tmp/cpw # ./dis
> > > >> disabled cpu 17
> > > >> enabled cpu 17
> > > >> billberry1:/tmp/cpw # ./dis
> > > >>
> > > >> The script that disables the cpu always hangs (unkillable)
> > > >> on the 3rd attempt.
> > >
> > > > And a bit further:
> > > > The kstopmachine thread always sits on the run queue (real time) for about
> > > > 30 minutes before running.
> > >
> > > And a bit further:
> > >
> > > The kstopmachine thread is queued as real-time on the downed cpu:
> > > >> rq -f 17
> > > CPU#  runq address        size  Lock  current task        time  name
> > > ==========================================================================
> > >   17  0xe000046003059540    3     U    0xe0000360f06f8000    0    swapper
> > > Total of 3 queued:
> > > 3 real time tasks: px *(rt_rq *)0xe000046003059608
> > > exclusive queue:
> > > slot 0
> > > 0xe0000760f4628000 0 migration/17
> > > 0xe0000760f4708000 0 kstopmachine
> > > 0xe0000760f6678000 0 watchdog/17
> > >
> > > I put in counters and see that schedule() is never again entered by cpu 17
> > > after it is downed the 3rd time.
> > > (it is entered after being up'd the first two times)
> > >
> > > The kstopmachine thread is bound to cpu 17 by __stop_machine_run()'s call
> > > to kthread_bind().
> > >
> > > A cpu does not schedule after being downed, of course. But it does again
> > > after being up'd.
> > > Why would the second up be different? Following it, if the cpu is
> > > downed it never schedules again.
> > >
> > > If I always bind kstopmachine to cpu 0 the problem disappears.
> >
> > does:
> >
> > echo -1 > /proc/sys/kernel/sched_rt_runtime_us
> >
> > fix the problem?
>
> Yes! It does.
>
> Dimitri Sivanich has run into what looks like a similar problem.
> Hope the above workaround is a good clue to its solution.


Does the below fix it?


Signed-off-by: Peter Zijlstra <[email protected]>
---
 kernel/sched.c    |   15 +++++--
 kernel/sched_rt.c |  109 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 115 insertions(+), 9 deletions(-)

Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -280,6 +280,9 @@ static int balance_runtime(struct rt_rq
 			continue;

 		spin_lock(&iter->rt_runtime_lock);
+		if (iter->rt_runtime == RUNTIME_INF)
+			goto next;
+
 		diff = iter->rt_runtime - iter->rt_time;
 		if (diff > 0) {
 			do_div(diff, weight);
@@ -293,12 +296,105 @@ static int balance_runtime(struct rt_rq
 				break;
 			}
 		}
+next:
 		spin_unlock(&iter->rt_runtime_lock);
 	}
 	spin_unlock(&rt_b->rt_runtime_lock);

 	return more;
 }
+
+static void __disable_runtime(struct rq *rq)
+{
+	struct root_domain *rd = rq->rd;
+	struct rt_rq *rt_rq;
+
+	if (unlikely(!scheduler_running))
+		return;
+
+	for_each_leaf_rt_rq(rt_rq, rq) {
+		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
+		s64 want;
+		int i;
+
+		spin_lock(&rt_b->rt_runtime_lock);
+		spin_lock(&rt_rq->rt_runtime_lock);
+		if (rt_rq->rt_runtime == RUNTIME_INF ||
+				rt_rq->rt_runtime == rt_b->rt_runtime)
+			goto balanced;
+		spin_unlock(&rt_rq->rt_runtime_lock);
+
+		want = rt_b->rt_runtime - rt_rq->rt_runtime;
+
+		for_each_cpu_mask(i, rd->span) {
+			struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
+			s64 diff;
+
+			if (iter == rt_rq)
+				continue;
+
+			spin_lock(&iter->rt_runtime_lock);
+			if (want > 0) {
+				diff = min_t(s64, iter->rt_runtime, want);
+				iter->rt_runtime -= diff;
+				want -= diff;
+			} else {
+				iter->rt_runtime -= want;
+				want -= want;
+			}
+			spin_unlock(&iter->rt_runtime_lock);
+
+			if (!want)
+				break;
+		}
+
+		spin_lock(&rt_rq->rt_runtime_lock);
+		BUG_ON(want);
+balanced:
+		rt_rq->rt_runtime = RUNTIME_INF;
+		spin_unlock(&rt_rq->rt_runtime_lock);
+		spin_unlock(&rt_b->rt_runtime_lock);
+	}
+}
+
+static void disable_runtime(struct rq *rq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&rq->lock, flags);
+	__disable_runtime(rq);
+	spin_unlock_irqrestore(&rq->lock, flags);
+}
+
+static void __enable_runtime(struct rq *rq)
+{
+	struct root_domain *rd = rq->rd;
+	struct rt_rq *rt_rq;
+
+	if (unlikely(!scheduler_running))
+		return;
+
+	for_each_leaf_rt_rq(rt_rq, rq) {
+		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
+
+		spin_lock(&rt_b->rt_runtime_lock);
+		spin_lock(&rt_rq->rt_runtime_lock);
+		rt_rq->rt_runtime = rt_b->rt_runtime;
+		rt_rq->rt_time = 0;
+		spin_unlock(&rt_rq->rt_runtime_lock);
+		spin_unlock(&rt_b->rt_runtime_lock);
+	}
+}
+
+static void enable_runtime(struct rq *rq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&rq->lock, flags);
+	__enable_runtime(rq);
+	spin_unlock_irqrestore(&rq->lock, flags);
+}
+
 #endif

 static inline int rt_se_prio(struct sched_rt_entity *rt_se)
@@ -328,14 +424,13 @@ static int sched_rt_runtime_exceeded(str

 #ifdef CONFIG_SMP
 	if (rt_rq->rt_time > runtime) {
-		int more;
-
 		spin_unlock(&rt_rq->rt_runtime_lock);
-		more = balance_runtime(rt_rq);
+		balance_runtime(rt_rq);
 		spin_lock(&rt_rq->rt_runtime_lock);

-		if (more)
-			runtime = sched_rt_runtime(rt_rq);
+		runtime = sched_rt_runtime(rt_rq);
+		if (runtime == RUNTIME_INF)
+			return 0;
 	}
 #endif

@@ -1157,6 +1252,8 @@ static void join_domain_rt(struct rq *rq
 {
 	if (rq->rt.overloaded)
 		rt_set_overload(rq);
+
+	__enable_runtime(rq);
 }

 /* Assumes rq->lock is held */
@@ -1164,6 +1261,8 @@ static void leave_domain_rt(struct rq *r
 {
 	if (rq->rt.overloaded)
 		rt_clear_overload(rq);
+
+	__disable_runtime(rq);
 }

 /*
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -7455,20 +7455,27 @@ int sched_create_sysfs_power_savings_ent
 static int update_sched_domains(struct notifier_block *nfb,
 				unsigned long action, void *hcpu)
 {
+	int cpu = (int)(long)hcpu;
+
 	switch (action) {
-	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
+		disable_runtime(cpu_rq(cpu));
+		/* fall-through */
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
 		detach_destroy_domains(&cpu_online_map);
 		return NOTIFY_OK;

-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
+
 	case CPU_DOWN_FAILED:
 	case CPU_DOWN_FAILED_FROZEN:
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
+		enable_runtime(cpu_rq(cpu));
+		/* fall-through */
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 		/*
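
For readers skimming the patch: the mechanism it touches is the per-runqueue
RT throttle. Each rt_rq accumulates rt_time while realtime tasks run and gets
throttled once that exceeds its rt_runtime share; balance_runtime() tries to
borrow unused runtime from sibling runqueues in the same root domain. The
patch has a CPU leaving the domain hand its share back and mark itself
RUNTIME_INF, and restores the share when the CPU rejoins. A condensed,
hand-written sketch of the throttling decision (not the literal kernel
function, simplified from kernel/sched_rt.c of this era):

/* roughly what sched_rt_runtime_exceeded() decides, heavily condensed */
static int rt_rq_over_budget(struct rt_rq *rt_rq)
{
	u64 runtime = sched_rt_runtime(rt_rq);	/* this rq's share */

	if (runtime == RUNTIME_INF)		/* unlimited: never throttle */
		return 0;

	if (rt_rq->rt_time > runtime) {
		/* try to borrow spare runtime from sibling runqueues */
		balance_runtime(rt_rq);
		runtime = sched_rt_runtime(rt_rq);

		if (runtime != RUNTIME_INF && rt_rq->rt_time > runtime) {
			rt_rq->rt_throttled = 1;	/* wait for the next period */
			return 1;
		}
	}

	return 0;
}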

2008-06-05 14:01:18

by Dimitri Sivanich

Subject: Re: [BUG] hotplug cpus on ia64

On Thu, Jun 05, 2008 at 02:49:58PM +0200, Peter Zijlstra wrote:
>
> Does the below fix it?
>

I don't believe so:


:~ # taskset -p $$
pid 4502's current affinity mask: 8
:~ # cd /sys/devices/system
:/sys/devices/system # cd cpu
:/sys/devices/system/cpu # cd cpu2
:/sys/devices/system/cpu/cpu2 # cat online
1
:/sys/devices/system/cpu/cpu2 # echo 0 >online
:/sys/devices/system/cpu/cpu2 # taskset -p $$
pid 4502's current affinity mask: 8
:/sys/devices/system/cpu/cpu2 # taskset -cp 0-3 $$
pid 4502's current affinity list: 3
pid 4502's new affinity list: 0,1,3
:/sys/devices/system/cpu/cpu2 # taskset -p $$
pid 4502's current affinity mask: b
:/sys/devices/system/cpu/cpu2 # echo 1 >online
:/sys/devices/system/cpu/cpu2 # taskset -p $$
pid 4502's current affinity mask: b
:/sys/devices/system/cpu/cpu2 # taskset -cp 0-3 $$
pid 4502's current affinity list: 0,1,3
pid 4502's new affinity list: 0-3
:/sys/devices/system/cpu/cpu2 # taskset -p $$
pid 4502's current affinity mask: f
:/sys/devices/system/cpu/cpu2 # echo 0 >online
:/sys/devices/system/cpu/cpu2 # taskset -p $$
pid 4502's current affinity mask: b
:/sys/devices/system/cpu/cpu2 # echo 1 >online
:/sys/devices/system/cpu/cpu2 # taskset -p $$
(above command now hangs)


(ps output)
0xe0000060b5650000 4502 4349 0 2 S 0xe0000060b5650390 bash
0xe0000060b8da0000 4843 4502 0 2 D 0xe0000060b8da0390 bash


Stack traceback for pid 4843
0xe0000060b8da0000 4843 4502 0 2 D 0xe0000060b8da0390 bash
0xa0000001007d44b0 schedule+0x1210
args (0xe0000060ba470ce4, 0xa000000100dae190, 0xe000006003129200, 0xa000000100084b70, 0x48c, 0xe0000060b8dafda8, 0xe000006003129200, 0x200, 0xe0000060f780fe80)
0xa0000001007d4ac0 schedule_timeout+0x40
args (0x7fffffffffffffff, 0x0, 0x0, 0xa0000001007d2f00, 0x309, 0xe000006003129200)
0xa0000001007d2f00 wait_for_common+0x240
args (0xe0000060b8dafe08, 0x7fffffffffffffff, 0x2, 0xa0000001007d3280, 0x207, 0xe0000060ba470070)
0xa0000001007d3280 wait_for_completion+0x40
args (0xe0000060b8dafe08, 0xa00000010008d990, 0x38a, 0xffffffffffff9200)
0xa00000010008d990 sched_exec+0x1b0
args (0x2, 0xe0000060ba470000, 0xe0000060ba470010, 0xe000006003129200, 0xa00000010017e980, 0x58e, 0xa00000010017dce0)
0xa00000010017e980 do_execve+0xa0
args (0xe0000060f39e5000, 0x60000000000394b0, 0x6000000000056150, 0xe0000060b8dafe40, 0xe0000060f799f100, 0xe0000060f799bb00, 0xe0000060f799bbd8, 0x60000000000620b1, 0xa000000100013940)
0xa000000100013940 sys_execve+0x60
args (0xe0000060f39e5000, 0xe0000060f39e5000, 0x6000000000056150, 0xe0000060b8dafe40, 0xa00000010000a270, 0x50e, 0x2000000000028490)
0xa00000010000a270 ia64_execve+0x30
args (0x60000000000620a0, 0x60000000000394b0, 0x6000000000056150, 0x0, 0xc00000000000058e, 0x400000000003d020, 0x60000000000394b0, 0x0, 0xa00000010000aba0)
0xa00000010000aba0 ia64_ret_from_syscall
args (0x60000000000620a0, 0x60000000000394b0, 0x6000000000056150, 0x0, 0xc00000000000058e, 0x400000000003d020, 0x60000000000394b0, 0x0)
0xa000000000010720 __kernel_syscall_via_break
args (0x60000000000620a0, 0x60000000000394b0, 0x6000000000056150, 0x0, 0xc00000000000058e, 0x400000000003d020, 0x60000000000394b0, 0x0)
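
The backtrace fits the earlier reports: on exec, sched_exec() may decide to
move the task to a less loaded cpu, in which case it queues a migration
request and sleeps until the migration/N kthread on the task's current
runqueue services it. A rough sketch of that pattern, condensed from the
2.6.26-era kernel/sched.c (details in this tree may differ):

/* condensed sketch: how sched_exec() ends up in wait_for_completion() */
static void sched_migrate_task(struct task_struct *p, int dest_cpu)
{
	struct migration_req req;
	unsigned long flags;
	struct rq *rq = task_rq_lock(p, &flags);

	if (migrate_task(p, dest_cpu, &req)) {
		/* hand the move to the RT-priority migration/N kthread ... */
		task_rq_unlock(rq, &flags);
		wake_up_process(rq->migration_thread);

		/*
		 * ... and sleep until it has done the move.  If migration/N
		 * never gets to run, the caller is stuck here, which matches
		 * the wait_for_completion frame in the trace above.
		 */
		wait_for_completion(&req.done);
		return;
	}
	task_rq_unlock(rq, &flags);
}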

2008-06-05 14:18:52

by Peter Zijlstra

Subject: Re: [BUG] hotplug cpus on ia64

On Thu, 2008-06-05 at 08:51 -0500, Dimitri Sivanich wrote:
> On Thu, Jun 05, 2008 at 02:49:58PM +0200, Peter Zijlstra wrote:
> >
> > Does the below fix it?
> >
>
> I don't believe so:

Humpfh :-( I'll continue looking then...

Thanks for testing.

2008-06-10 10:20:10

by Ingo Molnar

Subject: Re: [BUG] hotplug cpus on ia64


* Peter Zijlstra <[email protected]> wrote:

> > > does:
> > >
> > > echo -1 > /proc/sys/kernel/sched_rt_runtime_us
> > >
> > > fix the problem?
> >
> > Yes! It does.
> >
> > Dimitri Sivanich has run into what looks like a similar problem.
> > Hope the above workaround is a good clue to its solution.
>
> Does the below fix it?

While it's not the full fix, I've applied it to tip/sched-devel for more
testing. Thanks,

Ingo