2002-03-25 02:01:49

by Andrew Morton

Subject: preempt-related hangs

I sent this email to Ingo last week; seems that he's
having some downtime. It was happening on my dual PIII
and I now discover that the quad PIII does the same
thing. Any ideas?



Kernel is 2.5.7, dual PIII. When I enable preempt it
locks during boot.

I applied the kgdb patch and had a poke.

(gdb) info threads
* 6 Thread 6 preempt_schedule () at sched.c:848
5 Thread 5 preempt_schedule () at sched.c:848
4 Thread 4 context_thread (startup=0xc0395f90) at context.c:101
3 Thread 3 migration_thread (unused=0x0) at sched.c:1646
2 Thread 2 migration_thread (unused=0x0) at sched.c:1646
1 Thread 1 spawn_ksoftirqd () at softirq.c:407

Note that init is stuck in spawn_ksoftirqd. It's spinning in
that function, yielding, waiting for the softirqd threads to
come alive. They're threads 5 and 6.

(gdb) thread 6
[Switching to thread 6 (Thread 6)]#0 preempt_schedule () at sched.c:848
848 }
(gdb) bt
#0 preempt_schedule () at sched.c:848
#1 0xc0117f77 in try_to_wake_up (p=0xefef4dc0) at sched.c:179
#2 0xc0117f8d in wake_up_process (p=0xefef68c0) at sched.c:347
#3 0xc011ad70 in set_cpus_allowed (p=0xefef68c0, new_mask=2) at sched.c:1583
#4 0xc01247fe in ksoftirqd (__bind_cpu=0x1) at softirq.c:371
#5 0xc010586b in kernel_thread (fn=0xc038aa97 <spawn_ksoftirqd+71>, arg=0x10, flags=582)
at process.c:501
#6 0xffffffff in ?? ()


So ksoftirqd has entered set_cpus_allowed(), and has tried to wake
migration_thread.

try_to_wake_up() has called task_rq_unlock(rq, &flags); and
task_rq_unlock() has done preempt_enable(). Game over at that
point. Looks like ksoftirqd has scheduled away on that preempt_enable
and is never coming back.

(gdb) info registers
eax 0xefef68c0 -269522752
ecx 0xc03d1540 -1069738688
edx 0x0 0
ebx 0xefed8000 -269647872
esp 0xefed9f18 0xefed9f18
...
(gdb) p *((struct thread_info *)0xefed8000)->task
$8 = {state = 0, thread_info = 0xefed8000, usage = {counter = 1}, flags = 64,
ptrace = 0, lock_depth = -1, prio = 139, static_prio = 139, run_list = {
next = 0xc03d239c, prev = 0xc03d239c}, array = 0xc03d1f2c, sleep_avg = 0,
sleep_timestamp = 117, policy = 0, cpus_allowed = 2, time_slice = 8, tasks = {
next = 0xc0360580, prev = 0xefef6240}, mm = 0x0, active_mm = 0x0, local_pages = {
next = 0xefef6910, prev = 0xefef6910}, allocation_order = 0, nr_local_pages = 0,
binfmt = 0x0, exit_code = 0, exit_signal = 0, pdeath_signal = 0, personality = 0,
did_exec = 0, pid = 6, pgrp = 1, tty_old_pgrp = 0, session = 1, tgid = 0, leader = 0,
real_parent = 0xefef4700, parent = 0xefef4700, children = {next = 0xefef6958,
prev = 0xefef6958}, sibling = {next = 0xefef4798, prev = 0xefef62a0},
thread_group = {next = 0xefef62a8, prev = 0xefef47a8}, pidhash_next = 0x0,
pidhash_pprev = 0xc03e4d98, wait_chldexit = {lock = {lock = 1, magic = 3735899821},
task_list = {next = 0xefef6980, prev = 0xefef6980}}, vfork_done = 0x0,
rt_priority = 0, it_real_value = 0, it_prof_value = 0, it_virt_value = 0,
it_real_incr = 0, it_prof_incr = 0, it_virt_incr = 0, real_timer = {list = {
next = 0x0, prev = 0x0}, expires = 0, data = 4025444544,
function = 0xc0123548 <it_real_fn>}, times = {tms_utime = 0, tms_stime = 0,
tms_cutime = 0, tms_cstime = 0}, start_time = 116, per_cpu_utime = {
0 <repeats 32 times>}, per_cpu_stime = {0 <repeats 32 times>}, min_flt = 0,
maj_flt = 0, nswap = 0, cmin_flt = 0, cmaj_flt = 0, cnswap = 0, swappable = -1,
uid = 0, euid = 0, suid = 0, fsuid = 0, gid = 0, egid = 0, sgid = 0, fsgid = 0,
ngroups = 0, groups = {0 <repeats 32 times>}, cap_effective = 4294967039,
cap_inheritable = 0, cap_permitted = 4294967295, keep_capabilities = 0,
user = 0xc0363a8c, rlim = {{rlim_cur = 4294967295, rlim_max = 4294967295}, {
...

It's in state TASK_RUNNING.
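
A side note on reading the dump: cpus_allowed = 2 and new_mask=2 are bitmasks in which bit n stands for CPU n, so both name CPU 1, which matches __bind_cpu=0x1 in the backtrace. Below is a minimal userspace sketch of that decoding. The helper names cpu_in_mask and first_cpu_in_mask are made up for illustration and are not kernel functions.

```c
#include <assert.h>

/* Hypothetical helper (not a kernel API): nonzero if bit `cpu` is set in `mask`. */
static int cpu_in_mask(unsigned long mask, unsigned int cpu)
{
    assert(cpu < 8 * sizeof(unsigned long));   /* keep the shift well defined */
    return (mask & (1UL << cpu)) != 0;
}

/* Hypothetical helper: lowest CPU number present in `mask`, or -1 if the mask is empty. */
static int first_cpu_in_mask(unsigned long mask)
{
    int cpu;

    for (cpu = 0; mask != 0; cpu++, mask >>= 1)
        if (mask & 1UL)
            return cpu;
    return -1;
}
```

The same `mask & (1UL << cpu)` test is what set_cpus_allowed() uses to decide whether the task is already on an allowed CPU.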

It's 100% reproducible. Happens with gcc-2.91.66 and gcc-3.0.3.
Can diagnose further if needed.



2002-03-25 02:13:51

by Andrew Morton

Subject: Re: preempt-related hangs

Andrew Morton wrote:
>
> ..
> Kernel is 2.5.7, dual PIII. When I enable preempt it
> locks during boot.

OK, this patch fixed it. I don't know why.


--- linux-2.5.7/kernel/sched.c Mon Mar 18 13:04:41 2002
+++ 25/kernel/sched.c Sun Mar 24 18:09:09 2002
@@ -1545,6 +1545,8 @@ void set_cpus_allowed(task_t *p, unsigne
migration_req_t req;
runqueue_t *rq;

+ preempt_disable();
+
new_mask &= cpu_online_map;
if (!new_mask)
BUG();
@@ -1557,7 +1559,7 @@ void set_cpus_allowed(task_t *p, unsigne
*/
if (new_mask & (1UL << p->thread_info->cpu)) {
task_rq_unlock(rq, &flags);
- return;
+ goto out;
}

init_MUTEX_LOCKED(&req.sem);
@@ -1567,6 +1569,8 @@ void set_cpus_allowed(task_t *p, unsigne
wake_up_process(rq->migration_thread);

down(&req.sem);
+out:
+ preempt_disable();
}

static volatile unsigned long migration_mask;



2002-03-25 02:30:37

by Robert Love

Subject: Re: preempt-related hangs

On Sun, 2002-03-24 at 21:11, Andrew Morton wrote:

> OK, this patch fixed it. I don't know why.

Eh, odd. That effectively disables kernel preemption around
set_cpus_allowed, but preemption is already disabled by the task_rq_lock
call. Note, however, preemption is enabled by the task_rq_unlock and
thus wake_up_process is called with preemption enabled.

With your patch, preemption is now disabled across the call, and
subsequently the task_rq_unlock in try_to_wake_up will never call
preempt_schedule and your lock does not happen.
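
The interaction described above can be sketched as a toy userspace model. Everything here is an assumption made for illustration: the counter, the need_resched flag and the toy_ functions only imitate the nesting behaviour of preempt_disable()/preempt_enable(), and the model assumes the wakeup marks the current task for rescheduling. It is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

static int preempt_count;            /* toy nesting counter */
static bool need_resched;            /* toy "reschedule wanted" flag */
static int preempt_schedule_calls;   /* how often we would have preempted */

static void preempt_disable(void)
{
    preempt_count++;
}

static void preempt_enable(void)
{
    assert(preempt_count > 0);
    /* Only the outermost enable may preempt, and only if a resched is pending. */
    if (--preempt_count == 0 && need_resched) {
        need_resched = false;
        preempt_schedule_calls++;    /* stands in for preempt_schedule() */
    }
}

/* Toy wakeup: the runqueue lock/unlock pair brackets the wakeup itself. */
static void toy_try_to_wake_up(void)
{
    preempt_disable();               /* models task_rq_lock() */
    need_resched = true;             /* assumption: the woken task should run now */
    preempt_enable();                /* models task_rq_unlock() */
}

/* Returns how many times preempt_schedule would run inside the wakeup. */
static int toy_set_cpus_allowed(bool outer_disable)
{
    int during_wakeup;

    preempt_count = 0;
    need_resched = false;
    preempt_schedule_calls = 0;

    if (outer_disable)
        preempt_disable();           /* the extra disable added by the patch */
    toy_try_to_wake_up();
    during_wakeup = preempt_schedule_calls;
    if (outer_disable)
        preempt_enable();            /* a balanced exit; preemption happens here instead */
    return during_wakeup;
}
```

Without the outer disable, the unlock inside the wakeup is the outermost enable and preempts on the spot. With it, the count never reaches zero inside the wakeup, so preemption is deferred until the outer section ends.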

The actual problem may be elsewhere, and this just hides it. This is
pretty clear, since we would get a similar effect just wrapping
wake_up_process in preempt_disable. But, oh, try_to_wake_up disables
preempt, too ... hrm.

Hm, what if try_to_wake_up wakes up a process and then preemptively
schedules into it and it wants to acquire the req.sem semaphore, but
cannot, as it is still taken by set_cpus_allowed? The semaphore seems
to just be used in the migration code.

So we have init spinning, waiting for the softirq threads to come up, and
then we have a deadlock on req.sem between set_cpus_allowed and the
migration thread?

Bleh ... Ingo?

Robert Love


2002-03-25 02:34:17

by Anton Altaparmakov

Subject: Re: preempt-related hangs

At 02:11 25/03/02, Andrew Morton wrote:
>Andrew Morton wrote:
> >
> > ..
> > Kernel is 2.5.7, dual PIII. When I enable preempt it
> > locks during boot.
>
>OK, this patch fixed it. I don't know why.

Er, because you disable preemption twice and it never gets enabled again? (-:

You probably meant that to be preempt_enable() at the bottom of the patch...
That might not solve your problem of course... But with the patch you
basically have completely disabled preemption, you might as well not
configure it into the kernel. (-;
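
To make the imbalance concrete, here is a minimal userspace sketch that assumes the preempt count behaves like a simple nesting counter. The counter and the count_after_* and toy_preemptible names are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>

static int preempt_count;   /* toy stand-in for the per-task preempt count */

static void preempt_disable(void)
{
    preempt_count++;
}

static void preempt_enable(void)
{
    assert(preempt_count > 0);   /* catch an unbalanced enable */
    preempt_count--;
}

/* Hypothetical helper: the task can be preempted only when the count is zero. */
static int toy_preemptible(void)
{
    return preempt_count == 0;
}

/* Shape of the patch as posted: disable on entry, disable again at "out:". */
static int count_after_first_patch(void)
{
    preempt_count = 0;
    preempt_disable();   /* added at the top of set_cpus_allowed() */
    /* ... function body elided ... */
    preempt_disable();   /* at "out:", where an enable was presumably meant */
    return preempt_count;
}

/* Shape of the presumably intended patch: a balanced disable/enable pair. */
static int count_after_intended_patch(void)
{
    preempt_count = 0;
    preempt_disable();
    /* ... function body elided ... */
    preempt_enable();
    return preempt_count;
}
```

In this model the patch as posted leaves the count at 2 after one call, so the calling task is never preemptible again, while the balanced version returns it to 0.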

Anton



>--- linux-2.5.7/kernel/sched.c Mon Mar 18 13:04:41 2002
>+++ 25/kernel/sched.c Sun Mar 24 18:09:09 2002
>@@ -1545,6 +1545,8 @@ void set_cpus_allowed(task_t *p, unsigne
> migration_req_t req;
> runqueue_t *rq;
>
>+ preempt_disable();
>+
> new_mask &= cpu_online_map;
> if (!new_mask)
> BUG();
>@@ -1557,7 +1559,7 @@ void set_cpus_allowed(task_t *p, unsigne
> */
> if (new_mask & (1UL << p->thread_info->cpu)) {
> task_rq_unlock(rq, &flags);
>- return;
>+ goto out;
> }
>
> init_MUTEX_LOCKED(&req.sem);
>@@ -1567,6 +1569,8 @@ void set_cpus_allowed(task_t *p, unsigne
> wake_up_process(rq->migration_thread);
>
> down(&req.sem);
>+out:
>+ preempt_disable();
> }
>
> static volatile unsigned long migration_mask;
>
>

--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/

2002-03-25 02:40:27

by Robert Love

Subject: Re: preempt-related hangs

On Sun, 2002-03-24 at 21:33, Anton Altaparmakov wrote:

> Er, because you disable preemption twice and it never gets enabled again? (-:
>
> You probably meant that to be preempt_enable() at the bottom of the patch...
> That might not solve your problem of course... But with the patch you
> basically have completely disabled preemption, you might as well not
> configure it into the kernel. (-;

Crap - good eye Anton. What does it do now, Andrew?

Robert Love

2002-03-25 02:50:27

by Andrew Morton

Subject: Re: preempt-related hangs

Anton Altaparmakov wrote:
>
> At 02:11 25/03/02, Andrew Morton wrote:
> >Andrew Morton wrote:
> > >
> > > ..
> > > Kernel is 2.5.7, dual PIII. When I enable preempt it
> > > locks during boot.
> >
> >OK, this patch fixed it. I don't know why.
>
> Er, because you disable preemption twice and it never gets enabled again? (-:
>
> You probably meant that to be preempt_enable() at the bottom of the patch...
> That might not solve your problem of course... But with the patch you
> basically have completely disabled preemption, you might as well not
> configure it into the kernel. (-;

Yeah I know. Sheesh. I don't even have time to test the fix
before you're on my act :)

Fixed-up workaround with a little debug check is below.

I think Robert's right - the problem is more likely to lie
with the migration thread handoff thingy.

--- 2.5.7/kernel/sched.c~preempt-lockup Sun Mar 24 18:10:49 2002
+++ 2.5.7-akpm/kernel/sched.c Sun Mar 24 18:25:29 2002
@@ -1561,6 +1561,8 @@ void set_cpus_allowed(task_t *p, unsigne
migration_req_t req;
runqueue_t *rq;

+ preempt_disable();
+
new_mask &= cpu_online_map;
if (!new_mask)
BUG();
@@ -1573,7 +1575,7 @@ void set_cpus_allowed(task_t *p, unsigne
*/
if (new_mask & (1UL << p->thread_info->cpu)) {
task_rq_unlock(rq, &flags);
- return;
+ goto out;
}

init_MUTEX_LOCKED(&req.sem);
@@ -1583,6 +1585,8 @@ void set_cpus_allowed(task_t *p, unsigne
wake_up_process(rq->migration_thread);

down(&req.sem);
+out:
+ preempt_enable();
}

static volatile unsigned long migration_mask;
--- 2.5.7/kernel/exit.c~preempt-lockup Sun Mar 24 18:31:39 2002
+++ 2.5.7-akpm/kernel/exit.c Sun Mar 24 18:37:19 2002
@@ -489,6 +489,14 @@ NORET_TYPE void do_exit(long code)
panic("Attempted to kill the idle task!");
if (tsk->pid == 1)
panic("Attempted to kill init!");
+#ifdef CONFIG_PREEMPT
+ if (preempt_get_count()) {
+ printk(KERN_ERR "task `%s' exits with non-zero "
+ "preempt count: %d\n",
+ current->comm,
+ preempt_get_count());
+ }
+#endif
tsk->flags |= PF_EXITING;
del_timer_sync(&tsk->real_timer);



2002-03-25 08:15:32

by Zwane Mwaikambo

Subject: Re: preempt-related hangs

On Sun, 24 Mar 2002, Andrew Morton wrote:

> I sent this email to Ingo last week; seems that he's
> having some downtime. It was happening on my dual PIII
> and I now discover that the quad PIII does the same
> thing. Any ideas?
>
>
> Kernel is 2.5.7, dual PIII. When I enable preempt it
> locks during boot.

Same here: 2.5.7 with quad PPro emulation, and I have preempt disabled.

> I applied the kgdb patch and had a poke.
>
> (gdb) info threads
> * 6 Thread 6 preempt_schedule () at sched.c:848
> 5 Thread 5 preempt_schedule () at sched.c:848
> 4 Thread 4 context_thread (startup=0xc0395f90) at context.c:101
> 3 Thread 3 migration_thread (unused=0x0) at sched.c:1646
> 2 Thread 2 migration_thread (unused=0x0) at sched.c:1646
> 1 Thread 1 spawn_ksoftirqd () at softirq.c:407
>
> Note that init is stuck in spawn_ksoftirqd. It's spinning in
> that function, yielding, waiting for the softirqd threads to
> come alive. They're threads 5 and 6.

I'm locking up in the same place: my last CPU is spinning, waiting for its
softirqd thread. Then I get an smp_migrate_task IPI from a live CPU, at
which point I'm stuck.

Zwane