While testing the FFTW library (http://www.fftw.org) in multithreaded
mode I regularly get strange lock-ups on Itanium LION (4 CPU)
and NEC AzusA (16 CPU) systems. I don't think this is IA64
specific, though...
The symptoms: running the tests (make check) sometimes ends up with
hanging processes. Reading from some of the /proc/pid/* files also
leads to hangs (top & ps just don't return). The processes cannot be
killed either.
As far as I understand, the program crashes while one of the threads
(#1) tries to get a write lock on mm->mmap_sem. Another thread (#2)
starts dumping core and gets a read lock in the meantime (before
thread #1). The write lock request gets queued, and a subsequent
read lock request by thread #2, happening somewhere in the call tree of
the coredump (in access_process_vm), cannot be satisfied because it is
queued behind the waiting writer. The two threads wait forever, and any
other attempt to access their mm structure ends up in the same deadlock.
The tracebacks obtained by kdb vary a bit from case to case, but I
basically see:
#1: schedule <- __down_write <- sys_mmap <- ia64_ret_from_syscall
#2: schedule <- __down_read <- access_process_vm <- ia64_sync_user_rbs
    <- do_copy_regs <- unw_init_running <- load_script <- do_coredump
    <- ia64_do_signal <- handle_signal_delivery <- ia64_leave_kernel
others: either already gone or somewhere in do_exit or
    sys_rt_sigsuspend, unimportant anyway...
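The sequence described above can be replayed in a small user-space model. This is only a sketch under stated assumptions: it uses no real threads and no kernel code, and it assumes the semaphore is fair in the sense that a new reader queues behind a waiting writer. All names prefixed with toy_ are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a fair reader/writer semaphore. It only keeps the
 * bookkeeping that decides whether a request is granted or must sleep. */
struct toy_rwsem {
    int active_readers;   /* readers currently holding the lock */
    bool writer_active;   /* a writer currently holds the lock */
    int queued_writers;   /* writers sleeping in the wait queue */
};

/* Returns true if the read lock is granted, false if the caller would sleep. */
static bool toy_down_read(struct toy_rwsem *sem)
{
    /* Assumed fairness rule: new readers wait behind any queued writer. */
    if (sem->writer_active || sem->queued_writers > 0)
        return false;
    sem->active_readers++;
    return true;
}

/* Returns true if the write lock is granted, false if the caller is queued. */
static bool toy_down_write(struct toy_rwsem *sem)
{
    if (sem->writer_active || sem->active_readers > 0) {
        sem->queued_writers++;
        return false;
    }
    sem->writer_active = true;
    return true;
}

/* Replays the reported sequence; returns true if it ends in the deadlock. */
static bool toy_replay_coredump_deadlock(void)
{
    struct toy_rwsem mmap_sem = {0, false, 0};

    bool dump_outer = toy_down_read(&mmap_sem);   /* thread #2: do_coredump */
    bool mmap_write = toy_down_write(&mmap_sem);  /* thread #1: sys_mmap */
    bool dump_inner = toy_down_read(&mmap_sem);   /* thread #2: access_process_vm */

    /* Sanity check of the model: the outer read lock is still held. */
    assert(!dump_outer || mmap_sem.active_readers == 1);

    /* #1 waits for #2's outer read lock, #2 waits behind #1's queued write. */
    return dump_outer && !mmap_write && !dump_inner;
}
```

Calling toy_replay_coredump_deadlock() from a small main() returns true in this model: the outer read is granted, the write is queued, and the nested read is refused, so neither thread can make progress.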
I'm not really keen to debug the FFTW or pthreads libraries, but I
don't think that a user program should be able to cause such deadlocks
in the kernel. So maybe one can fix this problem in the kernel...
I guess the problem is the nesting of critical sections: in this case
one critical section is defined in fs/exec.c:do_coredump and a second
one in kernel/ptrace.c:access_process_vm. Getting rid of this kind of
nesting is quite tedious, so maybe one should deal with the nested
critical sections instead. A solution would be: the read lock request
(down_read) should be granted immediately if the current thread already
owns the lock. So maybe each task should remember the locks it owns,
maybe in a list accessible through the task structure.
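A minimal user-space sketch of this proposal, under heavy assumptions: it is not kernel code, it does no real blocking, and the per-task list is a fixed-size array. Every name prefixed with own_ or OWN_ is hypothetical. It only illustrates how "grant the read lock if the task already owns it" would let the nested request in the coredump path pass a queued writer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OWN_MAX_HELD 8   /* arbitrary bound for this sketch */

/* Bookkeeping of a fair reader/writer semaphore (no real blocking). */
struct own_rwsem {
    int active_readers;
    bool writer_active;
    int queued_writers;
};

/* Hypothetical per-task record of the read locks currently held. */
struct own_task {
    const struct own_rwsem *held[OWN_MAX_HELD];
    size_t nheld;
};

static bool own_task_holds(const struct own_task *t, const struct own_rwsem *sem)
{
    for (size_t i = 0; i < t->nheld; i++)
        if (t->held[i] == sem)
            return true;
    return false;
}

/* Grants the read lock at once if the task already holds it; otherwise
 * applies the fair rule and refuses behind active or queued writers. */
static bool own_down_read(struct own_task *t, struct own_rwsem *sem)
{
    if (!own_task_holds(t, sem) &&
        (sem->writer_active || sem->queued_writers > 0))
        return false;                 /* caller would sleep */
    assert(t->nheld < OWN_MAX_HELD);
    t->held[t->nheld++] = sem;
    sem->active_readers++;
    return true;
}

/* Drops one read hold of the task on sem. */
static void own_up_read(struct own_task *t, struct own_rwsem *sem)
{
    for (size_t i = t->nheld; i-- > 0; ) {
        if (t->held[i] == sem) {
            t->held[i] = t->held[--t->nheld];
            sem->active_readers--;
            return;
        }
    }
    assert(false && "own_up_read: lock not held by this task");
}

/* A writer is queued while any reader or writer is active. */
static bool own_down_write(struct own_rwsem *sem)
{
    if (sem->writer_active || sem->active_readers > 0) {
        sem->queued_writers++;
        return false;
    }
    sem->writer_active = true;
    return true;
}

/* Replays the coredump sequence; returns true if the nested read is
 * granted and the dumping task can release both holds afterwards. */
static bool own_replay_coredump(void)
{
    struct own_rwsem mmap_sem = {0, false, 0};
    struct own_task dumper = {{NULL}, 0};

    bool outer = own_down_read(&dumper, &mmap_sem);   /* do_coredump */
    bool writer = own_down_write(&mmap_sem);          /* other thread: sys_mmap */
    bool inner = own_down_read(&dumper, &mmap_sem);   /* access_process_vm */
    if (!(outer && !writer && inner))
        return false;

    own_up_read(&dumper, &mmap_sem);
    own_up_read(&dumper, &mmap_sem);
    return mmap_sem.active_readers == 0 && dumper.nheld == 0;
}
```

In this model the queued writer still has to wait until both read holds are dropped. The cost of the idea is the extra per-task bookkeeping on every lock and unlock, which the sketch makes visible in own_down_read() and own_up_read().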
I'm curious about your opinions. Maybe there is already a way to deal
with this kind of problem?
Best regards,
Erich
---
Erich Focht <[email protected]>
NEC European Supercomputer Systems, European HPC Technology Center
> The symptoms: running the tests (make check) sometimes ends up
> with hanging processes.
Does it _only_ hang during coredumping, or also during normal usage?
Could you remove
	down_read(&mmap_sem);
	binfmt->coredump();
	up_read(&mmap_sem);
from fs/exec.c and rerun your tests?
The hang during coredumping is known, there are 2 fixes [I have one, not
yet released, Andrea wrote one, IIRC included in his -aa kernels].
Before 2.4.10 there was a second hang with /proc/*/stats; that one is
fixed in 2.4.10.
--
Manfred
On Mon, 1 Oct 2001, Manfred Spraul wrote:
> > The symptoms: running the tests (make check) sometimes ends up
> > with hanging processes.
>
> Does it _only_ hang during coredumping, or also during normal usage?
>
> Could you remove
> 	down_read(&mmap_sem);
> 	binfmt->coredump();
> 	up_read(&mmap_sem);
> from fs/exec.c and rerun your tests?
Setting the coredumpsize limit to 0 already solves the problem.
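For reference, the same workaround can be applied from inside a program instead of from the shell. The sketch below uses the standard getrlimit()/setrlimit() calls with RLIMIT_CORE; the helper name disable_core_dumps() is made up for this example. It only avoids the coredump path and does nothing about the underlying locking problem.

```c
#include <sys/resource.h>

/* Sets the soft core-file size limit of the calling process to 0.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
static int disable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = 0;   /* lower the soft limit only, keep the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling disable_core_dumps() early in main(), before any threads are created, should have the same effect as setting the limit in the shell that starts the tests, since resource limits are inherited by child processes.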
The question that remains is how to deal with nested locks on the same
resource that can lead to deadlocks. Is there any (un)written rule that
one should avoid them in the Linux Kernel? Or are there any approaches to
deal with them (which are not yet included in the Kernel)?
> The hang during coredumping is known, there are 2 fixes [I have one, not
> yet released, Andrea wrote one, IIRC included in his -aa kernels].
Do these solutions deal only with the coredump problem or with nested
critical sections?
Thanks,
Erich
Erich Focht wrote:
>
> The question that remains is how to deal with nested locks on the same
> resource that can lead to deadlocks. Is there any (un)written rule that
> one should avoid them in the Linux Kernel? Or are there any approaches to
> deal with them (which are not yet included in the Kernel)?
>
Yes, semaphores and spinlocks are not recursive. There is one
exception: rw spinlocks can recurse on read. I'm not aware of
any plans to change that.
My patch avoids calling copy_from_user in elf_core_dump; Andrea's adds
limited recursion support and uses that to prevent the deadlock.
With his patch you can recurse on down_read() if you pass special
parameters.
But full recursion support is not planned.
--
Manfred