2001-10-14 04:26:03

by Keith Owens

[permalink] [raw]
Subject: Recursive deadlock on die_lock

Typical die() code.

spinlock_t die_lock = SPIN_LOCK_UNLOCKED;

void die(const char * str, struct pt_regs * regs, long err)
{
console_verbose();
spin_lock_irq(&die_lock);
bust_spinlocks(1);
printk("%s: %04lx\n", str, err & 0xffff);
show_registers(regs);
bust_spinlocks(0);
spin_unlock_irq(&die_lock);
do_exit(SIGSEGV);
}

If show_registers() fails (which it does far too often on IA64) then
the system deadlocks trying to recursively obtain die_lock. Also
die_lock is never used outside die(), it should be proc local.
Suggested fix:

void die(const char * str, struct pt_regs * regs, long err)
{
static spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
static int die_lock_owner = -1, die_lock_owner_depth = 0;

if (die_lock_owner != smp_processor_id()) {
console_verbose();
spin_lock_irq(&die_lock);
die_lock_owner = smp_processor_id();
die_lock_owner_depth = 0;
bust_spinlocks(1);
}

if (++die_lock_owner_depth < 3) {
printk("%s: %04lx\n", str, err & 0xffff);
show_registers(regs);
}
else
printk(KERN_ERR "Recursive die() failure, registers suppressed\n");

bust_spinlocks(0);
die_lock_owner = -1;
spin_unlock_irq(&die_lock);
do_exit(SIGSEGV);
}


2001-10-14 06:42:49

by Andrew Morton

[permalink] [raw]
Subject: Re: Recursive deadlock on die_lock

Keith Owens wrote:
>
> ...
> If show_registers() fails (which it does far too often on IA64) then
> the system deadlocks trying to recursively obtain die_lock. Also
> die_lock is never used outside die(), it should be proc local.
> Suggested fix:
>

Looks to me like it'll work. But why does ia64 show_registers()
die so easily? Can it be taught to validate addresses before
dereferencing them somehow?

2001-10-14 07:13:18

by Keith Owens

[permalink] [raw]
Subject: Re: Recursive deadlock on die_lock

On Sat, 13 Oct 2001 23:42:51 -0700,
Andrew Morton <[email protected]> wrote:
>Keith Owens wrote:
>>
>> ...
>> If show_registers() fails (which it does far too often on IA64) then
>> the system deadlocks trying to recursively obtain die_lock. Also
>> die_lock is never used outside die(), it should be proc local.
>> Suggested fix:
>>
>
>Looks to me like it'll work. But why does ia64 show_registers()
>die so easily? Can it be taught to validate addresses before
>dereferencing them somehow?

Unwind code. It is impossible to obtain IA64 saved registers or back
trace the calling sequence without using the unwind API. That API
relies on decent unwind data being associated with each function
prologue, stack adjustment, save of return registers etc. Not an issue
for C code, it is for Assembler where the unwind info has to be hand
coded to match what the asm is doing. IA64 also has PAL code which is
called directly by the kernel, that PAL code has no unwind data so
failures in PAL code result in bad or incomplete back traces.

Unwind is not supposed to fail, it should detect bad input data and
avoid errors. Alas, sometimes it does fail.

2001-10-14 23:25:51

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Recursive deadlock on die_lock

Keith Owens <[email protected]> writes:

> On Sat, 13 Oct 2001 23:42:51 -0700,
> Andrew Morton <[email protected]> wrote:
> >Keith Owens wrote:
> >>
> >> ...
> >> If show_registers() fails (which it does far too often on IA64) then
> >> the system deadlocks trying to recursively obtain die_lock. Also
> >> die_lock is never used outside die(), it should be proc local.
> >> Suggested fix:
> >>
> >
> >Looks to me like it'll work. But why does ia64 show_registers()
> >die so easily? Can it be taught to validate addresses before
> >dereferencing them somehow?
>
> Unwind code. It is impossible to obtain IA64 saved registers or back
> trace the calling sequence without using the unwind API. That API
> relies on decent unwind data being associated with each function
> prologue, stack adjustment, save of return registers etc. Not an issue
> for C code, it is for Assembler where the unwind info has to be hand
> coded to match what the asm is doing. IA64 also has PAL code which is
> called directly by the kernel, that PAL code has no unwind data so
> failures in PAL code result in bad or incomplete back traces.
>
> Unwind is not supposed to fail, it should detect bad input data and
> avoid errors. Alas, sometimes it does fail.

PAL Ahh!!!!!

Please tell me that we are not rely on the firmware to be correct
after we have finished initializing the operating system.

Please tell me it ain't so. I have nightmares about that kind of setup.

Eric


2001-10-15 00:42:52

by Keith Owens

[permalink] [raw]
Subject: Re: Recursive deadlock on die_lock

On 14 Oct 2001 17:14:24 -0600,
[email protected] (Eric W. Biederman) wrote:
>Keith Owens <[email protected]> writes:
>> IA64 also has PAL code which is
>> called directly by the kernel, that PAL code has no unwind data so
>> failures in PAL code result in bad or incomplete back traces.
>
>PAL Ahh!!!!!
>
>Please tell me that we are not rely on the firmware to be correct
>after we have finished initializing the operating system.
>
>Please tell me it ain't so. I have nightmares about that kind of setup.

Not only do we rely on it, it is mandated by the IA64 design. Intel
IA64 System Abstraction Layer, 24535901.pdf. The IA64 kernel calls SAL
all over the place. grep -ir '\<[ps]al' include/asm-ia64/ arch/ia64/

2001-10-15 01:55:06

by Sam Varshavchik

[permalink] [raw]
Subject: Re: Recursive deadlock on die_lock

Keith Owens writes:

> On 14 Oct 2001 17:14:24 -0600,
> [email protected] (Eric W. Biederman) wrote:
>>Keith Owens <[email protected]> writes:
>>> IA64 also has PAL code which is
>>> called directly by the kernel, that PAL code has no unwind data so
>>> failures in PAL code result in bad or incomplete back traces.
>>
>>PAL Ahh!!!!!
>>
>>Please tell me that we are not rely on the firmware to be correct
>>after we have finished initializing the operating system.
>>
>>Please tell me it ain't so. I have nightmares about that kind of setup.
>
> Not only do we rely on it, it is mandated by the IA64 design. Intel
> IA64 System Abstraction Layer, 24535901.pdf. The IA64 kernel calls SAL
> all over the place. grep -ir '\<[ps]al' include/asm-ia64/ arch/ia64/

Oh, goody! What an excellent way to shove CPRM or SSSCA down your throat!
The possibilities are endless...

--
Sam