2002-11-06 22:43:43

by Petr Vandrovec

Subject: Re: 2.5.44 (now 2.5.46-c929): Strange oopses triggered by .

On 6 Nov 02 at 23:09, Petr Vandrovec wrote:
>
> I'm getting really nervous :-( Is kdb able to track who caused the
> unbalanced in_atomic() increment?
>
> After more than a week of a stable system I ran a simple
> "arp vanicka.vc.cvut.cz" a few minutes ago, and after the arp output I got
> "sleeping function called from illegal context", quickly followed by two
> "scheduling while atomic" messages, and finally it died because userspace
> faults taken while in_atomic() != 0 are treated as kernel ones...
>
> As I saw nobody else reporting this or a similar problem, I'll start
> looking at the e100 driver I use. Maybe it did not occur because I
> was running -acX kernels from 25th Oct until yesterday. Does anybody know?

-acX uses a special stack for hardware IRQs, and preempt_count() is
copied only from task -> hwirq, not the other way around (because it
assumes that preempt_count() is the same on exit as it was on entry...).
That's probably the reason why -acX worked for me for almost two weeks,
but as soon as I went back to non-ac, it died.
Petr Vandrovec
[email protected]


2002-11-07 00:37:30

by Roger Larsson

Subject: Preempt count check when leaving IRQ? (Was: Re: 2.5.44 (now 2.5.46-c929): Strange oopses triggered by .)

On Wednesday 06 November 2002 22.49, Petr Vandrovec wrote:
> On 6 Nov 02 at 23:09, Petr Vandrovec wrote:
> >
> > I'm getting really nervous :-( Is kdb able to track who caused the
> > unbalanced in_atomic() increment?
> >
> > After more than a week of a stable system I ran a simple
> > "arp vanicka.vc.cvut.cz" a few minutes ago, and after the arp output I got
> > "sleeping function called from illegal context", quickly followed by two
> > "scheduling while atomic" messages, and finally it died because userspace
> > faults taken while in_atomic() != 0 are treated as kernel ones...
> >
> > As I saw nobody else reporting this or a similar problem, I'll start
> > looking at the e100 driver I use. Maybe it did not occur because I
> > was running -acX kernels from 25th Oct until yesterday. Does anybody know?
>
> -acX uses a special stack for hardware IRQs, and preempt_count() is
> copied only from task -> hwirq, not the other way around (because it
> assumes that preempt_count() is the same on exit as it was on entry...).
> That's probably the reason why -acX worked for me for almost two weeks,
> but as soon as I went back to non-ac, it died.
> Petr Vandrovec
> [email protected]
>

This is another CHECK to do then.

Make a copy of the preempt count when entering an IRQ.
Check that we have the same value when leaving.
(With -acX, which already copies the count on entry, we only have to
add the check when leaving.)
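
A rough user-space sketch of that check, not kernel code. Everything
here (sim_preempt_count, sim_do_irq(), the two handlers) is a
hypothetical stand-in I made up for illustration; the real preempt
count and the real IRQ entry/exit path look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the preempt count of the current task. */
static int sim_preempt_count;

typedef void (*sim_irq_handler_t)(void);

/*
 * Hypothetical IRQ entry/exit wrapper: copy the count on entry,
 * run the handler, and compare on exit. Returns true if balanced.
 */
static bool sim_do_irq(sim_irq_handler_t handler)
{
    int saved = sim_preempt_count;      /* copy made when entering */

    handler();

    if (sim_preempt_count != saved) {   /* check when leaving */
        fprintf(stderr,
                "IRQ handler left preempt count at %d, expected %d\n",
                sim_preempt_count, saved);
        return false;
    }
    return true;
}

/* A handler that raises and drops the count symmetrically. */
static void balanced_handler(void)
{
    sim_preempt_count++;
    sim_preempt_count--;
}

/* A buggy handler that forgets to drop the count again. */
static void leaky_handler(void)
{
    sim_preempt_count++;
}

/* Small self-check exercising both handlers. */
static void sim_selfcheck(void)
{
    assert(sim_do_irq(balanced_handler));
    assert(!sim_do_irq(leaky_handler));
}
```

The sketch only reports the imbalance and leaves the leaked count in
place, so whoever triggers next still sees it. Whether a real check
should also restore the saved value is a separate question.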

/RogerL

--
Roger Larsson
Skellefteå
Sweden