2009-01-13 12:25:26

by Michal Hocko

[permalink] [raw]
Subject: WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()

Hi,
I am not sure whether someone has already reported this, but
I can see the following early boot WARNING with the 2.6.29-rc1 kernel in
the serial console output:

------------[ cut here ]------------
WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
Hardware name: LIFEBOOK S7110
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.29-rc1 #5
Call Trace:
[<c0125735>] warn_slowpath+0x71/0xa8
[<c013f0e1>] ? tick_check_oneshot_change+0x3c/0xf0
[<c0137f96>] ? hrtimer_run_pending+0x33/0xac
[<c03fde85>] ? _spin_unlock_irq+0x10/0x24
[<c012d110>] ? run_timer_softirq+0x198/0x1a0
[<c04001e0>] sub_preempt_count+0x81/0x95
[<c0129fe6>] _local_bh_enable+0x6c/0x6e
[<c012a0de>] __do_softirq+0xf6/0x100
[<c0129fe8>] ? __do_softirq+0x0/0x100
<IRQ> [<c014f444>] ? handle_level_irq+0x0/0xc6
[<c0129db7>] ? irq_exit+0x3a/0x77
[<c01046ac>] ? do_IRQ+0xf1/0x107
[<c0103527>] ? common_interrupt+0x27/0x2c
[<c01400d8>] ? timer_stats_update_stats+0x1f6/0x1fe
[<c03fdea0>] ? _spin_unlock_irqrestore+0x7/0x25
[<c014eabf>] ? __setup_irq+0x17e/0x1cb
[<c014ebe7>] ? setup_irq+0x19/0x1d
[<c05d6786>] ? time_init_hook+0x2b/0x2d
[<c05ca6f6>] ? hpet_time_init+0x16/0x18
[<c05c575a>] ? start_kernel+0x224/0x292
[<c05c5085>] ? __init_begin+0x85/0x8d
---[ end trace 4eaa2a86a8e2da22 ]---

Full boot log is attached as well as config.
Should I try to bisect down the cause? (Any hints which commits to try
before that?)

[I wasn't sure who should be in CC]

Thanks in advance
and
Best regards.
--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic


Attachments:
(No filename) (1.56 kB)
boot.log (66.11 kB)
config-2.6.29-rc1 (59.33 kB)
Download all attachments

2009-01-13 12:34:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()


* Michal Hocko <[email protected]> wrote:

> Hi,
> I am not sure whether someone has already reported this, but
> I can see the following early boot WARNING with the 2.6.29-rc1 kernel in
> the serial console output:
>
> ------------[ cut here ]------------
> WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()

That one should be fixed in tip/master and it is in the to-Linus queue of
fixes:

http://people.redhat.com/mingo/tip.git/README

it's this commit:

01e3eb8: Revert "sched: improve preempt debugging"

if you want to cherry-pick the fix.

Ingo

2009-01-13 13:21:27

by Michal Hocko

[permalink] [raw]
Subject: Re: WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()

On Tue 13-01-09 13:34:25, Ingo Molnar wrote:
>
> * Michal Hocko <[email protected]> wrote:
>
> > Hi,
> > I am not sure whether someone has already reported this, but
> > I can see the following early boot WARNING with the 2.6.29-rc1 kernel in
> > the serial console output:
> >
> > ------------[ cut here ]------------
> > WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
>
> That one should be fixed in tip/master and it is in the to-Linus queue of
> fixes:
>
> http://people.redhat.com/mingo/tip.git/README
>
> it's this commit:
>
> 01e3eb8: Revert "sched: improve preempt debugging"
>
> if you want to cherry-pick the fix.

Thanks that fixes the issue on my end.

>
> Ingo

--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2009-01-15 06:55:29

by Nick Piggin

[permalink] [raw]
Subject: Re: WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()

On Tuesday 13 January 2009 23:34:25 Ingo Molnar wrote:
> * Michal Hocko <[email protected]> wrote:
> > Hi,
> > I am not sure whether someone has already reported this, but
> > I can see the following early boot WARNING with the 2.6.29-rc1 kernel in
> > the serial console output:
> >
> > ------------[ cut here ]------------
> > WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
>
> That one should be fixed in tip/master and it is in the to-Linus queue of
> fixes:
>
> http://people.redhat.com/mingo/tip.git/README
>
> it's this commit:
>
> 01e3eb8: Revert "sched: improve preempt debugging"
>
> if you want to cherry-pick the fix.

OK, but I still don't think this is the actual problem, but there is
something amiss in the init code being exposed by it.

2009-01-15 10:00:35

by Ingo Molnar

[permalink] [raw]
Subject: Re: WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()


* Nick Piggin <[email protected]> wrote:

> On Tuesday 13 January 2009 23:34:25 Ingo Molnar wrote:
> > * Michal Hocko <[email protected]> wrote:
> > > Hi,
> > > I am not sure whether someone has already reported this, but
> > > I can see the following early boot WARNING with the 2.6.29-rc1 kernel in
> > > the serial console output:
> > >
> > > ------------[ cut here ]------------
> > > WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
> >
> > That one should be fixed in tip/master and it is in the to-Linus queue of
> > fixes:
> >
> > http://people.redhat.com/mingo/tip.git/README
> >
> > it's this commit:
> >
> > 01e3eb8: Revert "sched: improve preempt debugging"
> >
> > if you want to cherry-pick the fix.
>
> OK, but I still don't think this is the actual problem, but there is
> something amiss in the init code being exposed by it.

the warnings triggered after a softirq, and there's already preempt-leak
checks in the softirq code - so we can exclude that.

a hardirq might have leaked a preempt count - but that would have quite
bad effects [with quick atomic check asserts in schedule()], wouldnt it?
So i tend to think that this is a false positive.

One problem i can think of (and which i outlined in the revert commit log)
is that if a hardirq hits this window in lock_kernel():

void __lockfunc lock_kernel(void)
{
int depth = current->lock_depth+1;
<-------------- HERE
if (likely(!depth))
__lock_kernel();
current->lock_depth = depth;
}

then we have kernel_locked() already true (it checks lock_depth), but the
preempt count is not elevated yet via __lock_kernel(). So if we return
from the hardirq [and run into softirqs that end with a preempt_enable() -
a pure hardirq exit has no preempt debug check] we'll incorrectly think
that there's a preempt leak going on.

Ingo

2009-01-15 10:54:19

by Nick Piggin

[permalink] [raw]
Subject: Re: WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()

On Thursday 15 January 2009 21:00:13 Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
> > On Tuesday 13 January 2009 23:34:25 Ingo Molnar wrote:
> > > * Michal Hocko <[email protected]> wrote:
> > > > Hi,
> > > > I am not sure whether someone has already reported this, but
> > > > I can see the following early boot WARNING with the 2.6.29-rc1 kernel
> > > > in the serial console output:
> > > >
> > > > ------------[ cut here ]------------
> > > > WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
> > >
> > > That one should be fixed in tip/master and it is in the to-Linus queue
> > > of fixes:
> > >
> > > http://people.redhat.com/mingo/tip.git/README
> > >
> > > it's this commit:
> > >
> > > 01e3eb8: Revert "sched: improve preempt debugging"
> > >
> > > if you want to cherry-pick the fix.
> >
> > OK, but I still don't think this is the actual problem, but there is
> > something amiss in the init code being exposed by it.
>
> the warnings triggered after a softirq, and there's already preempt-leak
> checks in the softirq code - so we can exclude that.
>
> a hardirq might have leaked a preempt count - but that would have quite
> bad effects [with quick atomic check asserts in schedule()], wouldnt it?
> So i tend to think that this is a false positive.
>
> One problem i can think of (and which i outlined in the revert commit log)
> is that if a hardirq hits this window in lock_kernel():
>
> void __lockfunc lock_kernel(void)
> {
> int depth = current->lock_depth+1;
> <-------------- HERE

current->lock_depth is not yet modified at this point.

> if (likely(!depth))
> __lock_kernel();

And here we increment preempt_count when taking the BKL, but we
has not yet modified current->lock_depth, so preempt debugging
will proceed with no impact of my patch at this point, regardless
of what interrupts are taken.

So there is a race window, but it is to err on the safe side and
not trigger a false report.

> current->lock_depth = depth;
> }

At this point, the BKL addition to preempt_count will be taken into
account in underflow checking of subsequent preempt counters.


> then we have kernel_locked() already true (it checks lock_depth), but the
> preempt count is not elevated yet via __lock_kernel(). So if we return
> from the hardirq [and run into softirqs that end with a preempt_enable() -
> a pure hardirq exit has no preempt debug check] we'll incorrectly think
> that there's a preempt leak going on.

I don't think so. And anyway BKL is taken very early in boot and not
released for a long time, so I don't think it is possible for these
kinds of races to trigger here anyway.

So I really think it still is a bug or some other misunderstanding we
have. I'd prefer to try to fix it or at least discover why it is happening.

2009-01-15 11:26:54

by Ingo Molnar

[permalink] [raw]
Subject: Re: WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()


* Nick Piggin <[email protected]> wrote:

> On Thursday 15 January 2009 21:00:13 Ingo Molnar wrote:
> > * Nick Piggin <[email protected]> wrote:
> > > On Tuesday 13 January 2009 23:34:25 Ingo Molnar wrote:
> > > > * Michal Hocko <[email protected]> wrote:
> > > > > Hi,
> > > > > I am not sure whether someone has already reported this, but
> > > > > I can see the following early boot WARNING with the 2.6.29-rc1 kernel
> > > > > in the serial console output:
> > > > >
> > > > > ------------[ cut here ]------------
> > > > > WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
> > > >
> > > > That one should be fixed in tip/master and it is in the to-Linus queue
> > > > of fixes:
> > > >
> > > > http://people.redhat.com/mingo/tip.git/README
> > > >
> > > > it's this commit:
> > > >
> > > > 01e3eb8: Revert "sched: improve preempt debugging"
> > > >
> > > > if you want to cherry-pick the fix.
> > >
> > > OK, but I still don't think this is the actual problem, but there is
> > > something amiss in the init code being exposed by it.
> >
> > the warnings triggered after a softirq, and there's already preempt-leak
> > checks in the softirq code - so we can exclude that.
> >
> > a hardirq might have leaked a preempt count - but that would have quite
> > bad effects [with quick atomic check asserts in schedule()], wouldnt it?
> > So i tend to think that this is a false positive.
> >
> > One problem i can think of (and which i outlined in the revert commit log)
> > is that if a hardirq hits this window in lock_kernel():
> >
> > void __lockfunc lock_kernel(void)
> > {
> > int depth = current->lock_depth+1;
> > <-------------- HERE
>
> current->lock_depth is not yet modified at this point.
>
> > if (likely(!depth))
> > __lock_kernel();
>
> And here we increment preempt_count when taking the BKL, but we
> has not yet modified current->lock_depth, so preempt debugging
> will proceed with no impact of my patch at this point, regardless
> of what interrupts are taken.
>
> So there is a race window, but it is to err on the safe side and
> not trigger a false report.
>
> > current->lock_depth = depth;
> > }
>
> At this point, the BKL addition to preempt_count will be taken into
> account in underflow checking of subsequent preempt counters.

indeed, and the unlock path looks good too.

> > then we have kernel_locked() already true (it checks lock_depth), but
> > the preempt count is not elevated yet via __lock_kernel(). So if we
> > return from the hardirq [and run into softirqs that end with a
> > preempt_enable() - a pure hardirq exit has no preempt debug check]
> > we'll incorrectly think that there's a preempt leak going on.
>
> I don't think so. And anyway BKL is taken very early in boot and not
> released for a long time, so I don't think it is possible for these
> kinds of races to trigger here anyway.
>
> So I really think it still is a bug or some other misunderstanding we
> have. I'd prefer to try to fix it or at least discover why it is
> happening.

yes - but the thing was reported a month ago, it went then upstream and
re-reported and re-bisected - so time ran out to debug it and we reverted
it. We can re-apply it if you have a good theory about what happened.

Ingo

2009-01-15 11:44:39

by Nick Piggin

[permalink] [raw]
Subject: Re: WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()

On Thursday 15 January 2009 22:26:40 Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
> > On Thursday 15 January 2009 21:00:13 Ingo Molnar wrote:
> > > * Nick Piggin <[email protected]> wrote:
> > > > On Tuesday 13 January 2009 23:34:25 Ingo Molnar wrote:
> > > > > * Michal Hocko <[email protected]> wrote:
> > > > > > Hi,
> > > > > > I am not sure whether someone has already reported this, but
> > > > > > I can see the following early boot WARNING with the 2.6.29-rc1
> > > > > > kernel in the serial console output:
> > > > > >
> > > > > > ------------[ cut here ]------------
> > > > > > WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
> > > > >
> > > > > That one should be fixed in tip/master and it is in the to-Linus
> > > > > queue of fixes:
> > > > >
> > > > > http://people.redhat.com/mingo/tip.git/README
> > > > >
> > > > > it's this commit:
> > > > >
> > > > > 01e3eb8: Revert "sched: improve preempt debugging"
> > > > >
> > > > > if you want to cherry-pick the fix.
> > > >
> > > > OK, but I still don't think this is the actual problem, but there is
> > > > something amiss in the init code being exposed by it.
> > >
> > > the warnings triggered after a softirq, and there's already
> > > preempt-leak checks in the softirq code - so we can exclude that.
> > >
> > > a hardirq might have leaked a preempt count - but that would have quite
> > > bad effects [with quick atomic check asserts in schedule()], wouldnt
> > > it? So i tend to think that this is a false positive.
> > >
> > > One problem i can think of (and which i outlined in the revert commit
> > > log) is that if a hardirq hits this window in lock_kernel():
> > >
> > > void __lockfunc lock_kernel(void)
> > > {
> > > int depth = current->lock_depth+1;
> > > <-------------- HERE
> >
> > current->lock_depth is not yet modified at this point.
> >
> > > if (likely(!depth))
> > > __lock_kernel();
> >
> > And here we increment preempt_count when taking the BKL, but we
> > has not yet modified current->lock_depth, so preempt debugging
> > will proceed with no impact of my patch at this point, regardless
> > of what interrupts are taken.
> >
> > So there is a race window, but it is to err on the safe side and
> > not trigger a false report.
> >
> > > current->lock_depth = depth;
> > > }
> >
> > At this point, the BKL addition to preempt_count will be taken into
> > account in underflow checking of subsequent preempt counters.
>
> indeed, and the unlock path looks good too.
>
> > > then we have kernel_locked() already true (it checks lock_depth), but
> > > the preempt count is not elevated yet via __lock_kernel(). So if we
> > > return from the hardirq [and run into softirqs that end with a
> > > preempt_enable() - a pure hardirq exit has no preempt debug check]
> > > we'll incorrectly think that there's a preempt leak going on.
> >
> > I don't think so. And anyway BKL is taken very early in boot and not
> > released for a long time, so I don't think it is possible for these
> > kinds of races to trigger here anyway.
> >
> > So I really think it still is a bug or some other misunderstanding we
> > have. I'd prefer to try to fix it or at least discover why it is
> > happening.
>
> yes - but the thing was reported a month ago, it went then upstream and
> re-reported and re-bisected - so time ran out to debug it and we reverted
> it. We can re-apply it if you have a good theory about what happened.

OK, but don't shoot the messenger ;) I don't know that code very
well... I was hoping someone else might jump in and fix it.

A few printks around could shed light on what preempt_count and
lock_depth is doing.