Hi,
I am not sure whether someone has already reported this, but
I can see the following early boot WARNING with the 2.6.29-rc1 kernel in
the serial console output:
------------[ cut here ]------------
WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
Hardware name: LIFEBOOK S7110
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.29-rc1 #5
Call Trace:
[<c0125735>] warn_slowpath+0x71/0xa8
[<c013f0e1>] ? tick_check_oneshot_change+0x3c/0xf0
[<c0137f96>] ? hrtimer_run_pending+0x33/0xac
[<c03fde85>] ? _spin_unlock_irq+0x10/0x24
[<c012d110>] ? run_timer_softirq+0x198/0x1a0
[<c04001e0>] sub_preempt_count+0x81/0x95
[<c0129fe6>] _local_bh_enable+0x6c/0x6e
[<c012a0de>] __do_softirq+0xf6/0x100
[<c0129fe8>] ? __do_softirq+0x0/0x100
<IRQ> [<c014f444>] ? handle_level_irq+0x0/0xc6
[<c0129db7>] ? irq_exit+0x3a/0x77
[<c01046ac>] ? do_IRQ+0xf1/0x107
[<c0103527>] ? common_interrupt+0x27/0x2c
[<c01400d8>] ? timer_stats_update_stats+0x1f6/0x1fe
[<c03fdea0>] ? _spin_unlock_irqrestore+0x7/0x25
[<c014eabf>] ? __setup_irq+0x17e/0x1cb
[<c014ebe7>] ? setup_irq+0x19/0x1d
[<c05d6786>] ? time_init_hook+0x2b/0x2d
[<c05ca6f6>] ? hpet_time_init+0x16/0x18
[<c05c575a>] ? start_kernel+0x224/0x292
[<c05c5085>] ? __init_begin+0x85/0x8d
---[ end trace 4eaa2a86a8e2da22 ]---
Full boot log is attached as well as config.
Should I try to bisect down the cause? (Any hints which commits to try
before that?)
[I wasn't sure who should be in CC]
Thanks in advance
and
Best regards.
--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
* Michal Hocko <[email protected]> wrote:
> Hi,
> I am not sure whether someone has already reported this, but
> I can see the following early boot WARNING with the 2.6.29-rc1 kernel in
> the serial console output:
>
> ------------[ cut here ]------------
> WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
That one should be fixed in tip/master and it is in the to-Linus queue of
fixes:
http://people.redhat.com/mingo/tip.git/README
it's this commit:
01e3eb8: Revert "sched: improve preempt debugging"
if you want to cherry-pick the fix.
Ingo
On Tue 13-01-09 13:34:25, Ingo Molnar wrote:
>
> * Michal Hocko <[email protected]> wrote:
>
> > Hi,
> > I am not sure whether someone has already reported this, but
> > I can see the following early boot WARNING with the 2.6.29-rc1 kernel in
> > the serial console output:
> >
> > ------------[ cut here ]------------
> > WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
>
> That one should be fixed in tip/master and it is in the to-Linus queue of
> fixes:
>
> http://people.redhat.com/mingo/tip.git/README
>
> it's this commit:
>
> 01e3eb8: Revert "sched: improve preempt debugging"
>
> if you want to cherry-pick the fix.
Thanks that fixes the issue on my end.
>
> Ingo
--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
On Tuesday 13 January 2009 23:34:25 Ingo Molnar wrote:
> * Michal Hocko <[email protected]> wrote:
> > Hi,
> > I am not sure whether someone has already reported this, but
> > I can see the following early boot WARNING with the 2.6.29-rc1 kernel in
> > the serial console output:
> >
> > ------------[ cut here ]------------
> > WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
>
> That one should be fixed in tip/master and it is in the to-Linus queue of
> fixes:
>
> http://people.redhat.com/mingo/tip.git/README
>
> it's this commit:
>
> 01e3eb8: Revert "sched: improve preempt debugging"
>
> if you want to cherry-pick the fix.
OK, but I still don't think this is the actual problem, but there is
something amiss in the init code being exposed by it.
* Nick Piggin <[email protected]> wrote:
> On Tuesday 13 January 2009 23:34:25 Ingo Molnar wrote:
> > * Michal Hocko <[email protected]> wrote:
> > > Hi,
> > > I am not sure whether someone has already reported this, but
> > > I can see the following early boot WARNING with the 2.6.29-rc1 kernel in
> > > the serial console output:
> > >
> > > ------------[ cut here ]------------
> > > WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
> >
> > That one should be fixed in tip/master and it is in the to-Linus queue of
> > fixes:
> >
> > http://people.redhat.com/mingo/tip.git/README
> >
> > it's this commit:
> >
> > 01e3eb8: Revert "sched: improve preempt debugging"
> >
> > if you want to cherry-pick the fix.
>
> OK, but I still don't think this is the actual problem, but there is
> something amiss in the init code being exposed by it.
the warnings triggered after a softirq, and there's already preempt-leak
checks in the softirq code - so we can exclude that.
a hardirq might have leaked a preempt count - but that would have quite
bad effects [with quick atomic check asserts in schedule()], wouldnt it?
So i tend to think that this is a false positive.
One problem i can think of (and which i outlined in the revert commit log)
is that if a hardirq hits this window in lock_kernel():
void __lockfunc lock_kernel(void)
{
int depth = current->lock_depth+1;
<-------------- HERE
if (likely(!depth))
__lock_kernel();
current->lock_depth = depth;
}
then we have kernel_locked() already true (it checks lock_depth), but the
preempt count is not elevated yet via __lock_kernel(). So if we return
from the hardirq [and run into softirqs that end with a preempt_enable() -
a pure hardirq exit has no preempt debug check] we'll incorrectly think
that there's a preempt leak going on.
Ingo
On Thursday 15 January 2009 21:00:13 Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
> > On Tuesday 13 January 2009 23:34:25 Ingo Molnar wrote:
> > > * Michal Hocko <[email protected]> wrote:
> > > > Hi,
> > > > I am not sure whether someone has already reported this, but
> > > > I can see the following early boot WARNING with the 2.6.29-rc1 kernel
> > > > in the serial console output:
> > > >
> > > > ------------[ cut here ]------------
> > > > WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
> > >
> > > That one should be fixed in tip/master and it is in the to-Linus queue
> > > of fixes:
> > >
> > > http://people.redhat.com/mingo/tip.git/README
> > >
> > > it's this commit:
> > >
> > > 01e3eb8: Revert "sched: improve preempt debugging"
> > >
> > > if you want to cherry-pick the fix.
> >
> > OK, but I still don't think this is the actual problem, but there is
> > something amiss in the init code being exposed by it.
>
> the warnings triggered after a softirq, and there's already preempt-leak
> checks in the softirq code - so we can exclude that.
>
> a hardirq might have leaked a preempt count - but that would have quite
> bad effects [with quick atomic check asserts in schedule()], wouldnt it?
> So i tend to think that this is a false positive.
>
> One problem i can think of (and which i outlined in the revert commit log)
> is that if a hardirq hits this window in lock_kernel():
>
> void __lockfunc lock_kernel(void)
> {
> int depth = current->lock_depth+1;
> <-------------- HERE
current->lock_depth is not yet modified at this point.
> if (likely(!depth))
> __lock_kernel();
And here we increment preempt_count when taking the BKL, but we
has not yet modified current->lock_depth, so preempt debugging
will proceed with no impact of my patch at this point, regardless
of what interrupts are taken.
So there is a race window, but it is to err on the safe side and
not trigger a false report.
> current->lock_depth = depth;
> }
At this point, the BKL addition to preempt_count will be taken into
account in underflow checking of subsequent preempt counters.
> then we have kernel_locked() already true (it checks lock_depth), but the
> preempt count is not elevated yet via __lock_kernel(). So if we return
> from the hardirq [and run into softirqs that end with a preempt_enable() -
> a pure hardirq exit has no preempt debug check] we'll incorrectly think
> that there's a preempt leak going on.
I don't think so. And anyway BKL is taken very early in boot and not
released for a long time, so I don't think it is possible for these
kinds of races to trigger here anyway.
So I really think it still is a bug or some other misunderstanding we
have. I'd prefer to try to fix it or at least discover why it is happening.
* Nick Piggin <[email protected]> wrote:
> On Thursday 15 January 2009 21:00:13 Ingo Molnar wrote:
> > * Nick Piggin <[email protected]> wrote:
> > > On Tuesday 13 January 2009 23:34:25 Ingo Molnar wrote:
> > > > * Michal Hocko <[email protected]> wrote:
> > > > > Hi,
> > > > > I am not sure whether someone has already reported this, but
> > > > > I can see the following early boot WARNING with the 2.6.29-rc1 kernel
> > > > > in the serial console output:
> > > > >
> > > > > ------------[ cut here ]------------
> > > > > WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
> > > >
> > > > That one should be fixed in tip/master and it is in the to-Linus queue
> > > > of fixes:
> > > >
> > > > http://people.redhat.com/mingo/tip.git/README
> > > >
> > > > it's this commit:
> > > >
> > > > 01e3eb8: Revert "sched: improve preempt debugging"
> > > >
> > > > if you want to cherry-pick the fix.
> > >
> > > OK, but I still don't think this is the actual problem, but there is
> > > something amiss in the init code being exposed by it.
> >
> > the warnings triggered after a softirq, and there's already preempt-leak
> > checks in the softirq code - so we can exclude that.
> >
> > a hardirq might have leaked a preempt count - but that would have quite
> > bad effects [with quick atomic check asserts in schedule()], wouldnt it?
> > So i tend to think that this is a false positive.
> >
> > One problem i can think of (and which i outlined in the revert commit log)
> > is that if a hardirq hits this window in lock_kernel():
> >
> > void __lockfunc lock_kernel(void)
> > {
> > int depth = current->lock_depth+1;
> > <-------------- HERE
>
> current->lock_depth is not yet modified at this point.
>
> > if (likely(!depth))
> > __lock_kernel();
>
> And here we increment preempt_count when taking the BKL, but we
> has not yet modified current->lock_depth, so preempt debugging
> will proceed with no impact of my patch at this point, regardless
> of what interrupts are taken.
>
> So there is a race window, but it is to err on the safe side and
> not trigger a false report.
>
> > current->lock_depth = depth;
> > }
>
> At this point, the BKL addition to preempt_count will be taken into
> account in underflow checking of subsequent preempt counters.
indeed, and the unlock path looks good too.
> > then we have kernel_locked() already true (it checks lock_depth), but
> > the preempt count is not elevated yet via __lock_kernel(). So if we
> > return from the hardirq [and run into softirqs that end with a
> > preempt_enable() - a pure hardirq exit has no preempt debug check]
> > we'll incorrectly think that there's a preempt leak going on.
>
> I don't think so. And anyway BKL is taken very early in boot and not
> released for a long time, so I don't think it is possible for these
> kinds of races to trigger here anyway.
>
> So I really think it still is a bug or some other misunderstanding we
> have. I'd prefer to try to fix it or at least discover why it is
> happening.
yes - but the thing was reported a month ago, it went then upstream and
re-reported and re-bisected - so time ran out to debug it and we reverted
it. We can re-apply it if you have a good theory about what happened.
Ingo
On Thursday 15 January 2009 22:26:40 Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
> > On Thursday 15 January 2009 21:00:13 Ingo Molnar wrote:
> > > * Nick Piggin <[email protected]> wrote:
> > > > On Tuesday 13 January 2009 23:34:25 Ingo Molnar wrote:
> > > > > * Michal Hocko <[email protected]> wrote:
> > > > > > Hi,
> > > > > > I am not sure whether someone has already reported this, but
> > > > > > I can see the following early boot WARNING with the 2.6.29-rc1
> > > > > > kernel in the serial console output:
> > > > > >
> > > > > > ------------[ cut here ]------------
> > > > > > WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
> > > > >
> > > > > That one should be fixed in tip/master and it is in the to-Linus
> > > > > queue of fixes:
> > > > >
> > > > > http://people.redhat.com/mingo/tip.git/README
> > > > >
> > > > > it's this commit:
> > > > >
> > > > > 01e3eb8: Revert "sched: improve preempt debugging"
> > > > >
> > > > > if you want to cherry-pick the fix.
> > > >
> > > > OK, but I still don't think this is the actual problem, but there is
> > > > something amiss in the init code being exposed by it.
> > >
> > > the warnings triggered after a softirq, and there's already
> > > preempt-leak checks in the softirq code - so we can exclude that.
> > >
> > > a hardirq might have leaked a preempt count - but that would have quite
> > > bad effects [with quick atomic check asserts in schedule()], wouldnt
> > > it? So i tend to think that this is a false positive.
> > >
> > > One problem i can think of (and which i outlined in the revert commit
> > > log) is that if a hardirq hits this window in lock_kernel():
> > >
> > > void __lockfunc lock_kernel(void)
> > > {
> > > int depth = current->lock_depth+1;
> > > <-------------- HERE
> >
> > current->lock_depth is not yet modified at this point.
> >
> > > if (likely(!depth))
> > > __lock_kernel();
> >
> > And here we increment preempt_count when taking the BKL, but we
> > has not yet modified current->lock_depth, so preempt debugging
> > will proceed with no impact of my patch at this point, regardless
> > of what interrupts are taken.
> >
> > So there is a race window, but it is to err on the safe side and
> > not trigger a false report.
> >
> > > current->lock_depth = depth;
> > > }
> >
> > At this point, the BKL addition to preempt_count will be taken into
> > account in underflow checking of subsequent preempt counters.
>
> indeed, and the unlock path looks good too.
>
> > > then we have kernel_locked() already true (it checks lock_depth), but
> > > the preempt count is not elevated yet via __lock_kernel(). So if we
> > > return from the hardirq [and run into softirqs that end with a
> > > preempt_enable() - a pure hardirq exit has no preempt debug check]
> > > we'll incorrectly think that there's a preempt leak going on.
> >
> > I don't think so. And anyway BKL is taken very early in boot and not
> > released for a long time, so I don't think it is possible for these
> > kinds of races to trigger here anyway.
> >
> > So I really think it still is a bug or some other misunderstanding we
> > have. I'd prefer to try to fix it or at least discover why it is
> > happening.
>
> yes - but the thing was reported a month ago, it went then upstream and
> re-reported and re-bisected - so time ran out to debug it and we reverted
> it. We can re-apply it if you have a good theory about what happened.
OK, but don't shoot the messenger ;) I don't know that code very
well... I was hoping someone else might jump in and fix it.
A few printks around could shed light on what preempt_count and
lock_depth is doing.