2008-06-11 22:06:25

by Alessandro Suardi

[permalink] [raw]
Subject: [2.6.26-rc5-git2] WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()

Recently upgraded my FC6 desktop to Fedora 9; with the
latest nautilus RPM updates my VNC session went nuts
with nautilus pegging the CPU for everything that breathed.

I now reverted to an earlier nautilus package, but during
the peak CPU period my kernel spat this:

[314185.623294] ------------[ cut here ]------------
[314185.623414] WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()
[314185.623514] Modules linked in: iptable_filter ip_tables x_tables
sunrpc ipv6 fuse snd_via82xx snd_ac97_codec ac97_bus snd_mpu401_uart
snd_rawmidi via686a hwmon parport_pc sg parport uhci_hcd ehci_hcd
[314185.623924] Pid: 12314, comm: nautilus Not tainted 2.6.26-rc5-git2 #4
[314185.624021] [<c0115b95>] warn_on_slowpath+0x41/0x7b
[314185.624021] [<c010de70>] ? do_page_fault+0x2c1/0x5fd
[314185.624021] [<c0128396>] ? up_read+0x16/0x28
[314185.624021] [<c010de70>] ? do_page_fault+0x2c1/0x5fd
[314185.624021] [<c012fa33>] ? __lock_acquire+0xbb4/0xbc3
[314185.624021] [<c012d0a0>] check_flags+0x4c/0x128
[314185.624021] [<c012fa73>] lock_acquire+0x31/0x7d
[314185.624021] [<c0128cf6>] __atomic_notifier_call_chain+0x30/0x80
[314185.624021] [<c0128cc6>] ? __atomic_notifier_call_chain+0x0/0x80
[314185.624021] [<c0128d52>] atomic_notifier_call_chain+0xc/0xe
[314185.624021] [<c0128d81>] notify_die+0x2d/0x2f
[314185.624021] [<c01043b0>] do_int3+0x1f/0x4d
[314185.624021] [<c02f2d3b>] int3+0x27/0x2c
[314185.624021] =======================
[314185.624021] ---[ end trace 1923f65a2d7bb246 ]---
[314185.624021] possible reason: unannotated irqs-off.
[314185.624021] irq event stamp: 488879
[314185.624021] hardirqs last enabled at (488879): [<c0102d67>]
restore_nocheck+0x12/0x15
[314185.624021] hardirqs last disabled at (488878): [<c0102dca>]
work_resched+0x19/0x30
[314185.624021] softirqs last enabled at (488876): [<c011a1ba>]
__do_softirq+0xa6/0xac
[314185.624021] softirqs last disabled at (488865): [<c010476e>]
do_softirq+0x57/0xa6

I didn't seem to find it with some googling, so here it is.

I was incidentally ltracing that process to try and find out
what was gulping down that much CPU (sorry, no idea
whether ltrace and the WARNING happened at the same
time or which came first) and:

[root@donkey ~]# ltrace -c -p 12314
^C% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
78.22 28.837601 739425 39 g_cclosure_marshal_VOID__OBJECT
6.53 2.407936 150496 16 eel_canvas_item_get_bounds
5.82 2.145861 3461 620 g_type_check_instance_cast
5.45 2.009936 71783 28 gdk_gc_set_rgb_fg_color
1.13 0.416437 450 924 g_direct_hash
0.48 0.175278 304 576 g_type_check_instance_is_a
0.22 0.080984 5061 16 eel_canvas_item_set
0.21 0.078327 349 224 eel_str_is_empty
0.20 0.073830 4614 16 pango_layout_get_extents
0.19 0.071272 456 156 g_object_unref
0.19 0.070753 4422 16 gtk_icon_theme_choose_icon
0.11 0.041811 140 297 g_direct_equal
0.11 0.041042 513 80 g_hash_table_lookup
0.09 0.034237 237 144 eel_canvas_item_get_type
0.07 0.024834 253 98 g_free
0.05 0.019926 176 113 g_object_ref
[snipped stuff under 0.05]
------ ----------- ----------- --------- --------------------
100.00 36.868859 4658 total

Ouch. And another ltrace several seconds later:

[root@donkey ~]# ltrace -c -p 12314
^C% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
98.55 6.777972 2259324 3 g_cclosure_marshal_VOID__VOID
0.71 0.049092 79 616 g_direct_hash
0.26 0.017639 430 41 g_object_unref
0.14 0.009971 77 129 g_type_check_instance_cast
0.07 0.004904 1226 4 gdk_pixbuf_get_height
0.04 0.002583 75 34 gtk_widget_get_type
0.03 0.001875 234 8 gdk_x11_get_xatom_by_name
0.02 0.001681 80 21 g_type_check_class_cast
0.02 0.001539 384 4 eel_gdk_draw_layout_with_drop_shadow
0.02 0.001494 74 20 eel_canvas_item_get_type
0.02 0.001238 123 10 gtk_widget_style_get
0.01 0.000971 74 13 g_type_check_instance_is_a
0.01 0.000920 76 12 g_object_ref
0.01 0.000918 131 7 gdk_gc_new
0.01 0.000748 187 4 gdk_draw_pixbuf
0.01 0.000664 83 8 gdk_pixbuf_get_width
0.01 0.000633 90 7 gdk_gc_set_rgb_fg_color
0.01 0.000525 75 7 gtk_layout_get_type
0.01 0.000522 74 7 eel_gdk_rgb_to_color
0.01 0.000425 85 5 gtk_object_get_type
0.00 0.000305 76 4 gdk_rectangle_intersect
0.00 0.000305 76 4 floor
0.00 0.000302 75 4 pango_layout_get_extents
0.00 0.000232 77 3 eel_gdk_color_to_rgb
0.00 0.000149 74 2 eel_gdk_color_is_dark
0.00 0.000093 93 1 eel_get_widget_background
0.00 0.000078 78 1 eel_background_get_type
0.00 0.000078 78 1 eel_background_is_set
------ ----------- ----------- --------- --------------------
100.00 6.877856 980 total


Yikes ! Enough. After that, the unavoidable downgrade...

[root@donkey ~]# rpm -Uvh --oldpackage /tmp/nautilus-2.22.2-7.fc9.i386.rpm

And stuff is back to decent - Xvnc under Fedora 9 is clearly
slower than the same under Fedora 8, but the above
nautilus behavior was rendering it unusable.

Box is my trusty noname K7-800, 512MB RAM; if there's
anything else useful I might be able to provide, just ask.

Thanks,

--alessandro

"Give me love / Or give me hate
Give me anything that's not just ok"

(Sophia, 'Weightless')


2008-06-12 06:56:26

by Vegard Nossum

[permalink] [raw]
Subject: Re: [2.6.26-rc5-git2] WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()

Hi,

Thanks for reporting!

On Thu, Jun 12, 2008 at 12:06 AM, Alessandro Suardi
<[email protected]> wrote:
> Recently upgraded my FC6 desktop to Fedora 9; with the
> latest nautilus RPM updates my VNC session went nuts
> with nautilus pegging the CPU for everything that breathed.
>
> I now reverted to an earlier nautilus package, but during
> the peak CPU period my kernel spat this:
>
> [314185.623294] ------------[ cut here ]------------
> [314185.623414] WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()
> [314185.623514] Modules linked in: iptable_filter ip_tables x_tables
> sunrpc ipv6 fuse snd_via82xx snd_ac97_codec ac97_bus snd_mpu401_uart
> snd_rawmidi via686a hwmon parport_pc sg parport uhci_hcd ehci_hcd
> [314185.623924] Pid: 12314, comm: nautilus Not tainted 2.6.26-rc5-git2 #4
> [314185.624021] [<c0115b95>] warn_on_slowpath+0x41/0x7b
> [314185.624021] [<c010de70>] ? do_page_fault+0x2c1/0x5fd
> [314185.624021] [<c0128396>] ? up_read+0x16/0x28
> [314185.624021] [<c010de70>] ? do_page_fault+0x2c1/0x5fd
> [314185.624021] [<c012fa33>] ? __lock_acquire+0xbb4/0xbc3
> [314185.624021] [<c012d0a0>] check_flags+0x4c/0x128
> [314185.624021] [<c012fa73>] lock_acquire+0x31/0x7d
> [314185.624021] [<c0128cf6>] __atomic_notifier_call_chain+0x30/0x80
> [314185.624021] [<c0128cc6>] ? __atomic_notifier_call_chain+0x0/0x80
> [314185.624021] [<c0128d52>] atomic_notifier_call_chain+0xc/0xe
> [314185.624021] [<c0128d81>] notify_die+0x2d/0x2f
> [314185.624021] [<c01043b0>] do_int3+0x1f/0x4d
> [314185.624021] [<c02f2d3b>] int3+0x27/0x2c
> [314185.624021] =======================
> [314185.624021] ---[ end trace 1923f65a2d7bb246 ]---
> [314185.624021] possible reason: unannotated irqs-off.
> [314185.624021] irq event stamp: 488879
> [314185.624021] hardirqs last enabled at (488879): [<c0102d67>]
> restore_nocheck+0x12/0x15
> [314185.624021] hardirqs last disabled at (488878): [<c0102dca>]
> work_resched+0x19/0x30
> [314185.624021] softirqs last enabled at (488876): [<c011a1ba>]
> __do_softirq+0xa6/0xac
> [314185.624021] softirqs last disabled at (488865): [<c010476e>]
> do_softirq+0x57/0xa6
>
> I didn't seem to find it with some googling, so here it is.
>
> I was incidentally ltracing that process to try and find out
> what was gulping down that much CPU (sorry, no idea
> whether ltrace and the WARNING happened at the same
> time or which came first) and:

Yeah, this is extremely likely to be the source of the warning.

The warning should be harmless, however.

> Box is my trusty noname K7-800, 512MB RAM; if there's
> anything else useful I might be able to provide, just ask.

It would be interesting to see where the int3 comes from. Too bad,
lockdep doesn't provide the register dump. The stacktrace also doesn't
go further than the int3(), I wonder if this int3 came from userspace?
The ltrace readme says "software breakpoints, like gdb", so I guess
this is the case. Yep, seems like it.

This looks relevant, so I'm adding a Cc here as well:

commit fb1dac909d94ff807cd833d340c6827c3a957159
Author: Peter Zijlstra <[email protected]>
Date: Wed Jan 16 09:51:59 2008 +0100

lockdep: more hardirq annotations for notify_die()

I'm attaching a similarly-looking patch for this case (DO_VM86_ERROR),
though I suspect it might be missing for the other cases
(DO_ERROR/DO_ERROR_INFO) as well.

Does this look like the right fix?

(The patch is also inlined, but expect whitespace breakage.)


Vegard


From: Vegard Nossum <[email protected]>
Date: Thu, 12 Jun 2008 08:49:18 +0200
Subject: [PATCH] x86: more hardirq annotations for notify_die()

Reported-by: Alessandro Suardi <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
---
arch/x86/kernel/traps_32.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/traps_32.c b/arch/x86/kernel/traps_32.c
index bde6f63..be9ecae 100644
--- a/arch/x86/kernel/traps_32.c
+++ b/arch/x86/kernel/traps_32.c
@@ -569,6 +569,7 @@ void do_##name(struct pt_regs *regs, long error_code) \
#define DO_VM86_ERROR(trapnr, signr, str, name) \
void do_##name(struct pt_regs *regs, long error_code) \
{ \
+ trace_hardirqs_fixup(); \
if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) \
== NOTIFY_STOP) \
return; \
--
1.5.4.1


Attachments:
(No filename) (4.26 kB)
0001-x86-more-hardirq-annotations-for-notify_die.patch (955.00 B)
Download all attachments

2008-06-12 07:05:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [2.6.26-rc5-git2] WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()

On Thu, 2008-06-12 at 08:55 +0200, Vegard Nossum wrote:


> From: Vegard Nossum <[email protected]>
> Date: Thu, 12 Jun 2008 08:49:18 +0200
> Subject: [PATCH] x86: more hardirq annotations for notify_die()
>
> Reported-by: Alessandro Suardi <[email protected]>
> Signed-off-by: Vegard Nossum <[email protected]>
> ---
> arch/x86/kernel/traps_32.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/traps_32.c b/arch/x86/kernel/traps_32.c
> index bde6f63..be9ecae 100644
> --- a/arch/x86/kernel/traps_32.c
> +++ b/arch/x86/kernel/traps_32.c
> @@ -569,6 +569,7 @@ void do_##name(struct pt_regs *regs, long error_code) \
> #define DO_VM86_ERROR(trapnr, signr, str, name) \
> void do_##name(struct pt_regs *regs, long error_code) \
> { \
> + trace_hardirqs_fixup(); \
> if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) \
> == NOTIFY_STOP) \
> return; \

Yeah, I think this is ok, actual traps don't have any way of knowing the
actual IRQ state afaik. So

Acked-by: Peter Zijlstra <[email protected]>

2008-06-12 07:30:39

by Ingo Molnar

[permalink] [raw]
Subject: Re: [2.6.26-rc5-git2] WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()


* Peter Zijlstra <[email protected]> wrote:

> On Thu, 2008-06-12 at 08:55 +0200, Vegard Nossum wrote:
>
>
> > From: Vegard Nossum <[email protected]>
> > Date: Thu, 12 Jun 2008 08:49:18 +0200
> > Subject: [PATCH] x86: more hardirq annotations for notify_die()
> >
> > Reported-by: Alessandro Suardi <[email protected]>
> > Signed-off-by: Vegard Nossum <[email protected]>
> > ---
> > arch/x86/kernel/traps_32.c | 1 +
> > 1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/kernel/traps_32.c b/arch/x86/kernel/traps_32.c
> > index bde6f63..be9ecae 100644
> > --- a/arch/x86/kernel/traps_32.c
> > +++ b/arch/x86/kernel/traps_32.c
> > @@ -569,6 +569,7 @@ void do_##name(struct pt_regs *regs, long error_code) \
> > #define DO_VM86_ERROR(trapnr, signr, str, name) \
> > void do_##name(struct pt_regs *regs, long error_code) \
> > { \
> > + trace_hardirqs_fixup(); \
> > if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) \
> > == NOTIFY_STOP) \
> > return; \
>
> Yeah, I think this is ok, actual traps don't have any way of knowing the
> actual IRQ state afaik. So
>
> Acked-by: Peter Zijlstra <[email protected]>

applied to tip/x86/urgent - thanks.

Alessandro, does the patch from Vegard fix the ltrace warning for you?

Ingo

2008-06-12 10:57:09

by Alessandro Suardi

[permalink] [raw]
Subject: Re: [2.6.26-rc5-git2] WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()

On Thu, Jun 12, 2008 at 9:30 AM, Ingo Molnar <[email protected]> wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
>> On Thu, 2008-06-12 at 08:55 +0200, Vegard Nossum wrote:
>>
>>
>> > From: Vegard Nossum <[email protected]>
>> > Date: Thu, 12 Jun 2008 08:49:18 +0200
>> > Subject: [PATCH] x86: more hardirq annotations for notify_die()
>> >
>> > Reported-by: Alessandro Suardi <[email protected]>
>> > Signed-off-by: Vegard Nossum <[email protected]>
>> > ---
>> > arch/x86/kernel/traps_32.c | 1 +
>> > 1 files changed, 1 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/arch/x86/kernel/traps_32.c b/arch/x86/kernel/traps_32.c
>> > index bde6f63..be9ecae 100644
>> > --- a/arch/x86/kernel/traps_32.c
>> > +++ b/arch/x86/kernel/traps_32.c
>> > @@ -569,6 +569,7 @@ void do_##name(struct pt_regs *regs, long error_code) \
>> > #define DO_VM86_ERROR(trapnr, signr, str, name) \
>> > void do_##name(struct pt_regs *regs, long error_code) \
>> > { \
>> > + trace_hardirqs_fixup(); \
>> > if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) \
>> > == NOTIFY_STOP) \
>> > return; \
>>
>> Yeah, I think this is ok, actual traps don't have any way of knowing the
>> actual IRQ state afaik. So
>>
>> Acked-by: Peter Zijlstra <[email protected]>
>
> applied to tip/x86/urgent - thanks.
>
> Alessandro, does the patch from Vegard fix the ltrace warning for you?
>
> Ingo
>

I don't know whether the issue is reproducable at will.

I'll give the patch a go when I come home from work
later, trying to put back the rogue nautilus RPM and
putting it under ltrace, then report back.

Thanks for the quick turnaround, hope to be able to
provide feedback later. Ciao,

--alessandro

"Give me love / Or give me hate
Give me anything that's not just ok"

(Sophia, 'Weightless')

2008-06-21 17:22:50

by Alessandro Suardi

[permalink] [raw]
Subject: Re: [2.6.26-rc5-git2] WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()

On Thu, Jun 12, 2008 at 12:56 PM, Alessandro Suardi
<[email protected]> wrote:
> On Thu, Jun 12, 2008 at 9:30 AM, Ingo Molnar <[email protected]> wrote:
>>
>> * Peter Zijlstra <[email protected]> wrote:
>>
>>> On Thu, 2008-06-12 at 08:55 +0200, Vegard Nossum wrote:
>>>
>>>
>>> > From: Vegard Nossum <[email protected]>
>>> > Date: Thu, 12 Jun 2008 08:49:18 +0200
>>> > Subject: [PATCH] x86: more hardirq annotations for notify_die()
>>> >
>>> > Reported-by: Alessandro Suardi <[email protected]>
>>> > Signed-off-by: Vegard Nossum <[email protected]>
>>> > ---
>>> > arch/x86/kernel/traps_32.c | 1 +
>>> > 1 files changed, 1 insertions(+), 0 deletions(-)
>>> >
>>> > diff --git a/arch/x86/kernel/traps_32.c b/arch/x86/kernel/traps_32.c
>>> > index bde6f63..be9ecae 100644
>>> > --- a/arch/x86/kernel/traps_32.c
>>> > +++ b/arch/x86/kernel/traps_32.c
>>> > @@ -569,6 +569,7 @@ void do_##name(struct pt_regs *regs, long error_code) \
>>> > #define DO_VM86_ERROR(trapnr, signr, str, name) \
>>> > void do_##name(struct pt_regs *regs, long error_code) \
>>> > { \
>>> > + trace_hardirqs_fixup(); \
>>> > if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) \
>>> > == NOTIFY_STOP) \
>>> > return; \
>>>
>>> Yeah, I think this is ok, actual traps don't have any way of knowing the
>>> actual IRQ state afaik. So
>>>
>>> Acked-by: Peter Zijlstra <[email protected]>
>>
>> applied to tip/x86/urgent - thanks.
>>
>> Alessandro, does the patch from Vegard fix the ltrace warning for you?
>>
>> Ingo
>>
>
> I don't know whether the issue is reproducable at will.
>
> I'll give the patch a go when I come home from work
> later, trying to put back the rogue nautilus RPM and
> putting it under ltrace, then report back.
>
> Thanks for the quick turnaround, hope to be able to
> provide feedback later. Ciao,

All right, finally had time to try this one, and it's good for
me - patch on top of 2.6.26-rc7, the WARNING does not
appear in over three minutes of ltracing nautilus even
with load climbing up to 6 while just two gnome-terminals
(one running 'top' and the other running 'ltrace') and a
firefox with two tabs are opened under XVnc. Hence,

Tested-by: <[email protected]>

BTW, both the old and the new nautilus package have
horrible performance under VNC, I guess I noticed the
new one simply due to a fresh-start effect...

Thanks !

--alessandro

"Give me love / Or give me hate
Give me anything that's not just ok"

(Sophia, 'Weightless')