2005-11-14 19:09:47

by Luca Falavigna

[permalink] [raw]
Subject: [BUG] Softlockup detected with linux-2.6.14-rt6

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I found this softlockup bug involving arts daemon using a
linux-2.6.14-rt6 kernel (with "Complete Preemption" and "Detect Soft
Lockups" compiled in).
This bug does not happen everytime: I was able to reproduce it only
three times in a week. Here is the oops message (obtained from my
printer because system is frozen):

BUG: artsd:4177, possible softlockup detected on CPU#0!
[<c0146db4>] softlockup_detected+0x34/0x40 (8)
[<c0146e69>] softlockup_tick+0xa9/axb0 (20)
[<c01388a1>] timer_interrupt+0x21/0x40 (12)
[<c014724e>] handle_IRQ_event+0x63/0xf0 (12)
[<c01476a3>] __do_IRQ+0xa3/0x150 (48)
[<c0105594>] do_IRQ+0x34/0x70 (40)
[<c0103d42>] common_interrupt+0x1a/0x20 (8)
[<c0103d42>] common_interrupt+0x1a/0x20 (20)
[<c01bfcc0>] __delay+0x20/0x30 (44)
[<c91f16a4>] snd_timer_close+0x1b4/0x2b0 [snd_timer] (12)
[<c0a7f36a>] fasync_helper+0x7a/0x100 (12)
[<c91f307c>] snd_timer_user_release+0x4c/0x80 [snd_timer] (28)
[<c016c14d>] __fput+0xad/0x1a0 (24)
[<c0a6a632>] filp_close+0x52/0x90 (40)
[<c016a6e0>] sys_close+0x70/0xc0 (24)
[<c01032fd>] syscall_call+0x7/0xb (28)

Regards,
- --
Luca
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iQEVAwUBQ3i0ipK+HIH6ZZ2zAQJ6mwf/QfAY+f+JFfb2cjLIoBrnY6rFsDBFE9XR
KeJQmYuRRRmesI/lgNdAH7Wnb3QIqw62AKncWRj3JQj5IOhfQoexnf9IYg08XJoS
oDD1mQI5udDJBeGfIUg2AUiPYbtvAdtAZnJQ3QWs9PcN4G6Zi2y9+WfHTFqF/x+P
J9vMQa8mSmt3jAQ8E3th/i+njGekk/O7lp86A6CSKfLqhuxPvMJtwYTtZams7kRV
xyfKEQ7nkMmWNTYu/Beuxdes2ME70ZBLVSg54OJluhWfKBjviuJCbUT0itGRDTaL
RZ7Uev1OE7QiVY6g55xYRCwB/QO3un93iQy4N1rXNyIGS2koVGsq5Q==
=u9dF
-----END PGP SIGNATURE-----


2005-11-14 19:56:49

by john cooper

[permalink] [raw]
Subject: Re: [BUG] Softlockup detected with linux-2.6.14-rt6

Luca Falavigna wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I found this softlockup bug involving arts daemon using a
> linux-2.6.14-rt6 kernel (with "Complete Preemption" and "Detect Soft
> Lockups" compiled in).
> This bug does not happen everytime: I was able to reproduce it only
> three times in a week. Here is the oops message (obtained from my
> printer because system is frozen):
>
> BUG: artsd:4177, possible softlockup detected on CPU#0!
> [<c0146db4>] softlockup_detected+0x34/0x40 (8)
> [<c0146e69>] softlockup_tick+0xa9/axb0 (20)
> [<c01388a1>] timer_interrupt+0x21/0x40 (12)
> [<c014724e>] handle_IRQ_event+0x63/0xf0 (12)
> [<c01476a3>] __do_IRQ+0xa3/0x150 (48)
> [<c0105594>] do_IRQ+0x34/0x70 (40)
> [<c0103d42>] common_interrupt+0x1a/0x20 (8)
> [<c0103d42>] common_interrupt+0x1a/0x20 (20)
> [<c01bfcc0>] __delay+0x20/0x30 (44)
> [<c91f16a4>] snd_timer_close+0x1b4/0x2b0 [snd_timer] (12)
> [<c0a7f36a>] fasync_helper+0x7a/0x100 (12)
> [<c91f307c>] snd_timer_user_release+0x4c/0x80 [snd_timer] (28)
> [<c016c14d>] __fput+0xad/0x1a0 (24)
> [<c0a6a632>] filp_close+0x52/0x90 (40)
> [<c016a6e0>] sys_close+0x70/0xc0 (24)
> [<c01032fd>] syscall_call+0x7/0xb (28)

We are seeing this too, particularly [IIRC exclusively]
on MIPS and PPC though there isn't any obvious target
architecture correlation. It appears to be a false
positive as the system seems otherwise responsive to
general scheduling. It's on my list to address but not
a priority just yet.

Adding a sched pri/policy printk to show_task() and
inserting a call to show_state() in the softlockup
detect routine should shed some light.

-john

--
[email protected]

2005-11-15 15:32:44

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG] Softlockup detected with linux-2.6.14-rt6


* Luca Falavigna <[email protected]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I found this softlockup bug involving arts daemon using a
> linux-2.6.14-rt6 kernel (with "Complete Preemption" and "Detect Soft
> Lockups" compiled in).
> This bug does not happen everytime: I was able to reproduce it only
> three times in a week. [...]

does this happen with -rt13 too? I have fixed a softlockup
false-positive in it.

Ingo

2005-11-15 17:14:08

by john cooper

[permalink] [raw]
Subject: Re: [BUG] Softlockup detected with linux-2.6.14-rt6

Ingo Molnar wrote:
> * Luca Falavigna <[email protected]> wrote:
>>I found this softlockup bug involving arts daemon using a
>>linux-2.6.14-rt6 kernel (with "Complete Preemption" and "Detect Soft
>>Lockups" compiled in).
>>This bug does not happen everytime: I was able to reproduce it only
>>three times in a week. [...]
>
>
> does this happen with -rt13 too? I have fixed a softlockup
> false-positive in it.

Just curious what the cause of the false positive was?

-john

--
[email protected]

2005-11-15 20:00:06

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG] Softlockup detected with linux-2.6.14-rt6


* john cooper <[email protected]> wrote:

> Ingo Molnar wrote:
> >* Luca Falavigna <[email protected]> wrote:
> >>I found this softlockup bug involving arts daemon using a
> >>linux-2.6.14-rt6 kernel (with "Complete Preemption" and "Detect Soft
> >>Lockups" compiled in).
> >>This bug does not happen everytime: I was able to reproduce it only
> >>three times in a week. [...]
> >
> >
> >does this happen with -rt13 too? I have fixed a softlockup
> >false-positive in it.
>
> Just curious what the cause of the false positive was?

the fix is below - we didnt reset the 'light' counter in the else
branch.

Ingo

Index: linux/kernel/softlockup.c
===================================================================
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -90,7 +90,8 @@ void softlockup_tick(void)

wake_up_process(per_cpu(watchdog_task, this_cpu));
per_cpu(timeout, this_cpu) = jiffies + msecs_to_jiffies(1000);
- }
+ } else
+ touch_light_softlockup_watchdog();

if (per_cpu(print_timestamp, this_cpu) == timestamp)
return;

2005-11-15 20:07:31

by Daniel Walker

[permalink] [raw]
Subject: Re: [BUG] Softlockup detected with linux-2.6.14-rt6


Wasn't this thread related to a real lock up? I thought Luca said that
the stack trace came from his printer.

I thought it might be related to the __delay troubles ..

Daniel

On Tue, 15 Nov 2005, Ingo Molnar wrote:

>
> * john cooper <[email protected]> wrote:
>
>> Ingo Molnar wrote:
>>> * Luca Falavigna <[email protected]> wrote:
>>>> I found this softlockup bug involving arts daemon using a
>>>> linux-2.6.14-rt6 kernel (with "Complete Preemption" and "Detect Soft
>>>> Lockups" compiled in).
>>>> This bug does not happen everytime: I was able to reproduce it only
>>>> three times in a week. [...]
>>>
>>>
>>> does this happen with -rt13 too? I have fixed a softlockup
>>> false-positive in it.
>>
>> Just curious what the cause of the false positive was?
>
> the fix is below - we didnt reset the 'light' counter in the else
> branch.
>
> Ingo
>
> Index: linux/kernel/softlockup.c
> ===================================================================
> --- linux.orig/kernel/softlockup.c
> +++ linux/kernel/softlockup.c
> @@ -90,7 +90,8 @@ void softlockup_tick(void)
>
> wake_up_process(per_cpu(watchdog_task, this_cpu));
> per_cpu(timeout, this_cpu) = jiffies + msecs_to_jiffies(1000);
> - }
> + } else
> + touch_light_softlockup_watchdog();
>
> if (per_cpu(print_timestamp, this_cpu) == timestamp)
> return;
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2005-11-16 00:14:24

by Luca Falavigna

[permalink] [raw]
Subject: Re: [BUG] Softlockup detected with linux-2.6.14-rt6

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Daniel Walker ha scritto:
|
| Wasn't this thread related to a real lock up? I thought Luca said that
| the stack trace came from his printer.
|
| I thought it might be related to the __delay troubles ..
|
| Daniel

Yes, stack trace was generated by console line printer because system
was completely stucked.
Actually I am testing linux-2.6.14-rt13. If this problem come out again,
I will notify you ASAP.

Regards,
- --
Luca
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iQEVAwUBQ3p6VpK+HIH6ZZ2zAQK/Owf9HvYGrO5IRZFuHByM/KlhoZ/5Abc2vSsk
/J4aZAsWOSlJOC55N6PsTa5iDIGa7ddKH4ieb1ckNGxUxc8baFSCiuWq/dJta57T
VRdkh588nPWaSQ9lIQSb5jy4RP8J8uyPc+FQVs//W+sj5YtXsikcB7Jh70GOBKGD
VOdMTI0bTmD/GahNwGcLGbZDGrVDvfwT0t9tXqCyC4SrQhQfWjyExYArej1hfACL
xAAM0WTNoi9e1DTeq9YSBXQnfTWj0eyjIWaRNhDcIjPBk3IPumF2xIPlg8nUHpZE
gazTuzSzZNerXQV62gof8opQUfrLdP/9XR57kRdw0fGRo06a1qZktA==
=e2JM
-----END PGP SIGNATURE-----

2005-11-18 15:22:32

by Luca Falavigna

[permalink] [raw]
Subject: Re: [BUG] Softlockup detected with linux-2.6.14-rt6

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Luca Falavigna ha scritto:
| Actually I am testing linux-2.6.14-rt13. If this problem come out again,
| I will notify you ASAP.

Unfortunately this soft lockup bug affects version 2.6.14-rt13 too :(
Stack traces are almost the same, except base addresses.

The bug was much harder to track down this time. Kernel detected a soft
lockup bug after about three days of uptime.

I noticed another strange behaviour. When I rebooted the machine
(actually pushing reset button, not CTRL+ALT+CANC or SysRq at all), it
became *very* slow. GRUB took up to five seconds to be fully loaded
(usualy it takes only a fraction of second) and the initial phase of
booting the kernel was about 4 times slower. Even after a power off this
behaviour was still present! Need more coffee?

Regards,
- --
Luca

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iQEVAwUBQ33wppK+HIH6ZZ2zAQIYSQgAh7T0/a40F9ImNK4Ut7WvVxdLZn7Hekvn
8xBr2jSm5Vz4Ewiht1kZ+sXvAV98J8oDGBghQK8CVE+LdGnyFxncMH8sIlnOViq1
PAwVXYOI7FQzk0UX0FTfsVcsmGkqZk653HC/VRTfY+7xg0NKzUFTjwZg0pDmhkoC
yKg73Krr2Zx9N4JH5v2y9uFHhn3YC7mPmbO1mSsmF6WIcePLaY3+Um/FGkQZXr3b
ZDwAd8IQmbgiJiyO7sAgc6ydPWxjDzrlPTri7pFHERkJJKxQAan+rKxYdGO7e9C4
57iny6wa2C2sdmn8quL+ZXnmpgB7vNRroDZ1kg5Qm/1SiKjbvC5BqQ==
=RNUt
-----END PGP SIGNATURE-----

2005-11-18 18:34:19

by K.R. Foley

[permalink] [raw]
Subject: Re: [BUG] Softlockup detected with linux-2.6.14-rt6

Luca Falavigna wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Luca Falavigna ha scritto:
> | Actually I am testing linux-2.6.14-rt13. If this problem come out again,
> | I will notify you ASAP.
>
> Unfortunately this soft lockup bug affects version 2.6.14-rt13 too :(
> Stack traces are almost the same, except base addresses.
>
> The bug was much harder to track down this time. Kernel detected a soft
> lockup bug after about three days of uptime.
>
> I noticed another strange behaviour. When I rebooted the machine
> (actually pushing reset button, not CTRL+ALT+CANC or SysRq at all), it
> became *very* slow. GRUB took up to five seconds to be fully loaded
> (usualy it takes only a fraction of second) and the initial phase of
> booting the kernel was about 4 times slower. Even after a power off this
> behaviour was still present! Need more coffee?

The slow booting sounds a lot like the RCU_TORTURE_TEST slowing things
down. You might try turning off RCU_TORTURE_TEST or setting it to be a
module.

>
> Regards,
> - --
> Luca
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>
> iQEVAwUBQ33wppK+HIH6ZZ2zAQIYSQgAh7T0/a40F9ImNK4Ut7WvVxdLZn7Hekvn
> 8xBr2jSm5Vz4Ewiht1kZ+sXvAV98J8oDGBghQK8CVE+LdGnyFxncMH8sIlnOViq1
> PAwVXYOI7FQzk0UX0FTfsVcsmGkqZk653HC/VRTfY+7xg0NKzUFTjwZg0pDmhkoC
> yKg73Krr2Zx9N4JH5v2y9uFHhn3YC7mPmbO1mSsmF6WIcePLaY3+Um/FGkQZXr3b
> ZDwAd8IQmbgiJiyO7sAgc6ydPWxjDzrlPTri7pFHERkJJKxQAan+rKxYdGO7e9C4
> 57iny6wa2C2sdmn8quL+ZXnmpgB7vNRroDZ1kg5Qm/1SiKjbvC5BqQ==
> =RNUt
> -----END PGP SIGNATURE-----
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


--
kr

2005-11-19 00:48:07

by Luca Falavigna

[permalink] [raw]
Subject: Re: [BUG] Softlockup detected with linux-2.6.14-rt6

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

K.R. Foley ha scritto:
|> The slow booting sounds a lot like the RCU_TORTURE_TEST slowing things
|> down. You might try turning off RCU_TORTURE_TEST or setting it to be a
|> module.

RCU torture test is not compiled at all
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iQEVAwUBQ352ypK+HIH6ZZ2zAQKyKAf8DCP61LE52PSx7yedqof2G9qGfhqa6Tiz
/tgTfAmzzsQ2KvQUiYIDU5snmZmM5vbMYvtv07T/D+E5o/9BqIa8yJt1mFh0EcPr
uOGljrUTCyNdR+4LXe2H8Tdacc7k0DOjfxxDDoTKJoWv6al4a+K5FAy6wxfHIr2t
whpsceXSZxyRPY9EmOpupFxcGjnoq+ywURqPHm3D8qVfSJwT24U9bRu/7hkrbFy6
lRJQssnD7o/Dcf+xtuvQkDD5AzrcOVdI8KDGD1AO+S7pGcnPAxA0aPdeCQsUuOs5
RF6Z7VeQuBvffuxSXW7BgGNA9W0aA+UZMFw0qVkqrjblTTp/QhcExA==
=Qjq4
-----END PGP SIGNATURE-----