2011-03-24 10:25:59

by Sedat Dilek

[permalink] [raw]
Subject: Re: linux-next: Tree for March 24 (BUG: soft lockup - CPU#0 stuck for 63s! [kworker/0:1:7764])

Hi,

with today's linux-next (next-20110324) I see the following Call trace:

[ build.log ]
...
[ 1374.280369] BUG: soft lockup - CPU#0 stuck for 63s! [kworker/0:1:7764]
[ 1374.280374] Modules linked in: btrfs zlib_deflate crc32c libcrc32c
ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext3
jbd ext2 rfcomm bnep bluetooth aes_i586 aes_generic binfmt_misc ppdev
lp acpi_cpufreq mperf cpufreq_powersave cpufreq_userspace
cpufreq_stats cpufreq_conservative fuse snd_intel8x0m snd_intel8x0
snd_ac97_codec ac97_bus snd_pcm_oss thinkpad_acpi snd_mixer_oss radeon
snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq pcmcia
snd_timer ttm drm_kms_helper joydev yenta_socket drm pcmcia_rsrc
pcmcia_core i2c_algo_bit tpm_tis i2c_i801 nsc_ircc shpchp
snd_seq_device pci_hotplug snd_page_alloc snd tpm rng_core i2c_core
irda tpm_bios battery ac parport_pc soundcore evdev crc_ccitt
power_supply nvram processor video parport button psmouse serio_raw
pcspkr arc4 ecb ath5k ath mac80211 cfg80211 rfkill autofs4 ext4
mbcache jbd2 crc16 dm_mod usbhid hid usb_storage uas sg sd_mod sr_mod
cdrom crc_t10dif ata_generic ata_piix libata uhci_hcd ehci_hcd usbcore
scsi_mod e1000 thermal thermal_sys floppy [last unloaded:
scsi_wait_scan]
[ 1374.280502] Modules linked in: btrfs zlib_deflate crc32c libcrc32c
ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext3
jbd ext2 rfcomm bnep bluetooth aes_i586 aes_generic binfmt_misc ppdev
lp acpi_cpufreq mperf cpufreq_powersave cpufreq_userspace
cpufreq_stats cpufreq_conservative fuse snd_intel8x0m snd_intel8x0
snd_ac97_codec ac97_bus snd_pcm_oss thinkpad_acpi snd_mixer_oss radeon
snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq pcmcia
snd_timer ttm drm_kms_helper joydev yenta_socket drm pcmcia_rsrc
pcmcia_core i2c_algo_bit tpm_tis i2c_i801 nsc_ircc shpchp
snd_seq_device pci_hotplug snd_page_alloc snd tpm rng_core i2c_core
irda tpm_bios battery ac parport_pc soundcore evdev crc_ccitt
power_supply nvram processor video parport button psmouse serio_raw
pcspkr arc4 ecb ath5k ath mac80211 cfg80211 rfkill autofs4 ext4
mbcache jbd2 crc16 dm_mod usbhid hid usb_storage uas sg sd_mod sr_mod
cdrom crc_t10dif ata_generic ata_piix libata uhci_hcd ehci_hcd usbcore
scsi_mod e1000 thermal thermal_sys floppy [last unloaded:
scsi_wait_scan]
[ 1374.280604]
[ 1374.280608] Pid: 7764, comm: kworker/0:1 Not tainted
2.6.38-next20110324-2-686-iniza #1 IBM 2374SG6/2374SG6
[ 1374.280616] EIP: 0060:[<c10546a1>] EFLAGS: 00000282 CPU: 0
[ 1374.280626] EIP is at arch_local_irq_restore+0x9/0xb
[ 1374.280629] EAX: 00000282 EBX: c1400514 ECX: c1400514 EDX: 00000282
[ 1374.280633] ESI: 00000282 EDI: 000006c3 EBP: f733bef8 ESP: f733bef8
[ 1374.280637] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 1374.280641] Process kworker/0:1 (pid: 7764, ti=f733a000
task=dd074000 task.ti=f733a000)
[ 1374.280644] Stack:
[ 1374.280647] f733bf00 c12a544a f733bf20 c1023397 00000000 00000000
00000001 00000003
[ 1374.280655] 00000286 f7392900 f733bf34 c11b2986 00000000 00000000
eeab10bc f733bf3c
[ 1374.280664] c11b2996 f733bf6c c11b3385 00000000 f7398b80 ee950800
f7392900 00000286
[ 1374.280672] Call Trace:
[ 1374.280682] [<c12a544a>] _raw_spin_unlock_irqrestore+0xe/0x10
[ 1374.280690] [<c1023397>] __wake_up+0x32/0x3b
[ 1374.280696] [<c11b2986>] put_ldisc+0x8a/0x92
[ 1374.280700] [<c11b2996>] tty_ldisc_deref+0x8/0xa
[ 1374.280705] [<c11b3385>] flush_to_ldisc+0x13e/0x146
[ 1374.280711] [<c1040d79>] process_one_work+0x12c/0x211
[ 1374.280716] [<c103fe77>] ? wq_barrier_func+0x0/0xd
[ 1374.280721] [<c11b3247>] ? flush_to_ldisc+0x0/0x146
[ 1374.280725] [<c1041a70>] worker_thread+0xb0/0x12b
[ 1374.280730] [<c10419c0>] ? worker_thread+0x0/0x12b
[ 1374.280736] [<c1044436>] kthread+0x62/0x67
[ 1374.280741] [<c10443d4>] ? kthread+0x0/0x67
[ 1374.280747] [<c12ab37e>] kernel_thread_helper+0x6/0xd
[ 1374.280749] Code: c1 89 e5 e8 a8 0d 25 00 5d c3 55 89 e5 fe 05 08
9b 3b c1 fb 90 8d 74 26 00 5d c3 55 89 e5 5d c3 90 90 55 89 e5 50 9d
8d 74 26 00 <5d> c3 55 89 e5 fa 90 8d 74 26 00 5d c3 55 89 e5 50 9c 58
8d 74
[ 1374.280794] Call Trace:
[ 1374.280798] [<c12a544a>] _raw_spin_unlock_irqrestore+0xe/0x10
[ 1374.280803] [<c1023397>] __wake_up+0x32/0x3b
[ 1374.280807] [<c11b2986>] put_ldisc+0x8a/0x92
[ 1374.280811] [<c11b2996>] tty_ldisc_deref+0x8/0xa
[ 1374.280815] [<c11b3385>] flush_to_ldisc+0x13e/0x146
[ 1374.280820] [<c1040d79>] process_one_work+0x12c/0x211
[ 1374.280824] [<c103fe77>] ? wq_barrier_func+0x0/0xd
[ 1374.280828] [<c11b3247>] ? flush_to_ldisc+0x0/0x146
[ 1374.280833] [<c1041a70>] worker_thread+0xb0/0x12b
[ 1374.280837] [<c10419c0>] ? worker_thread+0x0/0x12b
[ 1374.280842] [<c1044436>] kthread+0x62/0x67
[ 1374.280846] [<c10443d4>] ? kthread+0x0/0x67
[ 1374.280851] [<c12ab37e>] kernel_thread_helper+0x6/0xd

Any idea?
(Later I will walk through the latest commits...)

Regards,
- Sedat -

P.S.: Attached is my kernel-config (Just in case it matters: I have
overlayfs-v7 and vfs-inode-lock-breakup patchsets in addition)


Attachments:
config-2.6.38-next20110324-2-686-iniza (125.33 kB)

2011-03-24 15:09:40

by Sedat Dilek

[permalink] [raw]
Subject: Re: linux-next: Tree for March 24 (BUG: soft lockup - CPU#0 stuck for 63s! [kworker/0:1:7764])

On Thu, Mar 24, 2011 at 11:25 AM, Sedat Dilek
<[email protected]> wrote:
> Hi,
>
> with today's linux-next (next-20110324) I see the following Call trace:
>
> [ build.log ]
> ...
> [ 1374.280369] BUG: soft lockup - CPU#0 stuck for 63s! [kworker/0:1:7764]
> [ 1374.280374] Modules linked in: btrfs zlib_deflate crc32c libcrc32c
> ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext3
> jbd ext2 rfcomm bnep bluetooth aes_i586 aes_generic binfmt_misc ppdev
> lp acpi_cpufreq mperf cpufreq_powersave cpufreq_userspace
> cpufreq_stats cpufreq_conservative fuse snd_intel8x0m snd_intel8x0
> snd_ac97_codec ac97_bus snd_pcm_oss thinkpad_acpi snd_mixer_oss radeon
> snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq pcmcia
> snd_timer ttm drm_kms_helper joydev yenta_socket drm pcmcia_rsrc
> pcmcia_core i2c_algo_bit tpm_tis i2c_i801 nsc_ircc shpchp
> snd_seq_device pci_hotplug snd_page_alloc snd tpm rng_core i2c_core
> irda tpm_bios battery ac parport_pc soundcore evdev crc_ccitt
> power_supply nvram processor video parport button psmouse serio_raw
> pcspkr arc4 ecb ath5k ath mac80211 cfg80211 rfkill autofs4 ext4
> mbcache jbd2 crc16 dm_mod usbhid hid usb_storage uas sg sd_mod sr_mod
> cdrom crc_t10dif ata_generic ata_piix libata uhci_hcd ehci_hcd usbcore
> scsi_mod e1000 thermal thermal_sys floppy [last unloaded:
> scsi_wait_scan]
> [ 1374.280502] Modules linked in: btrfs zlib_deflate crc32c libcrc32c
> ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext3
> jbd ext2 rfcomm bnep bluetooth aes_i586 aes_generic binfmt_misc ppdev
> lp acpi_cpufreq mperf cpufreq_powersave cpufreq_userspace
> cpufreq_stats cpufreq_conservative fuse snd_intel8x0m snd_intel8x0
> snd_ac97_codec ac97_bus snd_pcm_oss thinkpad_acpi snd_mixer_oss radeon
> snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq pcmcia
> snd_timer ttm drm_kms_helper joydev yenta_socket drm pcmcia_rsrc
> pcmcia_core i2c_algo_bit tpm_tis i2c_i801 nsc_ircc shpchp
> snd_seq_device pci_hotplug snd_page_alloc snd tpm rng_core i2c_core
> irda tpm_bios battery ac parport_pc soundcore evdev crc_ccitt
> power_supply nvram processor video parport button psmouse serio_raw
> pcspkr arc4 ecb ath5k ath mac80211 cfg80211 rfkill autofs4 ext4
> mbcache jbd2 crc16 dm_mod usbhid hid usb_storage uas sg sd_mod sr_mod
> cdrom crc_t10dif ata_generic ata_piix libata uhci_hcd ehci_hcd usbcore
> scsi_mod e1000 thermal thermal_sys floppy [last unloaded:
> scsi_wait_scan]
> [ 1374.280604]
> [ 1374.280608] Pid: 7764, comm: kworker/0:1 Not tainted
> 2.6.38-next20110324-2-686-iniza #1 IBM 2374SG6/2374SG6
> [ 1374.280616] EIP: 0060:[<c10546a1>] EFLAGS: 00000282 CPU: 0
> [ 1374.280626] EIP is at arch_local_irq_restore+0x9/0xb
> [ 1374.280629] EAX: 00000282 EBX: c1400514 ECX: c1400514 EDX: 00000282
> [ 1374.280633] ESI: 00000282 EDI: 000006c3 EBP: f733bef8 ESP: f733bef8
> [ 1374.280637]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [ 1374.280641] Process kworker/0:1 (pid: 7764, ti=f733a000
> task=dd074000 task.ti=f733a000)
> [ 1374.280644] Stack:
> [ 1374.280647]  f733bf00 c12a544a f733bf20 c1023397 00000000 00000000
> 00000001 00000003
> [ 1374.280655]  00000286 f7392900 f733bf34 c11b2986 00000000 00000000
> eeab10bc f733bf3c
> [ 1374.280664]  c11b2996 f733bf6c c11b3385 00000000 f7398b80 ee950800
> f7392900 00000286
> [ 1374.280672] Call Trace:
> [ 1374.280682]  [<c12a544a>] _raw_spin_unlock_irqrestore+0xe/0x10
> [ 1374.280690]  [<c1023397>] __wake_up+0x32/0x3b
> [ 1374.280696]  [<c11b2986>] put_ldisc+0x8a/0x92
> [ 1374.280700]  [<c11b2996>] tty_ldisc_deref+0x8/0xa
> [ 1374.280705]  [<c11b3385>] flush_to_ldisc+0x13e/0x146
> [ 1374.280711]  [<c1040d79>] process_one_work+0x12c/0x211
> [ 1374.280716]  [<c103fe77>] ? wq_barrier_func+0x0/0xd
> [ 1374.280721]  [<c11b3247>] ? flush_to_ldisc+0x0/0x146
> [ 1374.280725]  [<c1041a70>] worker_thread+0xb0/0x12b
> [ 1374.280730]  [<c10419c0>] ? worker_thread+0x0/0x12b
> [ 1374.280736]  [<c1044436>] kthread+0x62/0x67
> [ 1374.280741]  [<c10443d4>] ? kthread+0x0/0x67
> [ 1374.280747]  [<c12ab37e>] kernel_thread_helper+0x6/0xd
> [ 1374.280749] Code: c1 89 e5 e8 a8 0d 25 00 5d c3 55 89 e5 fe 05 08
> 9b 3b c1 fb 90 8d 74 26 00 5d c3 55 89 e5 5d c3 90 90 55 89 e5 50 9d
> 8d 74 26 00 <5d> c3 55 89 e5 fa 90 8d 74 26 00 5d c3 55 89 e5 50 9c 58
> 8d 74
> [ 1374.280794] Call Trace:
> [ 1374.280798]  [<c12a544a>] _raw_spin_unlock_irqrestore+0xe/0x10
> [ 1374.280803]  [<c1023397>] __wake_up+0x32/0x3b
> [ 1374.280807]  [<c11b2986>] put_ldisc+0x8a/0x92
> [ 1374.280811]  [<c11b2996>] tty_ldisc_deref+0x8/0xa
> [ 1374.280815]  [<c11b3385>] flush_to_ldisc+0x13e/0x146
> [ 1374.280820]  [<c1040d79>] process_one_work+0x12c/0x211
> [ 1374.280824]  [<c103fe77>] ? wq_barrier_func+0x0/0xd
> [ 1374.280828]  [<c11b3247>] ? flush_to_ldisc+0x0/0x146
> [ 1374.280833]  [<c1041a70>] worker_thread+0xb0/0x12b
> [ 1374.280837]  [<c10419c0>] ? worker_thread+0x0/0x12b
> [ 1374.280842]  [<c1044436>] kthread+0x62/0x67
> [ 1374.280846]  [<c10443d4>] ? kthread+0x0/0x67
> [ 1374.280851]  [<c12ab37e>] kernel_thread_helper+0x6/0xd
>
> Any idea?
> (Later I will walk through the latest commits...)
>
> Regards,
> - Sedat -
>
> P.S.: Attached is my kernel-config (Just in case it matters: I have
> overlayfs-v7 and vfs-inode-lock-breakup patchsets in addition)
>

[ CCing Vinod Koul + Tomoya MORINAGA ]

With reverting c5a9f9d0895b2c16908979244d3d678fd6db0545 ("pch_dma: fix
kernel error issue") from upstream my old IBM T41p notebook runs about
1.5hrs with no call-traces like seen above (which occured from the
very first start into my desktop).
( Now, I will stress a bit the box by building a mipsel-toolchain. )

I have attached the revert patch and my dmesg output (if you want to
have a look at it).
If you need more details let me know.

- Sedat -


Attachments:
dmesg.txt (53.78 kB)
0001-Revert-pch_dma-fix-kernel-error-issue.patch (3.97 kB)
Download all attachments

2011-03-25 00:08:37

by Tomoya MORINAGA

[permalink] [raw]
Subject: RE: linux-next: Tree for March 24 (BUG: soft lockup - CPU#0 stuck for 63s! [kworker/0:1:7764])

Hi Sedat,

We have used this DMA driver(include this patch) on linux-2.6.35/37.
However I haven't seen the issue yet.

I want to know more information.
Do you mean that it takes 63 more seconds for boot-completing
when using this patch and pch_dma is integrated as built-in ?

Thanks,
-----------------------------------------
Tomoya MORINAGA
OKI SEMICONDUCTOR CO., LTD.

> -----Original Message-----
> From: Sedat Dilek [mailto:[email protected]]
> Sent: Friday, March 25, 2011 12:10 AM
> To: linux-next; LKML
> Cc: Stephen Rothwell; Randy Dunlap; Vinod Koul; Tomoya MORINAGA
> Subject: Re: linux-next: Tree for March 24 (BUG: soft lockup
> - CPU#0 stuck for 63s! [kworker/0:1:7764])

2011-03-25 01:41:02

by Sedat Dilek

[permalink] [raw]
Subject: Re: linux-next: Tree for March 24 (BUG: soft lockup - CPU#0 stuck for 63s! [kworker/0:1:7764])

On Fri, Mar 25, 2011 at 1:08 AM, Tomoya MORINAGA
<[email protected]> wrote:
> Hi  Sedat,
>
> We have used this DMA driver(include this patch) on linux-2.6.35/37.
> However I haven't seen the issue yet.
>
> I want to know more information.
> Do you mean that it takes 63 more seconds for boot-completing
> when using this patch and pch_dma is integrated as built-in ?
>
> Thanks,
> -----------------------------------------
> Tomoya MORINAGA
> OKI SEMICONDUCTOR CO., LTD.
>

I jumped back to the kernel from which I thought it was making
problems and run it several hours.
Unfortunately, I haven't seen any call-traces in the syslogs like before.
So, I could reproduce the errors.
I am sorry for the noise abd will have an eye on the issue.
( So, reverting the patch did not really helped. )

- Sedat -

>> -----Original Message-----
>> From: Sedat Dilek [mailto:[email protected]]
>> Sent: Friday, March 25, 2011 12:10 AM
>> To: linux-next; LKML
>> Cc: Stephen Rothwell; Randy Dunlap; Vinod Koul; Tomoya MORINAGA
>> Subject: Re: linux-next: Tree for March 24 (BUG: soft lockup
>> - CPU#0 stuck for 63s! [kworker/0:1:7764])
>
>