LinuxLists.cc - scheduling while atomic & hang.

2013-07-04 01:55:38

Subject: scheduling while atomic & hang.

This is a pretty context free trace. What the hell happened here?

BUG: scheduling while atomic: trinity-child0/13280/0xefffffff
INFO: lockdep is turned off.
Modules linked in: dlci dccp_ipv6 dccp_ipv4 dccp sctp bridge 8021q garp stp snd_seq_dummy tun fuse hidp rfcomm bnep can_raw can_bcm nfnetlink phonet llc2 pppoe pppox ppp_generic slhc appletalk af_rxrpc ipt_ULOG irda af_key
can atm scsi_transport_iscsi rds rose x25 af_802154 caif_socket ipx caif p8023 psnap crc_ccitt p8022 llc bluetooth nfc rfkill netrom ax25 snd_hda_codec_realtek microcode snd_hda_codec_hdmi pcspkr snd_hda_intel snd_hda_codec snd_hwdep sn
d_seq snd_seq_device usb_debug snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 0 PID: 13280 Comm: trinity-child0 Not tainted 3.10.0+ #40
0000000000000000 ffff880228533ee0 ffffffff816ec1a2 ffff880228533ef8
ffffffff816e782e 0000000000000db5 ffff880228533f60 ffffffff816f42bf
ffff88023bdeca40 ffff880228533fd8 ffff880228533fd8 ffff880228533fd8
Call Trace:
[<ffffffff816ec1a2>] dump_stack+0x19/0x1b
[<ffffffff816e782e>] __schedule_bug+0x61/0x70
[<ffffffff816f42bf>] __schedule+0x94f/0x9c0
[<ffffffff816f487e>] schedule_user+0x2e/0x70
[<ffffffff816f6de4>] retint_careful+0x12/0x2e
BUG: scheduling while atomic: trinity-child0/13280/0xefffffff
INFO: lockdep is turned off.
Modules linked in: dlci dccp_ipv6 dccp_ipv4 dccp sctp bridge 8021q garp stp snd_seq_dummy tun fuse hidp rfcomm bnep can_raw can_bcm nfnetlink phonet llc2 pppoe pppox ppp_generic slhc appletalk af_rxrpc ipt_ULOG irda af_key can atm scsi_transport_iscsi rds rose x25 af_802154 caif_socket ipx caif p8023 psnap crc_ccitt p8022 llc bluetooth nfc rfkill netrom ax25 snd_hda_codec_realtek microcode snd_hda_codec_hdmi pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device usb_debug snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 0 PID: 13280 Comm: trinity-child0 Tainted: G W 3.10.0+ #40
0000000000000000 ffff880228533e80 ffffffff816ec1a2 ffff880228533e98
ffffffff816e782e ffff880228533fd8 ffff880228533f00 ffffffff816f42bf
ffff88023bdeca40 ffff880228533fd8 ffff880228533fd8 ffff880228533fd8
Call Trace:
[<ffffffff816ec1a2>] dump_stack+0x19/0x1b
[<ffffffff816e782e>] __schedule_bug+0x61/0x70
[<ffffffff816f42bf>] __schedule+0x94f/0x9c0
[<ffffffff816f487e>] schedule_user+0x2e/0x70
[<ffffffff816ff169>] int_careful+0x12/0x1e
[<ffffffff816f487e>] ? schedule_user+0x2e/0x70
[<ffffffff816f6de4>] ? retint_careful+0x12/0x2e
Kernel panic - not syncing: Aiee, killing interrupt handler!

Box is wedged, and I won't be able to get to it until Friday to poke it.

Dave

2013-07-04 02:49:21

by Linus Torvalds

[permalink] [raw]

Subject: Re: scheduling while atomic & hang.

On Wed, Jul 3, 2013 at 6:55 PM, Dave Jones <[email protected]> wrote:
> This is a pretty context free trace. What the hell happened here?

That lack of call trace looks like it happened at the final stage of
an interrupt or page fault or other trap that is about to return to
user space.

My guess would be that the trap/irq/whatever handler for some odd
reason ended up with an unbalanced spinlock or something. But since
there is no trace of it, I can't even begin to guess what it would be.

Does trinity save enough pseudo-random state that it can be
repeatable, because if it's something repeatable it might be
interesting to see what the last few system calls and traps were...

> Box is wedged, and I won't be able to get to it until Friday to poke it.

Oh well. Adding the x86 people to the cc, since the whole
"retint_careful" does seem imply that it's not a system call entry,
and more likely to be something like a page fault or debug trap or
something. Any ideas, guys?

>From the " 3.10.0+" I assume this is from the merge window, and
possibly a new failure. Do you have an actual git ID? I can heartily
recommend CONFIG_LOCALVERSION_AUTO=y as a way to get commit ID's
encoded in the version string (which is obviously more useful if you
end up running mainly kernels without extra commits of your own on top
of them - if you have your own local commits you'd still need to
translate it into "your kernel XYZ with commits of mine on top")

Linus

---
> BUG: scheduling while atomic: trinity-child0/13280/0xefffffff
> INFO: lockdep is turned off.
> Modules linked in: dlci dccp_ipv6 dccp_ipv4 dccp sctp bridge 8021q garp stp snd_seq_dummy tun fuse hidp rfcomm bnep can_raw can_bcm nfnetlink phonet llc2 pppoe pppox ppp_generic slhc appletalk af_rxrpc ipt_ULOG irda af_key
> can atm scsi_transport_iscsi rds rose x25 af_802154 caif_socket ipx caif p8023 psnap crc_ccitt p8022 llc bluetooth nfc rfkill netrom ax25 snd_hda_codec_realtek microcode snd_hda_codec_hdmi pcspkr snd_hda_intel snd_hda_codec snd_hwdep sn
> d_seq snd_seq_device usb_debug snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
> CPU: 0 PID: 13280 Comm: trinity-child0 Not tainted 3.10.0+ #40
> 0000000000000000 ffff880228533ee0 ffffffff816ec1a2 ffff880228533ef8
> ffffffff816e782e 0000000000000db5 ffff880228533f60 ffffffff816f42bf
> ffff88023bdeca40 ffff880228533fd8 ffff880228533fd8 ffff880228533fd8
> Call Trace:
> [<ffffffff816ec1a2>] dump_stack+0x19/0x1b
> [<ffffffff816e782e>] __schedule_bug+0x61/0x70
> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
> [<ffffffff816f487e>] schedule_user+0x2e/0x70
> [<ffffffff816f6de4>] retint_careful+0x12/0x2e
> BUG: scheduling while atomic: trinity-child0/13280/0xefffffff
> INFO: lockdep is turned off.
> Modules linked in: dlci dccp_ipv6 dccp_ipv4 dccp sctp bridge 8021q garp stp snd_seq_dummy tun fuse hidp rfcomm bnep can_raw can_bcm nfnetlink phonet llc2 pppoe pppox ppp_generic slhc appletalk af_rxrpc ipt_ULOG irda af_key can atm scsi_transport_iscsi rds rose x25 af_802154 caif_socket ipx caif p8023 psnap crc_ccitt p8022 llc bluetooth nfc rfkill netrom ax25 snd_hda_codec_realtek microcode snd_hda_codec_hdmi pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device usb_debug snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
> CPU: 0 PID: 13280 Comm: trinity-child0 Tainted: G W 3.10.0+ #40
> 0000000000000000 ffff880228533e80 ffffffff816ec1a2 ffff880228533e98
> ffffffff816e782e ffff880228533fd8 ffff880228533f00 ffffffff816f42bf
> ffff88023bdeca40 ffff880228533fd8 ffff880228533fd8 ffff880228533fd8
> Call Trace:
> [<ffffffff816ec1a2>] dump_stack+0x19/0x1b
> [<ffffffff816e782e>] __schedule_bug+0x61/0x70
> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
> [<ffffffff816f487e>] schedule_user+0x2e/0x70
> [<ffffffff816ff169>] int_careful+0x12/0x1e
> [<ffffffff816f487e>] ? schedule_user+0x2e/0x70
> [<ffffffff816f6de4>] ? retint_careful+0x12/0x2e
> Kernel panic - not syncing: Aiee, killing interrupt handler!

2013-07-04 04:43:41

by H. Peter Anvin

[permalink] [raw]

Subject: Re: scheduling while atomic & hang.

I'll look harder at the backtrace tomorrow, but my guess is that the cpu has just gotten a scheduling interrupt (time quantum expired.)

Linus Torvalds <[email protected]> wrote:

>On Wed, Jul 3, 2013 at 6:55 PM, Dave Jones <[email protected]> wrote:
>> This is a pretty context free trace. What the hell happened here?
>
>That lack of call trace looks like it happened at the final stage of
>an interrupt or page fault or other trap that is about to return to
>user space.
>
>My guess would be that the trap/irq/whatever handler for some odd
>reason ended up with an unbalanced spinlock or something. But since
>there is no trace of it, I can't even begin to guess what it would be.
>
>Does trinity save enough pseudo-random state that it can be
>repeatable, because if it's something repeatable it might be
>interesting to see what the last few system calls and traps were...
>
>> Box is wedged, and I won't be able to get to it until Friday to poke
>it.
>
>Oh well. Adding the x86 people to the cc, since the whole
>"retint_careful" does seem imply that it's not a system call entry,
>and more likely to be something like a page fault or debug trap or
>something. Any ideas, guys?
>
>From the " 3.10.0+" I assume this is from the merge window, and
>possibly a new failure. Do you have an actual git ID? I can heartily
>recommend CONFIG_LOCALVERSION_AUTO=y as a way to get commit ID's
>encoded in the version string (which is obviously more useful if you
>end up running mainly kernels without extra commits of your own on top
>of them - if you have your own local commits you'd still need to
>translate it into "your kernel XYZ with commits of mine on top")
>
> Linus
>
>---
>> BUG: scheduling while atomic: trinity-child0/13280/0xefffffff
>> INFO: lockdep is turned off.
>> Modules linked in: dlci dccp_ipv6 dccp_ipv4 dccp sctp bridge 8021q
>garp stp snd_seq_dummy tun fuse hidp rfcomm bnep can_raw can_bcm
>nfnetlink phonet llc2 pppoe pppox ppp_generic slhc appletalk af_rxrpc
>ipt_ULOG irda af_key
>> can atm scsi_transport_iscsi rds rose x25 af_802154 caif_socket ipx
>caif p8023 psnap crc_ccitt p8022 llc bluetooth nfc rfkill netrom ax25
>snd_hda_codec_realtek microcode snd_hda_codec_hdmi pcspkr snd_hda_intel
>snd_hda_codec snd_hwdep sn
>> d_seq snd_seq_device usb_debug snd_pcm e1000e snd_page_alloc
>snd_timer ptp snd pps_core soundcore xfs libcrc32c
>> CPU: 0 PID: 13280 Comm: trinity-child0 Not tainted 3.10.0+ #40
>> 0000000000000000 ffff880228533ee0 ffffffff816ec1a2 ffff880228533ef8
>> ffffffff816e782e 0000000000000db5 ffff880228533f60 ffffffff816f42bf
>> ffff88023bdeca40 ffff880228533fd8 ffff880228533fd8 ffff880228533fd8
>> Call Trace:
>> [<ffffffff816ec1a2>] dump_stack+0x19/0x1b
>> [<ffffffff816e782e>] __schedule_bug+0x61/0x70
>> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
>> [<ffffffff816f487e>] schedule_user+0x2e/0x70
>> [<ffffffff816f6de4>] retint_careful+0x12/0x2e
>> BUG: scheduling while atomic: trinity-child0/13280/0xefffffff
>> INFO: lockdep is turned off.
>> Modules linked in: dlci dccp_ipv6 dccp_ipv4 dccp sctp bridge 8021q
>garp stp snd_seq_dummy tun fuse hidp rfcomm bnep can_raw can_bcm
>nfnetlink phonet llc2 pppoe pppox ppp_generic slhc appletalk af_rxrpc
>ipt_ULOG irda af_key can atm scsi_transport_iscsi rds rose x25
>af_802154 caif_socket ipx caif p8023 psnap crc_ccitt p8022 llc
>bluetooth nfc rfkill netrom ax25 snd_hda_codec_realtek microcode
>snd_hda_codec_hdmi pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq
>snd_seq_device usb_debug snd_pcm e1000e snd_page_alloc snd_timer ptp
>snd pps_core soundcore xfs libcrc32c
>> CPU: 0 PID: 13280 Comm: trinity-child0 Tainted: G W 3.10.0+
>#40
>> 0000000000000000 ffff880228533e80 ffffffff816ec1a2 ffff880228533e98
>> ffffffff816e782e ffff880228533fd8 ffff880228533f00 ffffffff816f42bf
>> ffff88023bdeca40 ffff880228533fd8 ffff880228533fd8 ffff880228533fd8
>> Call Trace:
>> [<ffffffff816ec1a2>] dump_stack+0x19/0x1b
>> [<ffffffff816e782e>] __schedule_bug+0x61/0x70
>> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
>> [<ffffffff816f487e>] schedule_user+0x2e/0x70
>> [<ffffffff816ff169>] int_careful+0x12/0x1e
>> [<ffffffff816f487e>] ? schedule_user+0x2e/0x70
>> [<ffffffff816f6de4>] ? retint_careful+0x12/0x2e
>> Kernel panic - not syncing: Aiee, killing interrupt handler!

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.

2013-07-04 07:49:20

by Dave Jones

[permalink] [raw]

Subject: Re: scheduling while atomic & hang.

On Wed, Jul 03, 2013 at 07:49:18PM -0700, Linus Torvalds wrote:
> On Wed, Jul 3, 2013 at 6:55 PM, Dave Jones <[email protected]> wrote:
> > This is a pretty context free trace. What the hell happened here?
>
> That lack of call trace looks like it happened at the final stage of
> an interrupt or page fault or other trap that is about to return to
> user space.
>
> My guess would be that the trap/irq/whatever handler for some odd
> reason ended up with an unbalanced spinlock or something. But since
> there is no trace of it, I can't even begin to guess what it would be.
>
> Does trinity save enough pseudo-random state that it can be
> repeatable, because if it's something repeatable it might be
> interesting to see what the last few system calls and traps were...

It should, but I'll have to see if anything useful hit the disk before
it wedged.

> >From the " 3.10.0+" I assume this is from the merge window, and
> possibly a new failure. Do you have an actual git ID? I can heartily
> recommend CONFIG_LOCALVERSION_AUTO=y as a way to get commit ID's
> encoded in the version string (which is obviously more useful if you
> end up running mainly kernels without extra commits of your own on top
> of them - if you have your own local commits you'd still need to
> translate it into "your kernel XYZ with commits of mine on top")

top of tree was 0b0585c3e192967cb2ef0ac0816eb8a8c8d99840 I think.
(That's what it is on my local box that I pull all my test trees from,
and I don't think it changed after I started that run, but I'll
double check on Friday)

I don't use the auto config, because I end up filling up /boot
unless I go through and clean them out by hand every time I install
a new one (which I do probably a dozen or so times a day).
Is there some easy way to prune old builds I'm missing ?

Dave

2013-07-04 17:22:13

by Linus Torvalds

[permalink] [raw]

Subject: Re: scheduling while atomic & hang.

On Thu, Jul 4, 2013 at 12:49 AM, Dave Jones <[email protected]> wrote:
>
> top of tree was 0b0585c3e192967cb2ef0ac0816eb8a8c8d99840 I think.
> (That's what it is on my local box that I pull all my test trees from,
> and I don't think it changed after I started that run, but I'll
> double check on Friday)

Ok. So that has all the scheduler changes, and all the x86 changes
(and all the cgroup workqueue changes too - but without any call chain
at all, I'm disinclined to blame those yet).

> I don't use the auto config, because I end up filling up /boot
> unless I go through and clean them out by hand every time I install
> a new one (which I do probably a dozen or so times a day).
> Is there some easy way to prune old builds I'm missing ?

I'm sure it could be done automatically some way, but yeah, I just do
it by hand. I often *compile* dozens of kernels a day, but I seldom
install more than one or two even during the merge window..

Linus

2013-07-04 18:08:51

by H. Peter Anvin

[permalink] [raw]

Subject: Re: scheduling while atomic & hang.

On 07/03/2013 07:49 PM, Linus Torvalds wrote:
>> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
>> [<ffffffff816f487e>] schedule_user+0x2e/0x70
>> [<ffffffff816f6de4>] retint_careful+0x12/0x2e

This call trace does indeed indicate that we took a hardware interrupt
which caused a reschedule. It doesn't necessarily have to be a quantum
expiration interrupt, it could just as easily be an I/O completion
interrupt.

-hpa

2013-07-05 06:51:18

by Ingo Molnar

[permalink] [raw]

Subject: Re: scheduling while atomic & hang.

* H. Peter Anvin <[email protected]> wrote:

> On 07/03/2013 07:49 PM, Linus Torvalds wrote:
> >> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
> >> [<ffffffff816f487e>] schedule_user+0x2e/0x70
> >> [<ffffffff816f6de4>] retint_careful+0x12/0x2e
>
> This call trace does indeed indicate that we took a hardware
> interrupt which caused a reschedule. It doesn't necessarily have to
> be a quantum expiration interrupt, it could just as easily be an I/O
> completion interrupt.

Too bad the kernel stack frame at this point is so small [we are just
before return to userspace], so we don't really get our usual trace of
recent activities via the '?' entries ...

It would be nice to have a full-kernel-stack backtrace printout as an
option, which prints entries 'above' the current RSP.

I've Cc:-ed Frederic, maybe he has interest in implementing that?

Thanks,

Ingo

2013-07-05 07:19:09

by Frederic Weisbecker

[permalink] [raw]

Subject: Re: scheduling while atomic & hang.

On Fri, Jul 05, 2013 at 08:51:13AM +0200, Ingo Molnar wrote:
>
> * H. Peter Anvin <[email protected]> wrote:
>
> > On 07/03/2013 07:49 PM, Linus Torvalds wrote:
> > >> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
> > >> [<ffffffff816f487e>] schedule_user+0x2e/0x70
> > >> [<ffffffff816f6de4>] retint_careful+0x12/0x2e
> >
> > This call trace does indeed indicate that we took a hardware
> > interrupt which caused a reschedule. It doesn't necessarily have to
> > be a quantum expiration interrupt, it could just as easily be an I/O
> > completion interrupt.
>
> Too bad the kernel stack frame at this point is so small [we are just
> before return to userspace], so we don't really get our usual trace of
> recent activities via the '?' entries ...
>
> It would be nice to have a full-kernel-stack backtrace printout as an
> option, which prints entries 'above' the current RSP.
>
> I've Cc:-ed Frederic, maybe he has interest in implementing that?

Yeah that sounds interesting. I may give it a try.

Thanks.

2013-07-05 10:05:36

by Ingo Molnar

[permalink] [raw]

Subject: Re: scheduling while atomic & hang.

* Frederic Weisbecker <[email protected]> wrote:

> On Fri, Jul 05, 2013 at 08:51:13AM +0200, Ingo Molnar wrote:
> >
> > * H. Peter Anvin <[email protected]> wrote:
> >
> > > On 07/03/2013 07:49 PM, Linus Torvalds wrote:
> > > >> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
> > > >> [<ffffffff816f487e>] schedule_user+0x2e/0x70
> > > >> [<ffffffff816f6de4>] retint_careful+0x12/0x2e
> > >
> > > This call trace does indeed indicate that we took a hardware
> > > interrupt which caused a reschedule. It doesn't necessarily have to
> > > be a quantum expiration interrupt, it could just as easily be an I/O
> > > completion interrupt.
> >
> > Too bad the kernel stack frame at this point is so small [we are just
> > before return to userspace], so we don't really get our usual trace of
> > recent activities via the '?' entries ...
> >
> > It would be nice to have a full-kernel-stack backtrace printout as an
> > option, which prints entries 'above' the current RSP.
> >
> > I've Cc:-ed Frederic, maybe he has interest in implementing that?
>
> Yeah that sounds interesting. I may give it a try.

Which would be visible clearly in the output, via some clear delineation
from the 'real' portion of the stack trace - a separator line or so?

Thanks,

Ingo

2013-07-05 14:38:44

by Dave Jones

[permalink] [raw]

Subject: Yet more softlockups.

On Wed, Jul 03, 2013 at 07:49:18PM -0700, Linus Torvalds wrote:

> Does trinity save enough pseudo-random state that it can be
> repeatable, because if it's something repeatable it might be
> interesting to see what the last few system calls and traps were...
>
> > Box is wedged, and I won't be able to get to it until Friday to poke it.

Looks like nothing hit the disk but a bunch of zero byte log files.
I'll add some more syncing to todays runs to see if that helps.

A second almost identical machine (same cpu/chipset/ram) has been hitting
a bunch of different soft lockups btw. (What happened to the good
old days when kernels oopsed instead of locked up?)

These are the ones I sucked out of its logs this morning that looked unique.
They're all merge window builds, but I don't have a log of which git hash
corresponds to each build.

BUG: soft lockup - CPU#3 stuck for 22s! [trinity-child3:20119]
irq event stamp: 2453171
hardirqs last enabled at (2453170): [<ffffffff816ec620>] restore_args+0x0/0x30
hardirqs last disabled at (2453171): [<ffffffff816f512a>] apic_timer_interrupt+0x6a/0x80
softirqs last enabled at (262552): [<ffffffff810542e4>] __do_softirq+0x194/0x440
softirqs last disabled at (262555): [<ffffffff8105474d>] irq_exit+0xcd/0xe0
CPU: 3 PID: 20119 Comm: trinity-child3 Not tainted 3.10.0+ #16
task: ffff880228c22520 ti: ffff88021e8c8000 task.ti: ffff88021e8c8000
RIP: 0010:[<ffffffff8109bd05>] [<ffffffff8109bd05>] run_rebalance_domains+0x185/0x1b0
RSP: 0018:ffff880244e03eb0 EFLAGS: 00000246
RAX: ffff880244e00000 RBX: ffffffff816ec620 RCX: 000000000000b910
RDX: 000000000000628a RSI: 0000000010000100 RDI: 0000000000000001
RBP: ffff880244e03ef8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff880244e03e28
R13: ffffffff816f512f R14: ffff880244e03ef8 R15: 0000000000000003
FS: 00007f8d790df740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feed4713010 CR3: 00000002208d6000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 00000000012169c0 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
ffff88021e8c9fd8 0000000000000003 ffff880244e03ec0 ffff880244e03ec0
0000000000000007 0000000000000007 0000000000000001 ffffffff81c04138
0000000000000005 ffff880244e03f70 ffffffff8105424f 0000000a00406040
Call Trace:
<IRQ>
[<ffffffff8105424f>] __do_softirq+0xff/0x440
[<ffffffff8105474d>] irq_exit+0xcd/0xe0
[<ffffffff816f5fcb>] smp_apic_timer_interrupt+0x6b/0x9b
[<ffffffff816f512f>] apic_timer_interrupt+0x6f/0x80
<EOI>
[<ffffffff816e9640>] ? wait_for_completion_interruptible+0x170/0x170
[<ffffffff816ea6e3>] ? preempt_schedule_irq+0x53/0x90
[<ffffffff816ec736>] retint_kernel+0x26/0x30
[<ffffffff811458a7>] ? user_enter+0x87/0xd0
[<ffffffff8100f6a8>] syscall_trace_leave+0x78/0x140
[<ffffffff816f47af>] int_check_syscall_exit_work+0x34/0x3d
Code: 85 28 09 00 00 48 89 05 ea 6e 06 01 48 8b 45 c0 48 8b 04 c5 40 51 cf 81 49 c7 c7 00 3f 1d 00 49 01 c7 f0 41 80 a7 88 00 00 00 fd <48> 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 89 7d b8 e8 23

BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child2:26807]
Modules linked in: bridge 8021q garp stp snd_seq_dummy tun fuse hidp rfcomm ipt_ULOG can_raw nfnetlink scsi_transport_iscsi bnep can_bcm bluetooth ipx p8023 x25 p8022 irda netrom rose llc2 pppoe ax25 pppox ppp_generic appletalk slhc psnap llc can rds af_rxrpc caif_socket caif phonet crc_ccitt af_key atm nfc rfkill af_802154 snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer microcode pcspkr ptp snd pps_core soundcore
irq event stamp: 0
hardirqs last enabled at (0): [< (null)>] (null)
hardirqs last disabled at (0): [<ffffffff8104b009>] copy_process.part.28+0x619/0x15a0
softirqs last enabled at (0): [<ffffffff8104b009>] copy_process.part.28+0x619/0x15a0
softirqs last disabled at (0): [< (null)>] (null)
CPU: 0 PID: 26807 Comm: trinity-child2 Not tainted 3.10.0+ #18
task: ffff8801cb21a520 ti: ffff8801e06da000 task.ti: ffff8801e06da000
RIP: 0010:[<ffffffff810570c1>] [<ffffffff810570c1>] __do_softirq+0xb1/0x440
RSP: 0018:ffff880244803f10 EFLAGS: 00000246
RAX: ffff8801e06dbfd8 RBX: ffffffff816f56e0 RCX: 0000000000000001
RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffffff810570b9
RBP: ffff880244803f78 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880244803e88
R13: ffffffff816fe46f R14: ffff880244803f78 R15: ffff8801cb21a520
FS: 00007f3ac43bd740(0000) GS:ffff880244800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001ebb648 CR3: 000000022f70e000 CR4: 00000000001407f0
DR0: 0000000000b76000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
0000000a00406444 0000000100153dba ffff8801e06dbfd8 ffff8801e06dbfd8
ffff8801e06dbfd8 ffff8801cb21a918 ffff8801e06dbfd8 ffffffff00000000
ffff8801cb21a520 ffff880229fd8ba1 ffffea00074a15c0 ffff8801e06dbc38
Call Trace:
<IRQ>
[<ffffffff8105760d>] irq_exit+0xcd/0xe0
[<ffffffff816ff7e5>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff816fe46f>] apic_timer_interrupt+0x6f/0x80
<EOI>
[<ffffffff816f56e0>] ? retint_restore_args+0xe/0xe
[<ffffffff81180aa8>] ? page_remove_rmap+0x18/0xa0
[<ffffffff81180ae5>] ? page_remove_rmap+0x55/0xa0
[<ffffffff81173e8d>] unmap_single_vma+0x3dd/0x850
[<ffffffff811752a9>] unmap_vmas+0x49/0x90
[<ffffffff8117db41>] exit_mmap+0xc1/0x170
[<ffffffff8104a1af>] mmput+0x6f/0x100
[<ffffffff810521dc>] do_exit+0x27c/0xcd0
[<ffffffff81066371>] ? get_signal_to_deliver+0xb1/0x940
[<ffffffff8105401c>] do_group_exit+0x4c/0xc0
[<ffffffff810665a1>] get_signal_to_deliver+0x2e1/0x940
[<ffffffff810ba51d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff810024b8>] do_signal+0x48/0x5a0
[<ffffffff8131198e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff816f57f6>] ? retint_kernel+0x26/0x30
[<ffffffff816f32f3>] ? preempt_schedule_irq+0x63/0x90
[<ffffffff816f56e0>] ? retint_restore_args+0xe/0xe
[<ffffffff810954f9>] ? vtime_account_user+0x69/0x70
[<ffffffff8114a5f5>] ? user_exit+0x45/0x90
[<ffffffff8114a5f7>] ? user_exit+0x47/0x90
[<ffffffff81002a88>] do_notify_resume+0x78/0x90
[<ffffffff816fdb0a>] int_signal+0x12/0x17
Code: 48 89 45 b8 48 89 45 b0 48 89 45 a8 66 0f 1f 44 00 00 65 c7 04 25 80 0f 1d 00 00 00 00 00 e8 57 34 06 00 fb 49 c7 c6 00 41 c0 81 <eb> 0e 0f 1f 44 00 00 49 83 c6 08 41 d1 ef 74 6c 41 f6 c7 01 74

BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565]
Modules linked in: dlci snd_seq_dummy hidp tun fuse bnep rfcomm ipt_ULOG nfnetlink scsi_transport_iscsi af_802154 can_bcm af_rxrpc bluetooth irda af_key rose atm x25 netrom can_raw phonet caif_socket can caif crc_ccitt appletalk llc2 ipx nfc p8023 rds rfkill psnap pppoe p8022 llc ax25 pppox ppp_generic slhc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_intel snd_hda_codec pcspkr snd_hwdep snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore
irq event stamp: 0
hardirqs last enabled at (0): [< (null)>] (null)
hardirqs last disabled at (0): [<ffffffff8104aa61>] copy_process.part.27+0x611/0x15b0
softirqs last enabled at (0): [<ffffffff8104aa61>] copy_process.part.27+0x611/0x15b0
softirqs last disabled at (0): [< (null)>] (null)
CPU: 3 PID: 14565 Comm: trinity-child1 Not tainted 3.10.0+ #20
task: ffff88022caa0000 ti: ffff880212676000 task.ti: ffff880212676000
RIP: 0010:[<ffffffff81056ba1>] [<ffffffff81056ba1>] __do_softirq+0xb1/0x440
RSP: 0000:ffff880244e03f10 EFLAGS: 00000246
RAX: ffff880212677fd8 RBX: ffffffff816fc660 RCX: 0000000010624dd3
RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffffff81056b99
RBP: ffff880244e03f78 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880244e03e88
R13: ffffffff817053ef R14: ffff880244e03f78 R15: ffff88022caa0000
FS: 00007f959367e740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8c6657e968 CR3: 00000001cb73e000 CR4: 00000000001407e0
DR0: 0000000000a51000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
0000000a00406040 000000010003d260 ffff880212677fd8 ffff880212677fd8
ffff880212677fd8 ffff88022caa03f8 ffff880212677fd8 ffffffff00000003
ffff88022caa0000 ffff880212677fd8 0000000002461be8 ffff880212677f58
Call Trace:
<IRQ>
[<ffffffff810570ed>] irq_exit+0xcd/0xe0
[<ffffffff81706765>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff817053ef>] apic_timer_interrupt+0x6f/0x80
<EOI>
[<ffffffff816fc660>] ? retint_restore_args+0xe/0xe
[<ffffffff816fc776>] ? retint_kernel+0x26/0x30
[<ffffffff816fa24d>] ? preempt_schedule_irq+0x4d/0x90
[<ffffffff816f9230>] ? wait_for_completion_interruptible+0x180/0x180
[<ffffffff816fa253>] ? preempt_schedule_irq+0x53/0x90
[<ffffffff816fc776>] retint_kernel+0x26/0x30
[<ffffffff8117da38>] ? find_vma+0x58/0x60
[<ffffffff816ffa79>] __do_page_fault+0x109/0x5d0
[<ffffffff810b7a6d>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff811106ef>] ? __acct_update_integrals+0x7f/0xf0
[<ffffffff8108fc5d>] ? get_parent_ip+0xd/0x50
[<ffffffff817004a1>] ? sub_preempt_count+0x71/0x100
[<ffffffff816fb931>] ? _raw_spin_unlock+0x31/0x60
[<ffffffff81095ab9>] ? vtime_account_user+0x69/0x70
[<ffffffff816fff6c>] do_page_fault+0x2c/0x50
[<ffffffff816fc8a2>] page_fault+0x22/0x30
Code: 48 89 45 b8 48 89 45 b0 48 89 45 a8 66 0f 1f 44 00 00 65 c7 04 25 80 0d 1d 00 00 00 00 00 e8 17 47 06 00 fb 49 c7 c6 00 41 c0 81 <eb> 0e 0f 1f 44 00 00 49 83 c6 08 41 d1 ef 74 6c 41 f6 c7 01 74
Bridge firewalling registered
perf samples too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
INFO: NMI handler (perf_event_nmi_handler) took too long to run: 238147.002 msecs

2013-07-05 15:15:11

by Thomas Gleixner

[permalink] [raw]

Subject: scheduling while atomic & hang.

Subject: Re: scheduling while atomic & hang.

Subject: Re: scheduling while atomic & hang.

Subject: Re: scheduling while atomic & hang.

Subject: Re: scheduling while atomic & hang.

Subject: Re: scheduling while atomic & hang.

Subject: Re: scheduling while atomic & hang.

Subject: Re: scheduling while atomic & hang.

Subject: Re: scheduling while atomic & hang.

Subject: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: RE: Yet more softlockups.

Subject: Re: scheduling while atomic & hang.

Subject: RE: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: scheduling while atomic & hang.

Subject: Re: scheduling while atomic & hang.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: scheduling while atomic & hang.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.

Subject: Re: Yet more softlockups.