2013-07-04 01:55:38

by Dave Jones

Subject: scheduling while atomic & hang.

This is a pretty context free trace. What the hell happened here?


BUG: scheduling while atomic: trinity-child0/13280/0xefffffff
INFO: lockdep is turned off.
Modules linked in: dlci dccp_ipv6 dccp_ipv4 dccp sctp bridge 8021q garp stp snd_seq_dummy tun fuse hidp rfcomm bnep can_raw can_bcm nfnetlink phonet llc2 pppoe pppox ppp_generic slhc appletalk af_rxrpc ipt_ULOG irda af_key can atm scsi_transport_iscsi rds rose x25 af_802154 caif_socket ipx caif p8023 psnap crc_ccitt p8022 llc bluetooth nfc rfkill netrom ax25 snd_hda_codec_realtek microcode snd_hda_codec_hdmi pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device usb_debug snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 0 PID: 13280 Comm: trinity-child0 Not tainted 3.10.0+ #40
0000000000000000 ffff880228533ee0 ffffffff816ec1a2 ffff880228533ef8
ffffffff816e782e 0000000000000db5 ffff880228533f60 ffffffff816f42bf
ffff88023bdeca40 ffff880228533fd8 ffff880228533fd8 ffff880228533fd8
Call Trace:
[<ffffffff816ec1a2>] dump_stack+0x19/0x1b
[<ffffffff816e782e>] __schedule_bug+0x61/0x70
[<ffffffff816f42bf>] __schedule+0x94f/0x9c0
[<ffffffff816f487e>] schedule_user+0x2e/0x70
[<ffffffff816f6de4>] retint_careful+0x12/0x2e
BUG: scheduling while atomic: trinity-child0/13280/0xefffffff
INFO: lockdep is turned off.
Modules linked in: dlci dccp_ipv6 dccp_ipv4 dccp sctp bridge 8021q garp stp snd_seq_dummy tun fuse hidp rfcomm bnep can_raw can_bcm nfnetlink phonet llc2 pppoe pppox ppp_generic slhc appletalk af_rxrpc ipt_ULOG irda af_key can atm scsi_transport_iscsi rds rose x25 af_802154 caif_socket ipx caif p8023 psnap crc_ccitt p8022 llc bluetooth nfc rfkill netrom ax25 snd_hda_codec_realtek microcode snd_hda_codec_hdmi pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device usb_debug snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 0 PID: 13280 Comm: trinity-child0 Tainted: G W 3.10.0+ #40
0000000000000000 ffff880228533e80 ffffffff816ec1a2 ffff880228533e98
ffffffff816e782e ffff880228533fd8 ffff880228533f00 ffffffff816f42bf
ffff88023bdeca40 ffff880228533fd8 ffff880228533fd8 ffff880228533fd8
Call Trace:
[<ffffffff816ec1a2>] dump_stack+0x19/0x1b
[<ffffffff816e782e>] __schedule_bug+0x61/0x70
[<ffffffff816f42bf>] __schedule+0x94f/0x9c0
[<ffffffff816f487e>] schedule_user+0x2e/0x70
[<ffffffff816ff169>] int_careful+0x12/0x1e
[<ffffffff816f487e>] ? schedule_user+0x2e/0x70
[<ffffffff816f6de4>] ? retint_careful+0x12/0x2e
Kernel panic - not syncing: Aiee, killing interrupt handler!


Box is wedged, and I won't be able to get to it until Friday to poke it.

Dave


2013-07-04 02:49:21

by Linus Torvalds

Subject: Re: scheduling while atomic & hang.

On Wed, Jul 3, 2013 at 6:55 PM, Dave Jones <[email protected]> wrote:
> This is a pretty context free trace. What the hell happened here?

That lack of call trace looks like it happened at the final stage of
an interrupt or page fault or other trap that is about to return to
user space.

My guess would be that the trap/irq/whatever handler for some odd
reason ended up with an unbalanced spinlock or something. But since
there is no trace of it, I can't even begin to guess what it would be.

Does trinity save enough pseudo-random state that it can be
repeatable, because if it's something repeatable it might be
interesting to see what the last few system calls and traps were...

> Box is wedged, and I won't be able to get to it until Friday to poke it.

Oh well. Adding the x86 people to the cc, since the whole
"retint_careful" does seem imply that it's not a system call entry,
and more likely to be something like a page fault or debug trap or
something. Any ideas, guys?

From the " 3.10.0+" I assume this is from the merge window, and
possibly a new failure. Do you have an actual git ID? I can heartily
recommend CONFIG_LOCALVERSION_AUTO=y as a way to get commit ID's
encoded in the version string (which is obviously more useful if you
end up running mainly kernels without extra commits of your own on top
of them - if you have your own local commits you'd still need to
translate it into "your kernel XYZ with commits of mine on top")

Linus

---
> BUG: scheduling while atomic: trinity-child0/13280/0xefffffff
> INFO: lockdep is turned off.
> Modules linked in: dlci dccp_ipv6 dccp_ipv4 dccp sctp bridge 8021q garp stp snd_seq_dummy tun fuse hidp rfcomm bnep can_raw can_bcm nfnetlink phonet llc2 pppoe pppox ppp_generic slhc appletalk af_rxrpc ipt_ULOG irda af_key
> can atm scsi_transport_iscsi rds rose x25 af_802154 caif_socket ipx caif p8023 psnap crc_ccitt p8022 llc bluetooth nfc rfkill netrom ax25 snd_hda_codec_realtek microcode snd_hda_codec_hdmi pcspkr snd_hda_intel snd_hda_codec snd_hwdep sn
> d_seq snd_seq_device usb_debug snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
> CPU: 0 PID: 13280 Comm: trinity-child0 Not tainted 3.10.0+ #40
> 0000000000000000 ffff880228533ee0 ffffffff816ec1a2 ffff880228533ef8
> ffffffff816e782e 0000000000000db5 ffff880228533f60 ffffffff816f42bf
> ffff88023bdeca40 ffff880228533fd8 ffff880228533fd8 ffff880228533fd8
> Call Trace:
> [<ffffffff816ec1a2>] dump_stack+0x19/0x1b
> [<ffffffff816e782e>] __schedule_bug+0x61/0x70
> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
> [<ffffffff816f487e>] schedule_user+0x2e/0x70
> [<ffffffff816f6de4>] retint_careful+0x12/0x2e
> BUG: scheduling while atomic: trinity-child0/13280/0xefffffff
> INFO: lockdep is turned off.
> Modules linked in: dlci dccp_ipv6 dccp_ipv4 dccp sctp bridge 8021q garp stp snd_seq_dummy tun fuse hidp rfcomm bnep can_raw can_bcm nfnetlink phonet llc2 pppoe pppox ppp_generic slhc appletalk af_rxrpc ipt_ULOG irda af_key can atm scsi_transport_iscsi rds rose x25 af_802154 caif_socket ipx caif p8023 psnap crc_ccitt p8022 llc bluetooth nfc rfkill netrom ax25 snd_hda_codec_realtek microcode snd_hda_codec_hdmi pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device usb_debug snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
> CPU: 0 PID: 13280 Comm: trinity-child0 Tainted: G W 3.10.0+ #40
> 0000000000000000 ffff880228533e80 ffffffff816ec1a2 ffff880228533e98
> ffffffff816e782e ffff880228533fd8 ffff880228533f00 ffffffff816f42bf
> ffff88023bdeca40 ffff880228533fd8 ffff880228533fd8 ffff880228533fd8
> Call Trace:
> [<ffffffff816ec1a2>] dump_stack+0x19/0x1b
> [<ffffffff816e782e>] __schedule_bug+0x61/0x70
> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
> [<ffffffff816f487e>] schedule_user+0x2e/0x70
> [<ffffffff816ff169>] int_careful+0x12/0x1e
> [<ffffffff816f487e>] ? schedule_user+0x2e/0x70
> [<ffffffff816f6de4>] ? retint_careful+0x12/0x2e
> Kernel panic - not syncing: Aiee, killing interrupt handler!

2013-07-04 04:43:41

by H. Peter Anvin

Subject: Re: scheduling while atomic & hang.

I'll look harder at the backtrace tomorrow, but my guess is that the cpu has just gotten a scheduling interrupt (time quantum expired.)

Linus Torvalds <[email protected]> wrote:

>On Wed, Jul 3, 2013 at 6:55 PM, Dave Jones <[email protected]> wrote:
>> This is a pretty context free trace. What the hell happened here?
>
>That lack of call trace looks like it happened at the final stage of
>an interrupt or page fault or other trap that is about to return to
>user space.
>
>My guess would be that the trap/irq/whatever handler for some odd
>reason ended up with an unbalanced spinlock or something. But since
>there is no trace of it, I can't even begin to guess what it would be.
>
>Does trinity save enough pseudo-random state that it can be
>repeatable, because if it's something repeatable it might be
>interesting to see what the last few system calls and traps were...
>
>> Box is wedged, and I won't be able to get to it until Friday to poke
>it.
>
>Oh well. Adding the x86 people to the cc, since the whole
>"retint_careful" does seem imply that it's not a system call entry,
>and more likely to be something like a page fault or debug trap or
>something. Any ideas, guys?
>
>From the " 3.10.0+" I assume this is from the merge window, and
>possibly a new failure. Do you have an actual git ID? I can heartily
>recommend CONFIG_LOCALVERSION_AUTO=y as a way to get commit ID's
>encoded in the version string (which is obviously more useful if you
>end up running mainly kernels without extra commits of your own on top
>of them - if you have your own local commits you'd still need to
>translate it into "your kernel XYZ with commits of mine on top")
>
> Linus
>
>---
>> BUG: scheduling while atomic: trinity-child0/13280/0xefffffff
>> INFO: lockdep is turned off.
>> Modules linked in: dlci dccp_ipv6 dccp_ipv4 dccp sctp bridge 8021q
>garp stp snd_seq_dummy tun fuse hidp rfcomm bnep can_raw can_bcm
>nfnetlink phonet llc2 pppoe pppox ppp_generic slhc appletalk af_rxrpc
>ipt_ULOG irda af_key
>> can atm scsi_transport_iscsi rds rose x25 af_802154 caif_socket ipx
>caif p8023 psnap crc_ccitt p8022 llc bluetooth nfc rfkill netrom ax25
>snd_hda_codec_realtek microcode snd_hda_codec_hdmi pcspkr snd_hda_intel
>snd_hda_codec snd_hwdep sn
>> d_seq snd_seq_device usb_debug snd_pcm e1000e snd_page_alloc
>snd_timer ptp snd pps_core soundcore xfs libcrc32c
>> CPU: 0 PID: 13280 Comm: trinity-child0 Not tainted 3.10.0+ #40
>> 0000000000000000 ffff880228533ee0 ffffffff816ec1a2 ffff880228533ef8
>> ffffffff816e782e 0000000000000db5 ffff880228533f60 ffffffff816f42bf
>> ffff88023bdeca40 ffff880228533fd8 ffff880228533fd8 ffff880228533fd8
>> Call Trace:
>> [<ffffffff816ec1a2>] dump_stack+0x19/0x1b
>> [<ffffffff816e782e>] __schedule_bug+0x61/0x70
>> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
>> [<ffffffff816f487e>] schedule_user+0x2e/0x70
>> [<ffffffff816f6de4>] retint_careful+0x12/0x2e
>> BUG: scheduling while atomic: trinity-child0/13280/0xefffffff
>> INFO: lockdep is turned off.
>> Modules linked in: dlci dccp_ipv6 dccp_ipv4 dccp sctp bridge 8021q
>garp stp snd_seq_dummy tun fuse hidp rfcomm bnep can_raw can_bcm
>nfnetlink phonet llc2 pppoe pppox ppp_generic slhc appletalk af_rxrpc
>ipt_ULOG irda af_key can atm scsi_transport_iscsi rds rose x25
>af_802154 caif_socket ipx caif p8023 psnap crc_ccitt p8022 llc
>bluetooth nfc rfkill netrom ax25 snd_hda_codec_realtek microcode
>snd_hda_codec_hdmi pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq
>snd_seq_device usb_debug snd_pcm e1000e snd_page_alloc snd_timer ptp
>snd pps_core soundcore xfs libcrc32c
>> CPU: 0 PID: 13280 Comm: trinity-child0 Tainted: G W 3.10.0+
>#40
>> 0000000000000000 ffff880228533e80 ffffffff816ec1a2 ffff880228533e98
>> ffffffff816e782e ffff880228533fd8 ffff880228533f00 ffffffff816f42bf
>> ffff88023bdeca40 ffff880228533fd8 ffff880228533fd8 ffff880228533fd8
>> Call Trace:
>> [<ffffffff816ec1a2>] dump_stack+0x19/0x1b
>> [<ffffffff816e782e>] __schedule_bug+0x61/0x70
>> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
>> [<ffffffff816f487e>] schedule_user+0x2e/0x70
>> [<ffffffff816ff169>] int_careful+0x12/0x1e
>> [<ffffffff816f487e>] ? schedule_user+0x2e/0x70
>> [<ffffffff816f6de4>] ? retint_careful+0x12/0x2e
>> Kernel panic - not syncing: Aiee, killing interrupt handler!

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.

2013-07-04 07:49:20

by Dave Jones

Subject: Re: scheduling while atomic & hang.

On Wed, Jul 03, 2013 at 07:49:18PM -0700, Linus Torvalds wrote:
> On Wed, Jul 3, 2013 at 6:55 PM, Dave Jones <[email protected]> wrote:
> > This is a pretty context free trace. What the hell happened here?
>
> That lack of call trace looks like it happened at the final stage of
> an interrupt or page fault or other trap that is about to return to
> user space.
>
> My guess would be that the trap/irq/whatever handler for some odd
> reason ended up with an unbalanced spinlock or something. But since
> there is no trace of it, I can't even begin to guess what it would be.
>
> Does trinity save enough pseudo-random state that it can be
> repeatable, because if it's something repeatable it might be
> interesting to see what the last few system calls and traps were...

It should, but I'll have to see if anything useful hit the disk before
it wedged.

> From the " 3.10.0+" I assume this is from the merge window, and
> possibly a new failure. Do you have an actual git ID? I can heartily
> recommend CONFIG_LOCALVERSION_AUTO=y as a way to get commit ID's
> encoded in the version string (which is obviously more useful if you
> end up running mainly kernels without extra commits of your own on top
> of them - if you have your own local commits you'd still need to
> translate it into "your kernel XYZ with commits of mine on top")

top of tree was 0b0585c3e192967cb2ef0ac0816eb8a8c8d99840 I think.
(That's what it is on my local box that I pull all my test trees from,
and I don't think it changed after I started that run, but I'll
double check on Friday)

I don't use the auto config, because I end up filling up /boot
unless I go through and clean them out by hand every time I install
a new one (which I do probably a dozen or so times a day).
Is there some easy way to prune old builds I'm missing ?

Dave

2013-07-04 17:22:13

by Linus Torvalds

Subject: Re: scheduling while atomic & hang.

On Thu, Jul 4, 2013 at 12:49 AM, Dave Jones <[email protected]> wrote:
>
> top of tree was 0b0585c3e192967cb2ef0ac0816eb8a8c8d99840 I think.
> (That's what it is on my local box that I pull all my test trees from,
> and I don't think it changed after I started that run, but I'll
> double check on Friday)

Ok. So that has all the scheduler changes, and all the x86 changes
(and all the cgroup workqueue changes too - but without any call chain
at all, I'm disinclined to blame those yet).

> I don't use the auto config, because I end up filling up /boot
> unless I go through and clean them out by hand every time I install
> a new one (which I do probably a dozen or so times a day).
> Is there some easy way to prune old builds I'm missing ?

I'm sure it could be done automatically some way, but yeah, I just do
it by hand. I often *compile* dozens of kernels a day, but I seldom
install more than one or two even during the merge window..

Linus

2013-07-04 18:08:51

by H. Peter Anvin

Subject: Re: scheduling while atomic & hang.

On 07/03/2013 07:49 PM, Linus Torvalds wrote:
>> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
>> [<ffffffff816f487e>] schedule_user+0x2e/0x70
>> [<ffffffff816f6de4>] retint_careful+0x12/0x2e

This call trace does indeed indicate that we took a hardware interrupt
which caused a reschedule. It doesn't necessarily have to be a quantum
expiration interrupt, it could just as easily be an I/O completion
interrupt.

-hpa

2013-07-05 06:51:18

by Ingo Molnar

Subject: Re: scheduling while atomic & hang.


* H. Peter Anvin <[email protected]> wrote:

> On 07/03/2013 07:49 PM, Linus Torvalds wrote:
> >> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
> >> [<ffffffff816f487e>] schedule_user+0x2e/0x70
> >> [<ffffffff816f6de4>] retint_careful+0x12/0x2e
>
> This call trace does indeed indicate that we took a hardware
> interrupt which caused a reschedule. It doesn't necessarily have to
> be a quantum expiration interrupt, it could just as easily be an I/O
> completion interrupt.

Too bad the kernel stack frame at this point is so small [we are just
before return to userspace], so we don't really get our usual trace of
recent activities via the '?' entries ...

It would be nice to have a full-kernel-stack backtrace printout as an
option, which prints entries 'above' the current RSP.
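
For illustration only - nothing like this exists in the kernel today - a
minimal sketch of what such a dumper might look like on x86-64, assuming we
simply scan the words between the current RSP and the top of the task stack
and print the ones that resolve to kernel text:

#include <linux/kernel.h>
#include <linux/kallsyms.h>
#include <linux/sched.h>

/*
 * Hypothetical sketch: walk the stale words between the current RSP and
 * the top of the task's kernel stack and print anything that looks like
 * a kernel text address.  The output would need a clear "above RSP,
 * stale" marker, since none of this is a live call chain.
 */
static void dump_stack_above_rsp(void)
{
	unsigned long *sp, *top;

	asm volatile("mov %%rsp, %0" : "=r" (sp));
	top = (unsigned long *)((char *)task_stack_page(current) + THREAD_SIZE);

	pr_info("---- stale stack words above RSP ----\n");
	for (; sp < top; sp++)
		if (kernel_text_address(*sp))
			print_ip_sym(*sp);
}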

I've Cc:-ed Frederic, maybe he has interest in implementing that?

Thanks,

Ingo

2013-07-05 07:19:09

by Frederic Weisbecker

Subject: Re: scheduling while atomic & hang.

On Fri, Jul 05, 2013 at 08:51:13AM +0200, Ingo Molnar wrote:
>
> * H. Peter Anvin <[email protected]> wrote:
>
> > On 07/03/2013 07:49 PM, Linus Torvalds wrote:
> > >> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
> > >> [<ffffffff816f487e>] schedule_user+0x2e/0x70
> > >> [<ffffffff816f6de4>] retint_careful+0x12/0x2e
> >
> > This call trace does indeed indicate that we took a hardware
> > interrupt which caused a reschedule. It doesn't necessarily have to
> > be a quantum expiration interrupt, it could just as easily be an I/O
> > completion interrupt.
>
> Too bad the kernel stack frame at this point is so small [we are just
> before return to userspace], so we don't really get our usual trace of
> recent activities via the '?' entries ...
>
> It would be nice to have a full-kernel-stack backtrace printout as an
> option, which prints entries 'above' the current RSP.
>
> I've Cc:-ed Frederic, maybe he has interest in implementing that?

Yeah that sounds interesting. I may give it a try.

Thanks.

2013-07-05 10:05:36

by Ingo Molnar

Subject: Re: scheduling while atomic & hang.


* Frederic Weisbecker <[email protected]> wrote:

> On Fri, Jul 05, 2013 at 08:51:13AM +0200, Ingo Molnar wrote:
> >
> > * H. Peter Anvin <[email protected]> wrote:
> >
> > > On 07/03/2013 07:49 PM, Linus Torvalds wrote:
> > > >> [<ffffffff816f42bf>] __schedule+0x94f/0x9c0
> > > >> [<ffffffff816f487e>] schedule_user+0x2e/0x70
> > > >> [<ffffffff816f6de4>] retint_careful+0x12/0x2e
> > >
> > > This call trace does indeed indicate that we took a hardware
> > > interrupt which caused a reschedule. It doesn't necessarily have to
> > > be a quantum expiration interrupt, it could just as easily be an I/O
> > > completion interrupt.
> >
> > Too bad the kernel stack frame at this point is so small [we are just
> > before return to userspace], so we don't really get our usual trace of
> > recent activities via the '?' entries ...
> >
> > It would be nice to have a full-kernel-stack backtrace printout as an
> > option, which prints entries 'above' the current RSP.
> >
> > I've Cc:-ed Frederic, maybe he has interest in implementing that?
>
> Yeah that sounds interesting. I may give it a try.

It should be clearly marked in the output, with some clear delineation
from the 'real' portion of the stack trace - a separator line or so?

Thanks,

Ingo

2013-07-05 14:38:44

by Dave Jones

Subject: Yet more softlockups.

On Wed, Jul 03, 2013 at 07:49:18PM -0700, Linus Torvalds wrote:

> Does trinity save enough pseudo-random state that it can be
> repeatable, because if it's something repeatable it might be
> interesting to see what the last few system calls and traps were...
>
> > Box is wedged, and I won't be able to get to it until Friday to poke it.

Looks like nothing hit the disk but a bunch of zero byte log files.
I'll add some more syncing to today's runs to see if that helps.

A second almost identical machine (same cpu/chipset/ram) has been hitting
a bunch of different soft lockups btw. (What happened to the good
old days when kernels oopsed instead of locking up?)

These are the ones I sucked out of its logs this morning that looked unique.
They're all merge window builds, but I don't have a log of which git hash
corresponds to each build.


BUG: soft lockup - CPU#3 stuck for 22s! [trinity-child3:20119]
irq event stamp: 2453171
hardirqs last enabled at (2453170): [<ffffffff816ec620>] restore_args+0x0/0x30
hardirqs last disabled at (2453171): [<ffffffff816f512a>] apic_timer_interrupt+0x6a/0x80
softirqs last enabled at (262552): [<ffffffff810542e4>] __do_softirq+0x194/0x440
softirqs last disabled at (262555): [<ffffffff8105474d>] irq_exit+0xcd/0xe0
CPU: 3 PID: 20119 Comm: trinity-child3 Not tainted 3.10.0+ #16
task: ffff880228c22520 ti: ffff88021e8c8000 task.ti: ffff88021e8c8000
RIP: 0010:[<ffffffff8109bd05>] [<ffffffff8109bd05>] run_rebalance_domains+0x185/0x1b0
RSP: 0018:ffff880244e03eb0 EFLAGS: 00000246
RAX: ffff880244e00000 RBX: ffffffff816ec620 RCX: 000000000000b910
RDX: 000000000000628a RSI: 0000000010000100 RDI: 0000000000000001
RBP: ffff880244e03ef8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff880244e03e28
R13: ffffffff816f512f R14: ffff880244e03ef8 R15: 0000000000000003
FS: 00007f8d790df740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feed4713010 CR3: 00000002208d6000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 00000000012169c0 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
ffff88021e8c9fd8 0000000000000003 ffff880244e03ec0 ffff880244e03ec0
0000000000000007 0000000000000007 0000000000000001 ffffffff81c04138
0000000000000005 ffff880244e03f70 ffffffff8105424f 0000000a00406040
Call Trace:
<IRQ>
[<ffffffff8105424f>] __do_softirq+0xff/0x440
[<ffffffff8105474d>] irq_exit+0xcd/0xe0
[<ffffffff816f5fcb>] smp_apic_timer_interrupt+0x6b/0x9b
[<ffffffff816f512f>] apic_timer_interrupt+0x6f/0x80
<EOI>
[<ffffffff816e9640>] ? wait_for_completion_interruptible+0x170/0x170
[<ffffffff816ea6e3>] ? preempt_schedule_irq+0x53/0x90
[<ffffffff816ec736>] retint_kernel+0x26/0x30
[<ffffffff811458a7>] ? user_enter+0x87/0xd0
[<ffffffff8100f6a8>] syscall_trace_leave+0x78/0x140
[<ffffffff816f47af>] int_check_syscall_exit_work+0x34/0x3d
Code: 85 28 09 00 00 48 89 05 ea 6e 06 01 48 8b 45 c0 48 8b 04 c5 40 51 cf 81 49 c7 c7 00 3f 1d 00 49 01 c7 f0 41 80 a7 88 00 00 00 fd <48> 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 89 7d b8 e8 23



BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child2:26807]
Modules linked in: bridge 8021q garp stp snd_seq_dummy tun fuse hidp rfcomm ipt_ULOG can_raw nfnetlink scsi_transport_iscsi bnep can_bcm bluetooth ipx p8023 x25 p8022 irda netrom rose llc2 pppoe ax25 pppox ppp_generic appletalk slhc psnap llc can rds af_rxrpc caif_socket caif phonet crc_ccitt af_key atm nfc rfkill af_802154 snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer microcode pcspkr ptp snd pps_core soundcore
irq event stamp: 0
hardirqs last enabled at (0): [< (null)>] (null)
hardirqs last disabled at (0): [<ffffffff8104b009>] copy_process.part.28+0x619/0x15a0
softirqs last enabled at (0): [<ffffffff8104b009>] copy_process.part.28+0x619/0x15a0
softirqs last disabled at (0): [< (null)>] (null)
CPU: 0 PID: 26807 Comm: trinity-child2 Not tainted 3.10.0+ #18
task: ffff8801cb21a520 ti: ffff8801e06da000 task.ti: ffff8801e06da000
RIP: 0010:[<ffffffff810570c1>] [<ffffffff810570c1>] __do_softirq+0xb1/0x440
RSP: 0018:ffff880244803f10 EFLAGS: 00000246
RAX: ffff8801e06dbfd8 RBX: ffffffff816f56e0 RCX: 0000000000000001
RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffffff810570b9
RBP: ffff880244803f78 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880244803e88
R13: ffffffff816fe46f R14: ffff880244803f78 R15: ffff8801cb21a520
FS: 00007f3ac43bd740(0000) GS:ffff880244800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001ebb648 CR3: 000000022f70e000 CR4: 00000000001407f0
DR0: 0000000000b76000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
0000000a00406444 0000000100153dba ffff8801e06dbfd8 ffff8801e06dbfd8
ffff8801e06dbfd8 ffff8801cb21a918 ffff8801e06dbfd8 ffffffff00000000
ffff8801cb21a520 ffff880229fd8ba1 ffffea00074a15c0 ffff8801e06dbc38
Call Trace:
<IRQ>
[<ffffffff8105760d>] irq_exit+0xcd/0xe0
[<ffffffff816ff7e5>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff816fe46f>] apic_timer_interrupt+0x6f/0x80
<EOI>
[<ffffffff816f56e0>] ? retint_restore_args+0xe/0xe
[<ffffffff81180aa8>] ? page_remove_rmap+0x18/0xa0
[<ffffffff81180ae5>] ? page_remove_rmap+0x55/0xa0
[<ffffffff81173e8d>] unmap_single_vma+0x3dd/0x850
[<ffffffff811752a9>] unmap_vmas+0x49/0x90
[<ffffffff8117db41>] exit_mmap+0xc1/0x170
[<ffffffff8104a1af>] mmput+0x6f/0x100
[<ffffffff810521dc>] do_exit+0x27c/0xcd0
[<ffffffff81066371>] ? get_signal_to_deliver+0xb1/0x940
[<ffffffff8105401c>] do_group_exit+0x4c/0xc0
[<ffffffff810665a1>] get_signal_to_deliver+0x2e1/0x940
[<ffffffff810ba51d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff810024b8>] do_signal+0x48/0x5a0
[<ffffffff8131198e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff816f57f6>] ? retint_kernel+0x26/0x30
[<ffffffff816f32f3>] ? preempt_schedule_irq+0x63/0x90
[<ffffffff816f56e0>] ? retint_restore_args+0xe/0xe
[<ffffffff810954f9>] ? vtime_account_user+0x69/0x70
[<ffffffff8114a5f5>] ? user_exit+0x45/0x90
[<ffffffff8114a5f7>] ? user_exit+0x47/0x90
[<ffffffff81002a88>] do_notify_resume+0x78/0x90
[<ffffffff816fdb0a>] int_signal+0x12/0x17
Code: 48 89 45 b8 48 89 45 b0 48 89 45 a8 66 0f 1f 44 00 00 65 c7 04 25 80 0f 1d 00 00 00 00 00 e8 57 34 06 00 fb 49 c7 c6 00 41 c0 81 <eb> 0e 0f 1f 44 00 00 49 83 c6 08 41 d1 ef 74 6c 41 f6 c7 01 74


BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565]
Modules linked in: dlci snd_seq_dummy hidp tun fuse bnep rfcomm ipt_ULOG nfnetlink scsi_transport_iscsi af_802154 can_bcm af_rxrpc bluetooth irda af_key rose atm x25 netrom can_raw phonet caif_socket can caif crc_ccitt appletalk llc2 ipx nfc p8023 rds rfkill psnap pppoe p8022 llc ax25 pppox ppp_generic slhc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_intel snd_hda_codec pcspkr snd_hwdep snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore
irq event stamp: 0
hardirqs last enabled at (0): [< (null)>] (null)
hardirqs last disabled at (0): [<ffffffff8104aa61>] copy_process.part.27+0x611/0x15b0
softirqs last enabled at (0): [<ffffffff8104aa61>] copy_process.part.27+0x611/0x15b0
softirqs last disabled at (0): [< (null)>] (null)
CPU: 3 PID: 14565 Comm: trinity-child1 Not tainted 3.10.0+ #20
task: ffff88022caa0000 ti: ffff880212676000 task.ti: ffff880212676000
RIP: 0010:[<ffffffff81056ba1>] [<ffffffff81056ba1>] __do_softirq+0xb1/0x440
RSP: 0000:ffff880244e03f10 EFLAGS: 00000246
RAX: ffff880212677fd8 RBX: ffffffff816fc660 RCX: 0000000010624dd3
RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffffff81056b99
RBP: ffff880244e03f78 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880244e03e88
R13: ffffffff817053ef R14: ffff880244e03f78 R15: ffff88022caa0000
FS: 00007f959367e740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8c6657e968 CR3: 00000001cb73e000 CR4: 00000000001407e0
DR0: 0000000000a51000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
0000000a00406040 000000010003d260 ffff880212677fd8 ffff880212677fd8
ffff880212677fd8 ffff88022caa03f8 ffff880212677fd8 ffffffff00000003
ffff88022caa0000 ffff880212677fd8 0000000002461be8 ffff880212677f58
Call Trace:
<IRQ>
[<ffffffff810570ed>] irq_exit+0xcd/0xe0
[<ffffffff81706765>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff817053ef>] apic_timer_interrupt+0x6f/0x80
<EOI>
[<ffffffff816fc660>] ? retint_restore_args+0xe/0xe
[<ffffffff816fc776>] ? retint_kernel+0x26/0x30
[<ffffffff816fa24d>] ? preempt_schedule_irq+0x4d/0x90
[<ffffffff816f9230>] ? wait_for_completion_interruptible+0x180/0x180
[<ffffffff816fa253>] ? preempt_schedule_irq+0x53/0x90
[<ffffffff816fc776>] retint_kernel+0x26/0x30
[<ffffffff8117da38>] ? find_vma+0x58/0x60
[<ffffffff816ffa79>] __do_page_fault+0x109/0x5d0
[<ffffffff810b7a6d>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff811106ef>] ? __acct_update_integrals+0x7f/0xf0
[<ffffffff8108fc5d>] ? get_parent_ip+0xd/0x50
[<ffffffff817004a1>] ? sub_preempt_count+0x71/0x100
[<ffffffff816fb931>] ? _raw_spin_unlock+0x31/0x60
[<ffffffff81095ab9>] ? vtime_account_user+0x69/0x70
[<ffffffff816fff6c>] do_page_fault+0x2c/0x50
[<ffffffff816fc8a2>] page_fault+0x22/0x30
Code: 48 89 45 b8 48 89 45 b0 48 89 45 a8 66 0f 1f 44 00 00 65 c7 04 25 80 0d 1d 00 00 00 00 00 e8 17 47 06 00 fb 49 c7 c6 00 41 c0 81 <eb> 0e 0f 1f 44 00 00 49 83 c6 08 41 d1 ef 74 6c 41 f6 c7 01 74
Bridge firewalling registered
perf samples too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
INFO: NMI handler (perf_event_nmi_handler) took too long to run: 238147.002 msecs

2013-07-05 15:15:11

by Thomas Gleixner

Subject: Re: Yet more softlockups.

On Fri, 5 Jul 2013, Dave Jones wrote:

> BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565]
> perf samples too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> INFO: NMI handler (perf_event_nmi_handler) took too long to run: 238147.002 msecs

So we see a softlockup of 23 seconds, and the perf_event_nmi_handler
claims it ran for roughly 238 seconds.

Are there more instances of NMI handler messages ?

Thanks,

tglx

2013-07-05 16:01:30

by Dave Jones

Subject: Re: Yet more softlockups.

On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote:
> On Fri, 5 Jul 2013, Dave Jones wrote:
>
> > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565]
> > perf samples too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> > INFO: NMI handler (perf_event_nmi_handler) took too long to run: 238147.002 msecs
>
> So we see a softlockup of 23 seconds and the perf_event_nmi_handler
> claims it did run 23.8 seconds.
>
> Are there more instances of NMI handler messages ?

[ 2552.006181] perf samples too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 2552.008680] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 500392.002 msecs

Dave

2013-07-05 16:02:45

by Thomas Gleixner

Subject: Re: Yet more softlockups.

On Fri, 5 Jul 2013, Dave Jones wrote:
> On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote:
> > On Fri, 5 Jul 2013, Dave Jones wrote:
> >
> > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565]
> > > perf samples too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> > > INFO: NMI handler (perf_event_nmi_handler) took too long to run: 238147.002 msecs
> >
> > So we see a softlockup of 23 seconds and the perf_event_nmi_handler
> > claims it did run 23.8 seconds.
> >
> > Are there more instances of NMI handler messages ?
>
> [ 2552.006181] perf samples too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> [ 2552.008680] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 500392.002 msecs

Yuck. Spending 500 seconds in NMI context surely explains a softlockup :)

2013-07-05 16:42:11

by H. Peter Anvin

Subject: Re: Yet more softlockups.

On 07/05/2013 09:02 AM, Thomas Gleixner wrote:
> On Fri, 5 Jul 2013, Dave Jones wrote:
>> On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote:
>> > On Fri, 5 Jul 2013, Dave Jones wrote:
>> >
>> > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565]
>> > > perf samples too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
>> > > INFO: NMI handler (perf_event_nmi_handler) took too long to run: 238147.002 msecs
>> >
>> > So we see a softlockup of 23 seconds and the perf_event_nmi_handler
>> > claims it did run 23.8 seconds.
>> >
>> > Are there more instances of NMI handler messages ?
>>
>> [ 2552.006181] perf samples too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
>> [ 2552.008680] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 500392.002 msecs
>
> Yuck. Spending 50 seconds in NMI context surely explains a softlockup :)
>

Hmmm... this makes me wonder if the interrupt tracepoint stuff is at
fault here, as it changes the IDT handling for NMI context.

-hpa

2013-07-05 18:21:32

by Seiji Aguchi

Subject: RE: Yet more softlockups.



> -----Original Message-----
> From: H. Peter Anvin [mailto:[email protected]]
> Sent: Friday, July 05, 2013 12:41 PM
> To: Thomas Gleixner
> Cc: Dave Jones; Linus Torvalds; Linux Kernel; Ingo Molnar; Peter Zijlstra; Seiji Aguchi
> Subject: Re: Yet more softlockups.
>
> On 07/05/2013 09:02 AM, Thomas Gleixner wrote:
> > On Fri, 5 Jul 2013, Dave Jones wrote:
> >> On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote:
> >> > On Fri, 5 Jul 2013, Dave Jones wrote:
> >> >
> >> > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565]
> >> > > perf samples too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> >> > > INFO: NMI handler (perf_event_nmi_handler) took too long to run: 238147.002 msecs
> >> >
> >> > So we see a softlockup of 23 seconds and the perf_event_nmi_handler
> >> > claims it did run 23.8 seconds.
> >> >
> >> > Are there more instances of NMI handler messages ?
> >>
> >> [ 2552.006181] perf samples too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> >> [ 2552.008680] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 500392.002 msecs
> >
> > Yuck. Spending 50 seconds in NMI context surely explains a softlockup :)
> >
>
> Hmmm... this makes me wonder if the interrupt tracepoint stuff is at
> fault here, as it changes the IDT handling for NMI context.

This softlockup happened while the interrupt tracepoints were disabled,
because if they were enabled, "smp_trace_apic_timer_interrupt" would be shown
instead of "smp_apic_timer_interrupt" in the call trace below.

But I can't say how this issue is related to the tracepoint stuff;
I need to reproduce it on my machine first.

Call Trace:
<IRQ>
[<ffffffff8105424f>] __do_softirq+0xff/0x440
[<ffffffff8105474d>] irq_exit+0xcd/0xe0
[<ffffffff816f5fcb>] smp_apic_timer_interrupt+0x6b/0x9b
[<ffffffff816f512f>] apic_timer_interrupt+0x6f/0x80

Seiji

2013-07-05 19:27:53

by Linus Torvalds

Subject: Re: scheduling while atomic & hang.

On Thu, Jul 4, 2013 at 11:51 PM, Ingo Molnar <[email protected]> wrote:
>
> It would be nice to have a full-kernel-stack backtrace printout as an
> option, which prints entries 'above' the current RSP.

One problem with that is that it is likely to be mostly overwritten by
the debug code that is about to print this all out - and any non-debug
code may well be really old, just with a sufficiently deep stack trace
to show up. Not to mention missing any irq-stack changes etc anyway.

So it can easily be *very* misleading - I'm not sure how useful it
would be as an idea.

Linus

2013-07-05 22:21:59

by Thomas Gleixner

Subject: RE: Yet more softlockups.

On Fri, 5 Jul 2013, Seiji Aguchi wrote:
> > -----Original Message-----
> > Hmmm... this makes me wonder if the interrupt tracepoint stuff is at
> > fault here, as it changes the IDT handling for NMI context.
>
> This softlockup happens while disabling the interrupt tracepoints,
> Because if it is enabled, "smp_trace_apic_timer_interrupt" is displayed
> instead of "smp_apic_timer_interrupt" in the call trace below.
>
> But I can't say anything how this issue is related to the tracepoint stuff,

I doubt it is related. I rather suspect that trinity is able to fuzz
perf into a DoS facility.

Dave, can you trace the perf sys calls and dump that data over serial
when the softlockup hits?

Thanks,

tglx

2013-07-06 07:24:14

by Ingo Molnar

Subject: Re: Yet more softlockups.


* Dave Jones <[email protected]> wrote:

> On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote:
> > On Fri, 5 Jul 2013, Dave Jones wrote:
> >
> > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565]
> > > perf samples too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> > > INFO: NMI handler (perf_event_nmi_handler) took too long to run: 238147.002 msecs
> >
> > So we see a softlockup of 23 seconds and the perf_event_nmi_handler
> > claims it did run 23.8 seconds.
> >
> > Are there more instances of NMI handler messages ?
>
> [ 2552.006181] perf samples too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> [ 2552.008680] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 500392.002 msecs

Dave, could you pull in the latest perf fixes at:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/urgent

In particular this:

e5302920da9e perf: Fix interrupt handler timing harness

could make a difference - if your tests somehow end up activating perf.

Thanks,

Ingo

2013-07-06 08:02:38

by Frederic Weisbecker

Subject: Re: scheduling while atomic & hang.

On Fri, Jul 05, 2013 at 12:27:51PM -0700, Linus Torvalds wrote:
> On Thu, Jul 4, 2013 at 11:51 PM, Ingo Molnar <[email protected]> wrote:
> >
> > It would be nice to have a full-kernel-stack backtrace printout as an
> > option, which prints entries 'above' the current RSP.
>
> One problem with that is that it is likely to be mostly overwritten by
> the debug code that is about to print this all out - and any non-debug
> code may well be really old, just with a sufficiently deep stack trace
> to show up. Not to mention missing any irq-stack changes etc anyway.
>
> So it can easily be *very* misleading - I'm not sure how useful it
> would be as an idea.
>
> Linus

So if the unwinder doesn't run on the same stack as the target,
it might work. This can work with BUG for example, but I believe that WARN
doesn't do a trap on x86.

2013-07-06 08:07:51

by Ingo Molnar

Subject: Re: scheduling while atomic & hang.


* Frederic Weisbecker <[email protected]> wrote:

> On Fri, Jul 05, 2013 at 12:27:51PM -0700, Linus Torvalds wrote:
> > On Thu, Jul 4, 2013 at 11:51 PM, Ingo Molnar <[email protected]> wrote:
> > >
> > > It would be nice to have a full-kernel-stack backtrace printout as an
> > > option, which prints entries 'above' the current RSP.
> >
> > One problem with that is that it is likely to be mostly overwritten by
> > the debug code that is about to print this all out - and any non-debug
> > code may well be really old, just with a sufficiently deep stack trace
> > to show up. Not to mention missing any irq-stack changes etc anyway.
> >
> > So it can easily be *very* misleading - I'm not sure how useful it
> > would be as an idea.
> >
> > Linus
>
> So if the unwinder doesn't run on the same stack as the target, it
> might work. This can work with BUG for example, but I believe that WARN
> doesn't do a trap on x86.

Instead of switching stacks it might be better to create an early snapshot
of the stack into a temporary area before calling anything, and to print
that. This is debug code, so overhead is not a big issue.
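
A rough sketch of that variant (purely illustrative, with the same x86-64
assumptions as the earlier sketch; a real version would want a per-CPU buffer
and would have to cope with the irq/IST stacks):

#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/string.h>

/*
 * Hypothetical sketch: copy the not-yet-clobbered region above RSP into
 * a scratch buffer before any debug printing runs, then decode the copy
 * at leisure with kernel_text_address()/print_ip_sym().
 */
static unsigned long stale_copy[THREAD_SIZE / sizeof(unsigned long)];
static unsigned long stale_words;

static void snapshot_stack_above_rsp(void)
{
	unsigned long sp, top;

	asm volatile("mov %%rsp, %0" : "=r" (sp));
	top = (unsigned long)task_stack_page(current) + THREAD_SIZE;

	stale_words = (top - sp) / sizeof(unsigned long);
	memcpy(stale_copy, (void *)sp, top - sp);
}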

But yeah, Linus is right, it gets hairy, can be misleading and has limited
utility :-/

Thanks,

Ingo

2013-07-07 00:25:36

by Dave Jones

Subject: Re: Yet more softlockups.

On Sat, Jul 06, 2013 at 09:24:08AM +0200, Ingo Molnar wrote:
>
> * Dave Jones <[email protected]> wrote:
>
> > On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote:
> > > On Fri, 5 Jul 2013, Dave Jones wrote:
> > >
> > > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565]
> > > > perf samples too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> > > > INFO: NMI handler (perf_event_nmi_handler) took too long to run: 238147.002 msecs
> > >
> > > So we see a softlockup of 23 seconds and the perf_event_nmi_handler
> > > claims it did run 23.8 seconds.
> > >
> > > Are there more instances of NMI handler messages ?
> >
> > [ 2552.006181] perf samples too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> > [ 2552.008680] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 500392.002 msecs
>
> Dave, could you pull in the latest perf fixes at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/urgent
>
> In particular this:
>
> e5302920da9e perf: Fix interrupt handler timing harness
>
> could make a difference - if your tests somehow end up activating perf.

Not seeing the NMI printk (yet?), but saw this..

[ 2322.758913] perf samples too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

Dave

2013-07-10 15:14:12

by Dave Jones

Subject: Re: Yet more softlockups.

On Sat, Jul 06, 2013 at 09:24:08AM +0200, Ingo Molnar wrote:
>
> * Dave Jones <[email protected]> wrote:
>
> > On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote:
> > > On Fri, 5 Jul 2013, Dave Jones wrote:
> > >
> > > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565]
> > > > perf samples too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> > > > INFO: NMI handler (perf_event_nmi_handler) took too long to run: 238147.002 msecs
> > >
> > > So we see a softlockup of 23 seconds and the perf_event_nmi_handler
> > > claims it did run 23.8 seconds.
> > >
> > > Are there more instances of NMI handler messages ?
> >
> > [ 2552.006181] perf samples too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> > [ 2552.008680] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 500392.002 msecs
>
> Dave, could you pull in the latest perf fixes at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/urgent
>
> In particular this:
>
> e5302920da9e perf: Fix interrupt handler timing harness
>
> could make a difference - if your tests somehow end up activating perf.

Something is really fucked up in the kernel side of perf.
I get this right after booting..

[ 114.516619] perf samples too long (4262 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

That's before I even get a chance to log in, so I'm pretty sure perf isn't even being run.

And this is just booting up my desktop, not fuzzing.

Dave

2013-07-10 15:20:19

by Markus Trippelsdorf

Subject: Re: Yet more softlockups.

On 2013.07.10 at 11:13 -0400, Dave Jones wrote:
> On Sat, Jul 06, 2013 at 09:24:08AM +0200, Ingo Molnar wrote:
> >
> > * Dave Jones <[email protected]> wrote:
> >
> > > On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote:
> > > > On Fri, 5 Jul 2013, Dave Jones wrote:
> > > >
> > > > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565]
> > > > > perf samples too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> > > > > INFO: NMI handler (perf_event_nmi_handler) took too long to run: 238147.002 msecs
> > > >
> > > > So we see a softlockup of 23 seconds and the perf_event_nmi_handler
> > > > claims it did run 23.8 seconds.
> > > >
> > > > Are there more instances of NMI handler messages ?
> > >
> > > [ 2552.006181] perf samples too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> > > [ 2552.008680] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 500392.002 msecs
> >
> > Dave, could you pull in the latest perf fixes at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/urgent
> >
> > In particular this:
> >
> > e5302920da9e perf: Fix interrupt handler timing harness
> >
> > could make a difference - if your tests somehow end up activating perf.
>
> Something is really fucked up in the kernel side of perf.
> I get this right after booting..
>
> [ 114.516619] perf samples too long (4262 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

You can disable this warning by:

echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent

--
Markus

2013-07-10 15:39:06

by Vince Weaver

Subject: Re: Yet more softlockups.

On Wed, 10 Jul 2013, Dave Jones wrote:

> Something is really fucked up in the kernel side of perf.
> I get this right after booting..
>
> [ 114.516619] perf samples too long (4262 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
>
> That's before I even get a chance to log in, so I'm pretty sure perf isn't even being run.
>
> And this is just booting up my desktop, not fuzzing.

This is probably related to the problem described in the thread
perf: fix interrupt handler timing harness
http://comments.gmane.org/gmane.linux.kernel/1520775
which in theory has a fix. Not sure why it'd trigger during boot.
NMI watchdog?

Vince

2013-07-10 15:41:13

by Dave Jones

Subject: Re: Yet more softlockups.

On Wed, Jul 10, 2013 at 05:20:15PM +0200, Markus Trippelsdorf wrote:
> On 2013.07.10 at 11:13 -0400, Dave Jones wrote:
> > I get this right after booting..
> >
> > [ 114.516619] perf samples too long (4262 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
>
> You can disable this warning by:
>
> echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent

Yes, but why is this even being run when I'm not running perf ?

The only NMI source running should be the watchdog.

Dave

2013-07-10 15:46:19

by Dave Jones

Subject: Re: Yet more softlockups.

On Wed, Jul 10, 2013 at 11:39:50AM -0400, Vince Weaver wrote:
> On Wed, 10 Jul 2013, Dave Jones wrote:
>
> > Something is really fucked up in the kernel side of perf.
> > I get this right after booting..
> >
> > [ 114.516619] perf samples too long (4262 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> >
> > That's before I even get a chance to log in, so I'm pretty sure perf isn't even being run.
> >
> > And this is just booting up my desktop, not fuzzing.
>
> This is probably related to the problem described in the thread
> perf: fix interrupt handler timing harness
> http://comments.gmane.org/gmane.linux.kernel/1520775
> which in theory has a fix. Not sure why it'd trigger during boot.

Ah, missed that thread. Thanks for the pointer. That looks important,
and might actually explain some of the other oddness I've been seeing.

> NMI watchdog?

likely.

Dave

2013-07-10 20:54:40

by J. Bruce Fields

Subject: Re: scheduling while atomic & hang.

On Thu, Jul 04, 2013 at 03:49:02AM -0400, Dave Jones wrote:
> I don't use the auto config, because I end up filling up /boot
> unless I go through and clean them out by hand every time I install
> a new one (which I do probably a dozen or so times a day).
> Is there some easy way to prune old builds I'm missing ?

I run this by hand every now and then. I'm probably doing it all wrong.

--b.

#!/bin/bash

# because I use CONFIG_LOCALVERSION_AUTO, not the same version again and
# again, /boot and /lib/modules/ eventually fill up.
# Dumb script to purge that stuff:

ssh "root@$1" '

for f in $(ls /lib/modules); do
    if rpm -qf "/lib/modules/$f" >/dev/null; then
        echo "keeping $f (installed from rpm)"
    elif [ $(uname -r) = "$f" ]; then
        echo "keeping $f (running kernel) "
    else
        echo "removing $f"
        rm -f "/boot/initramfs-$f.img" "/boot/System.map-$f"
        rm -f "/boot/vmlinuz-$f" "/boot/config-$f"
        rm -rf "/lib/modules/$f"
        new-kernel-pkg --remove $f
    fi
done
'

2013-07-10 21:54:56

by Dave Jones

Subject: Re: Yet more softlockups.

On Wed, Jul 10, 2013 at 11:39:50AM -0400, Vince Weaver wrote:
> On Wed, 10 Jul 2013, Dave Jones wrote:
>
> > Something is really fucked up in the kernel side of perf.
> > I get this right after booting..
> >
> > [ 114.516619] perf samples too long (4262 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> >
> > That's before I even get a chance to log in, so I'm pretty sure perf isn't even being run.
> >
> > And this is just booting up my desktop, not fuzzing.
>
> This is probably related to the problem described in the thread
> perf: fix interrupt handler timing harness
> http://comments.gmane.org/gmane.linux.kernel/1520775
> which in theory has a fix. Not sure why it'd trigger during boot.

Bah. That's not it either.

[ 1903.293233] perf samples too long (2514 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 1903.295248] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 19498.002 msecs

Dave

2013-07-12 10:31:23

by Ingo Molnar

Subject: Re: Yet more softlockups.


* Dave Jones <[email protected]> wrote:

> On Wed, Jul 10, 2013 at 05:20:15PM +0200, Markus Trippelsdorf wrote:
> > On 2013.07.10 at 11:13 -0400, Dave Jones wrote:
> > > I get this right after booting..
> > >
> > > [ 114.516619] perf samples too long (4262 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> >
> > You can disable this warning by:
> >
> > echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
>
> Yes, but why is this even being run when I'm not running perf ?
>
> The only NMI source running should be the watchdog.

The NMI watchdog is a perf event.
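
(For reference, this is roughly how kernel/watchdog.c sets it up - paraphrased
from memory rather than quoted, so details may differ: a pinned, per-CPU
hardware cycles event whose overflow handler runs in NMI context.)

#include <linux/perf_event.h>

/* Sketch only, not the actual watchdog code. */
static void wd_overflow(struct perf_event *event,
			struct perf_sample_data *data, struct pt_regs *regs)
{
	/* the real callback checks hrtimer progress and warns if a CPU is stuck */
}

static struct perf_event *create_watchdog_event(int cpu, u64 period)
{
	static struct perf_event_attr attr = {
		.type     = PERF_TYPE_HARDWARE,
		.config   = PERF_COUNT_HW_CPU_CYCLES,
		.size     = sizeof(struct perf_event_attr),
		.pinned   = 1,
		.disabled = 1,
	};

	attr.sample_period = period;

	/* the caller would then perf_event_enable() the returned event */
	return perf_event_create_kernel_counter(&attr, cpu, NULL,
						wd_overflow, NULL);
}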

I've Cc:-ed Dave Hansen, the author of those changes - is this a false
positive or some real problem?

Thanks,

Ingo

2013-07-12 15:39:15

by Dave Hansen

Subject: Re: Yet more softlockups.

On 07/12/2013 03:31 AM, Ingo Molnar wrote:
> * Dave Jones <[email protected]> wrote:
>> On Wed, Jul 10, 2013 at 05:20:15PM +0200, Markus Trippelsdorf wrote:
>> > On 2013.07.10 at 11:13 -0400, Dave Jones wrote:
>> > > I get this right after booting..
>> > >
>> > > [ 114.516619] perf samples too long (4262 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
>> >
>> > You can disable this warning by:
>> >
>> > echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
>>
>> Yes, but why is this even being run when I'm not running perf ?
>>
>> The only NMI source running should be the watchdog.
>
> The NMI watchdog is a perf event.
>
> I've Cc:-ed Dave Hansen, the author of those changes - is this a false
> positive or some real problem?

The warning comes from calling perf_sample_event_took(), which is only
called from one place: perf_event_nmi_handler().

So we can be pretty sure that the perf NMI is firing, or at least that
this handler code is running.

nmi_handle() says:
/*
 * NMIs are edge-triggered, which means if you have enough
 * of them concurrently, you can lose some because only one
 * can be latched at any given time. Walk the whole list
 * to handle those situations.
 */

perf_event_nmi_handler() probably gets _called_ when the watchdog NMI
goes off. But, it should hit this check:

if (!atomic_read(&active_events))
	return NMI_DONE;

and return quickly. This is before it has a chance to call
perf_sample_event_took().

Dave, for your case, my suspicion would be that it got turned on
inadvertently, or that we somehow have a bug which bumped up
perf_event.c's 'active_events' and we're running some perf code that we
don't have to.

But, I'm suspicious. I was having all kinds of issues with perf and
NMIs taking hundreds of milliseconds. I never isolated it to having a
real, single, cause. I attributed it to my large NUMA system just being
slow. Your description makes me wonder what I missed, though.

2013-07-12 15:46:09

by Dave Jones

Subject: Re: Yet more softlockups.

On Fri, Jul 12, 2013 at 08:38:52AM -0700, Dave Hansen wrote:

> The warning comes from calling perf_sample_event_took(), which is only
> called from one place: perf_event_nmi_handler().
>
> So we can be pretty sure that the perf NMI is firing, or at least that
> this handler code is running.
>
> nmi_handle() says:
> /*
> * NMIs are edge-triggered, which means if you have enough
> * of them concurrently, you can lose some because only one
> * can be latched at any given time. Walk the whole list
> * to handle those situations.
> */
>
> perf_event_nmi_handler() probably gets _called_ when the watchdog NMI
> goes off. But, it should hit this check:
>
> if (!atomic_read(&active_events))
> return NMI_DONE;
>
> and return quickly. This is before it has a chance to call
> perf_sample_event_took().
>
> Dave, for your case, my suspicion would be that it got turned on
> inadvertently, or that we somehow have a bug which bumped up
> perf_event.c's 'active_events' and we're running some perf code that we
> don't have to.

What do you mean by 'inadvertently'? I see this during bootup every time.
Unless systemd or something has started playing with perf (which AFAIK it isn't).

> But, I'm suspicious. I was having all kinds of issues with perf and
> NMIs taking hundreds of milliseconds. I never isolated it to having a
> real, single, cause. I attributed it to my large NUMA system just being
> slow. Your description makes me wonder what I missed, though.

Here's a fun trick:

trinity -c perf_event_open -C4 -q -l off

Within about a minute, that brings any of my boxes to its knees.
The softlockup detector starts going nuts, and then the box wedges solid.

(You may need to bump -C depending on your CPU count. I've never seen it happen
with a single process, but -C2 seems to be a minimum)

That *is* using perf though, so I kind of expect bad shit to happen when there are bugs.
The "during bootup" case is still a head-scratcher.

Dave

2013-07-12 15:55:40

by Dave Hansen

Subject: Re: Yet more softlockups.

On 07/12/2013 08:45 AM, Dave Jones wrote:
> On Fri, Jul 12, 2013 at 08:38:52AM -0700, Dave Hansen wrote:
> > Dave, for your case, my suspicion would be that it got turned on
> > inadvertently, or that we somehow have a bug which bumped up
> > perf_event.c's 'active_events' and we're running some perf code that we
> > don't have to.
>
> What do you mean by 'inadvertently'? I see this during bootup every time.
> Unless systemd or something has started playing with perf (which AFAIK it isn't).

I mean that somebody turned 'active_events' on without actually wanting
perf to be on. I'd be curious how it got set to something nonzero.
Could you stick a WARN_ONCE() or printk_ratelimit() on the three sites
that modify it?
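
Something along these lines at each site, for instance (illustrative only; the
real counter is the static atomic_t in arch/x86/kernel/cpu/perf_event.c, and
the exact call sites may differ):

#include <linux/atomic.h>
#include <linux/bug.h>
#include <linux/sched.h>

/* stand-in for the real counter in arch/x86/kernel/cpu/perf_event.c */
static atomic_t active_events = ATOMIC_INIT(0);

/*
 * Sketch: warn (with a backtrace) the first time the count goes
 * nonzero, so we can see who is enabling perf events at boot.
 */
static void bump_active_events(void)
{
	if (atomic_inc_return(&active_events) == 1)
		WARN_ONCE(1, "perf: active_events went nonzero, comm %s\n",
			  current->comm);
}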

> > But, I'm suspicious. I was having all kinds of issues with perf and
> > NMIs taking hundreds of milliseconds. I never isolated it to having a
> > real, single, cause. I attributed it to my large NUMA system just being
> > slow. Your description makes me wonder what I missed, though.
>
> Here's a fun trick:
>
> trinity -c perf_event_open -C4 -q -l off
>
> Within about a minute, that brings any of my boxes to its knees.
> The softlockup detector starts going nuts, and then the box wedges solid.

On my box, the same happens with 'perf top'. ;)

*But* dropping the perf sample rate has been really effective at keeping
me from hitting it, and I've had to use _lots_ of CPUs (60-160) doing
those NMIs at once to trigger the lockups.

Being able to trigger it with so few CPUs is interesting though. I'll
try on my hardware.

2013-07-12 17:03:50

by Dave Jones

Subject: Re: Yet more softlockups.

On Fri, Jul 12, 2013 at 08:55:31AM -0700, Dave Hansen wrote:

> I mean that somebody turned 'active_events' on without actually wanting
> perf to be on. I'd be curious how it got set to something nonzero.
> Could you stick a WARN_ONCE() or printk_ratelimit() on the three sites
> that modify it?

I'll add it to my list of things to get to, but it probably won't be until post-weekend.

> > Here's a fun trick:
> >
> > trinity -c perf_event_open -C4 -q -l off
> >
> > Within about a minute, that brings any of my boxes to its knees.
> > The softlockup detector starts going nuts, and then the box wedges solid.
>
> *But* dropping the perf sample rate has been really effective at keeping
> me from hitting it, and I've had to use _lots_ of CPUs (60-160) doing
> those NMIs at once to trigger the lockups.
>
> Being able to trigger it with so few CPUs is interesting though. I'll
> try on my hardware.

I hacked trinity to pause for 24s after each call, and changed the kernel
to taint on lockup (so that trinity would immediately stop).
My hope was to find the combination of perf_event_open calls that triggered this.
After 12 hours, it's still ticking along. It's now done about a thousand
more calls than is usually necessary to hit the bug. Which makes me wonder
if this is timing related somehow.

Perhaps also worth noting that the majority of the calls trinity makes will return -EINVAL.

Dave

2013-07-12 17:12:19

by David Ahern

Subject: Re: Yet more softlockups.

On 7/12/13 9:45 AM, Dave Jones wrote:
> Here's a fun trick:
>
> trinity -c perf_event_open -C4 -q -l off
>
> Within about a minute, that brings any of my boxes to its knees.
> The softlockup detector starts going nuts, and then the box wedges solid.

I tried that in a VM running latest Linus tree. I see trinity children
getting nuked regularly from oom.

I was dumping Vm elements using:

while [ 1 ]; do echo $(date) $(egrep Vm /proc/$pid/status); sleep 1; done

And right before the process is killed was the line:

Fri Jul 12 11:00:19 MDT 2013 VmPeak: 2867472 kB VmSize: 2867472 kB
VmLck: 0 kB VmPin: 0 kB VmHWM: 1493092 kB VmRSS: 1493092 kB VmData:
2857944 kB VmStk: 136 kB VmExe: 100 kB VmLib: 1844 kB VmPTE: 5628 kB
VmSwap: 0 kB

The VmData is growing fairly steadily and strace shows a lot of brk
calls. Is that normal for trinity - or this command line?

Looking at the perf_event_open calls I see a lot of E2BIG errors in
addition to EINVAL. e.g,

...
perf_event_open(0xba9000, 0, 0x4c, 0xcc, 0) = -1 EINVAL (Invalid argument)
alarm(0) = 1
getppid() = 9031
alarm(1) = 0
perf_event_open(0xba9000, 0x2a6e, 0xe, 0xfd, 0) = -1 E2BIG (Argument
list too long)
alarm(0) = 1
getppid() = 9031
alarm(1) = 0
...

David

2013-07-12 17:19:13

by Dave Jones

[permalink] [raw]
Subject: Re: Yet more softlockups.

On Fri, Jul 12, 2013 at 11:12:13AM -0600, David Ahern wrote:
> On 7/12/13 9:45 AM, Dave Jones wrote:
> > Here's a fun trick:
> >
> > trinity -c perf_event_open -C4 -q -l off
> >
> > Within about a minute, that brings any of my boxes to its knees.
> > The softlockup detector starts going nuts, and then the box wedges solid.
>
> I tried that in a VM running latest Linus tree. I see trinity children
> getting nuked regularly from oom.

Weird. I'm curious what the backtrace looks like in those cases.
Where is it trying to allocate memory?
(Though that usually isn't too helpful, but in the absence of anything else..)

> I was dumping Vm elements using:
>
> while [ 1 ]; do echo $(date) $(egrep Vm /proc/$pid/status); sleep 1; done
>
> And right before the process is killed was the line:
>
> Fri Jul 12 11:00:19 MDT 2013 VmPeak: 2867472 kB VmSize: 2867472 kB
> VmLck: 0 kB VmPin: 0 kB VmHWM: 1493092 kB VmRSS: 1493092 kB VmData:
> 2857944 kB VmStk: 136 kB VmExe: 100 kB VmLib: 1844 kB VmPTE: 5628 kB
> VmSwap: 0 kB
>
> The VmData is growing fairly steadily and strace shows a lot of brk
> calls. Is that normal for trinity - or this command line?

Hmm, there are a few known leaks in trinity that I still haven't got
around to fixing, but none that should be triggered by this.

I wonder if something about being virtualised is perturbing its behaviour
somehow. Though I can't think what would matter.

> Looking at the perf_event_open calls I see a lot of E2BIG errors in
> addition to EINVAL. e.g,
>
> ...
> perf_event_open(0xba9000, 0, 0x4c, 0xcc, 0) = -1 EINVAL (Invalid argument)
> alarm(0) = 1
> getppid() = 9031
> alarm(1) = 0
> perf_event_open(0xba9000, 0x2a6e, 0xe, 0xfd, 0) = -1 E2BIG (Argument
> list too long)
> alarm(0) = 1
> getppid() = 9031
> alarm(1) = 0

Yep, that'll happen.

Perhaps try CONFIG_DEBUG_KMEMLEAK?

Dave

2013-07-12 17:40:13

by David Ahern

[permalink] [raw]
Subject: Re: Yet more softlockups.

On 7/12/13 11:18 AM, Dave Jones wrote:
> On Fri, Jul 12, 2013 at 11:12:13AM -0600, David Ahern wrote:
> > On 7/12/13 9:45 AM, Dave Jones wrote:
> > > Here's a fun trick:
> > >
> > > trinity -c perf_event_open -C4 -q -l off
> > >
> > > Within about a minute, that brings any of my boxes to its knees.
> > > The softlockup detector starts going nuts, and then the box wedges solid.
> >
> > I tried that in a VM running latest Linus tree. I see trinity children
> > getting nuked regularly from oom.
>
> Weird. I'm curious what the backtrace looks like in those cases.
> Where is it trying to allocate memory?
> (Though that usually isn't too helpful, but in the absence of anything else..)

(gdb) bt
#0 0x000000313b27f3e0 in malloc () from /lib64/libc.so.6
#1 0x0000000000404405 in _get_address
(null_allowed=null_allowed@entry=1 '\001') at generic-sanitise.c:151
#2 0x00000000004044ca in get_address () at generic-sanitise.c:182
#3 0x00000000004052a0 in fill_arg (childno=<optimized out>,
call=call@entry=298, argnum=argnum@entry=1) at generic-sanitise.c:415
#4 0x000000000040548d in generic_sanitise (childno=childno@entry=0) at
generic-sanitise.c:615
#5 0x0000000000405620 in mkcall (childno=childno@entry=0) at syscall.c:131
#6 0x0000000000407d85 in child_process () at child.c:219
#7 0x00000000004073ad in fork_children () at main.c:103
#8 main_loop () at main.c:308
#9 do_main_loop () at main.c:342
#10 0x000000000040253a in main (argc=<optimized out>, argv=<optimized
out>) at trinity.c:180


In _get_address, case 8 must be happening a lot and I don't see a free
when that address comes from malloc. Perhaps all of the rand() calls are
breaking down in the VM. If I change that case from malloc to something
static - like page_rand - memory stays flat.
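
For illustration, here is a minimal userspace sketch of that pattern: an address generator whose malloc()-returning branch never gets a matching free(). The names and proportions are made up; this is not the actual trinity generic-sanitise.c code.

#include <stdlib.h>

static char page_rand[4096];            /* shared scratch page, never leaks */

static void *get_address(void)
{
        if (rand() % 8 == 0)            /* the occasionally-taken branch */
                return malloc(4096);    /* leaked: the caller never frees it */
        return page_rand;               /* reused static storage, no allocation */
}

int main(void)
{
        /* drive the generator the way a fuzzer main loop would */
        for (int i = 0; i < 1000000; i++)
                (void)get_address();    /* heap/VmData creeps up on every malloc() hit */
        return 0;
}

Switching the malloc() branch to hand back page_rand, as suggested above, keeps the footprint flat.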

David

2013-07-12 17:51:15

by Dave Jones

[permalink] [raw]
Subject: Re: Yet more softlockups.

On Fri, Jul 12, 2013 at 11:40:06AM -0600, David Ahern wrote:
> On 7/12/13 11:18 AM, Dave Jones wrote:
> > On Fri, Jul 12, 2013 at 11:12:13AM -0600, David Ahern wrote:
> > > On 7/12/13 9:45 AM, Dave Jones wrote:
> > > > Here's a fun trick:
> > > >
> > > > trinity -c perf_event_open -C4 -q -l off
> > > >
>
> In _get_address, case 8 must be happening a lot and I don't see a free
> when that address comes from malloc. Perhaps all of the rand() calls are
> breaking down in the VM. If I change that case from malloc to something
> static - like page_rand - memory stays flat.

Oh right, yeah, that's a known bug. I forgot about the ARG_ADDRESS case.
Another hacky workaround is to bound the trinity runs with -N1000000 or something
and run that in a shell loop, so the memory gets freed up on each exit.

I need to find the time to add proper callbacks to release allocated memory.

Given that you can run trinity long enough to hit this, though, it makes me
think you won't be able to trigger the bug I'm talking about.
Perhaps virtualised perf counters are somehow immune to this problem, because
on bare metal it literally takes seconds.

Dave

2013-07-12 18:07:34

by David Ahern

[permalink] [raw]
Subject: Re: Yet more softlockups.

On 7/12/13 11:50 AM, Dave Jones wrote:

> Given that you can run trinity long enough to hit this, though, it makes me
> think you won't be able to trigger the bug I'm talking about.
> Perhaps virtualised perf counters are somehow immune to this problem, because
> on bare metal it literally takes seconds.

I can generate the message; it just does not lock up:

...
[ 131.837129] perf samples too long (35849 > 35714), lowering kernel.perf_event_max_sample_rate to 4000
...
[ 237.271170] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 57900.001 msecs


And Dave Hansen: I think nmi.c has the same do_div problem as
kernel/events/core.c that Stephane fixed. Your patch has:

whole_msecs = do_div(delta, (1000 * 1000));
decimal_msecs = do_div(delta, 1000) % 1000;
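
For anyone following along, here is a small userspace illustration of why that pattern misbehaves. The kernel's do_div(n, base) divides the 64-bit n in place and returns the remainder, so assigning its return value stores the remainder rather than the quotient. The do_div() below is a stand-in macro with the same semantics (using a GCC statement expression), not the kernel one, and this is not the actual tip-tree fix.

#include <stdio.h>
#include <stdint.h>

/* crude stand-in with the same semantics as the kernel's do_div() */
#define do_div(n, base) ({                      \
        uint32_t __rem = (n) % (base);          \
        (n) /= (base);                          \
        __rem;                                  \
})

int main(void)
{
        uint64_t delta = 57900001ULL;           /* a duration in nanoseconds */

        /* buggy pattern: the return value is the remainder, not the quotient */
        uint64_t bad = delta;
        uint64_t whole_bad = do_div(bad, 1000 * 1000);

        /* intended pattern: the quotient lands in the variable itself */
        uint64_t good = delta;
        uint32_t remainder_ns = do_div(good, 1000 * 1000);
        uint32_t decimal_msecs = remainder_ns / 1000;

        printf("buggy:   %llu msecs\n", (unsigned long long)whole_bad);
        printf("correct: %llu.%03u msecs\n", (unsigned long long)good, decimal_msecs);
        return 0;
}

With the buggy assignment the example prints a nonsense 900001 msecs; the intended pattern prints 57.900 msecs, which is the sort of discrepancy showing up in the NMI duration messages above.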

David

2013-07-12 18:22:22

by Dave Hansen

[permalink] [raw]
Subject: Re: Yet more softlockups.

On 07/12/2013 11:07 AM, David Ahern wrote:
> And Dave Hansen: I think nmi.c has the same do_div problem as
> kernel/events/core.c that Stephane fixed. Your patch has:
>
> whole_msecs = do_div(delta, (1000 * 1000));
> decimal_msecs = do_div(delta, 1000) % 1000;

Yup. There should be fixes in the x86 tip tree for both of these now.

2013-07-12 20:13:22

by Dave Hansen

[permalink] [raw]
Subject: Re: Yet more softlockups.

I added the WARN_ONCE() at the point where we enable the first perf event.
It turns out the watchdog code uses perf these days:

> [ 1.003260] ------------[ cut here ]------------
> [ 1.007943] WARNING: at /home/davehans/linux.git/arch/x86/kernel/cpu/perf_event.c:471 x86_pmu_event_init+0x249/0x430()
> [ 1.018906] Modules linked in:
> [ 1.021999] CPU: 0 PID: 11 Comm: watchdog/0 Not tainted 3.10.0-05973-gd2b4a64-dirty #190
> [ 1.030313] Hardware name: FUJITSU-SV PRIMEQUEST 1800E2/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.24 09/14/2011
> [ 1.041093] ffffffff819dc0b8 ffff881fd33e3c98 ffffffff816270c5 0000000000001a9d
> [ 1.048601] 0000000000000009 ffff881fd33e3cd8 ffffffff81090f50 ffff881fd3248048
> [ 1.056112] ffff881fff26a800 ffff881fff26a800 0000000000000000 0000000000000000
> [ 1.063621] Call Trace:
> [ 1.066106] [<ffffffff816270c5>] dump_stack+0x55/0x76
> [ 1.071313] [<ffffffff81090f50>] warn_slowpath_common+0x70/0xa0
> [ 1.077410] [<ffffffff81090f9a>] warn_slowpath_null+0x1a/0x20
> [ 1.083330] [<ffffffff81060cb9>] x86_pmu_event_init+0x249/0x430
> [ 1.089427] [<ffffffff81144f0f>] perf_init_event+0xef/0x130
> [ 1.095170] [<ffffffff811452a8>] perf_event_alloc+0x358/0x4a0
> [ 1.101091] [<ffffffff810ca27d>] ? set_next_entity+0x3d/0x80
> [ 1.106924] [<ffffffff811081a0>] ? touch_nmi_watchdog+0x80/0x80
> [ 1.113018] [<ffffffff8114570e>] perf_event_create_kernel_counter+0x2e/0xe0
> [ 1.120176] [<ffffffff8110835d>] watchdog_enable+0xfd/0x1e0
> [ 1.125919] [<ffffffff810bf8e3>] smpboot_thread_fn+0xe3/0x1f0
> [ 1.131839] [<ffffffff810bf800>] ? smpboot_unregister_percpu_thread+0x50/0x50
> [ 1.139250] [<ffffffff810b7d10>] kthread+0xc0/0xd0
> [ 1.144200] [<ffffffff810b7c50>] ? flush_kthread_worker+0xb0/0xb0
> [ 1.150474] [<ffffffff8163735c>] ret_from_fork+0x7c/0xb0
> [ 1.155950] [<ffffffff810b7c50>] ? flush_kthread_worker+0xb0/0xb0
> [ 1.162227] ---[ end trace 32c04e859824e00d ]---

static int watchdog_nmi_enable(unsigned int cpu)
{
        ...
        /* Try to register using hardware perf events */
        event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, ...

That at least solves the boot-time mystery.
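
The debugging trick itself is easy to sketch in isolation: warn exactly once, on the first zero-to-nonzero transition of a usage counter, and report who did it. The snippet below is a hypothetical userspace illustration with made-up names, not the actual WARN_ONCE() hack in perf_event.c.

#include <stdio.h>
#include <stdatomic.h>

static atomic_int active_events;

/* warn exactly once, on the 0 -> 1 transition, naming the first user */
static void enable_event(const char *who)
{
        if (atomic_fetch_add(&active_events, 1) == 0)
                fprintf(stderr, "WARNING: first perf event enabled by %s\n", who);
}

int main(void)
{
        enable_event("watchdog/0");     /* fires the one-shot warning */
        enable_event("trinity-child0"); /* stays quiet */
        return 0;
}

In the kernel the analogous thing is a WARN_ONCE() next to the sites that bump active_events, which is what produced the trace above and fingered watchdog/0.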

2013-07-13 01:39:39

by Vince Weaver

[permalink] [raw]
Subject: Re: Yet more softlockups.

On Fri, 12 Jul 2013, Dave Jones wrote:

>
> Here's a fun trick:
>
> trinity -c perf_event_open -C4 -q -l off
>
> Within about a minute, that brings any of my boxes to its knees.
> The softlockup detector starts going nuts, and then the box wedges solid.

Are you running with this patch?

[PATCH 1/2] perf: Clone child context from parent context pmu

https://lkml.org/lkml/2013/7/9/310

It hasn't hit Linus' git tree yet.

It fixes a bug that my perf_fuzzer would hit within seconds but it took me
over a month of trace bisecting and kernel bisecting to isolate it.

The symptoms were stuck processes and NMI tracebacks leading to hard
locks.

With the patch applied, my perf_fuzzer (which uses the same perf_event_open
syscall generator as trinity) runs for hours without problems.

Vince