2014-01-23 01:08:15

by Sasha Levin

[permalink] [raw]
Subject: mm: BUG: Bad rss-counter state

Hi all,

While fuzzing with trinity running inside a KVM tools guest using latest -next kernel,
I've stumbled on a "mm: BUG: Bad rss-counter state" error which was pretty non-obvious
in the mix of the kernel spew (why?).

I've added a small BUG() after the printk() in check_mm(), and here's the full output:

[ 318.334905] BUG: Bad rss-counter state mm:ffff8801e6dec000 idx:0 val:1
[ 318.335955] ------------[ cut here ]------------
[ 318.336507] kernel BUG at kernel/fork.c:562!
[ 318.336930] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 318.337826] Dumping ftrace buffer:
[ 318.338431] (ftrace buffer empty)
[ 318.338951] Modules linked in:
[ 318.339287] CPU: 45 PID: 10022 Comm: trinity-c190 Tainted: G W 3.13.0-next
-20140122-sasha-00011-gcc8342a-dirty #4
[ 318.340120] task: ffff8801e6a9b000 ti: ffff8801e6aee000 task.ti: ffff8801e6aee000
[ 318.340120] RIP: 0010:[<ffffffff8113ca4a>] [<ffffffff8113ca4a>] __mmdrop+0x9a/0xc0
[ 318.340120] RSP: 0000:ffff8801e6aefe68 EFLAGS: 00010292
[ 318.340120] RAX: 000000000000003a RBX: ffff8801e6dec000 RCX: 0000000000000001
[ 318.340120] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000286
[ 318.340120] RBP: ffff8801e6aefe78 R08: 0000000000000001 R09: 0000000000000000
[ 318.340120] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8801e6dec138
[ 318.340120] R13: ffff8801e6dec000 R14: ffff8801e6dec0a8 R15: 00000000000000a3
[ 318.340120] FS: 00007f6bc5915700(0000) GS:ffff88007b400000(0000) knlGS:000000000000
0000
[ 318.340120] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 318.340120] CR2: 00007fffd3d62588 CR3: 0000000005e26000 CR4: 00000000000006e0
[ 318.340120] Stack:
[ 318.340120] ffff8801e6dec138 ffff8801e6dec000 ffff8801e6aefe98 ffffffff8113cb3b
[ 318.340120] ffff8801e6a9bbb0 ffff8801e6a9b000 ffff8801e6aefef8 ffffffff81140ced
[ 318.340120] ffff8801e6c4db00 ffff8801e6c4db00 ffff8801e6aefef8 ffffffff811f3ea5
[ 318.340120] Call Trace:
[ 318.340120] [<ffffffff8113cb3b>] mmput+0xcb/0xe0
[ 318.340120] [<ffffffff81140ced>] exit_mm+0x18d/0x1a0
[ 318.340120] [<ffffffff811f3ea5>] ? acct_collect+0x175/0x1b0
[ 318.340120] [<ffffffff8114315f>] do_exit+0x26f/0x520
[ 318.355754] [<ffffffff811434b9>] do_group_exit+0xa9/0xe0
[ 318.355754] [<ffffffff81143507>] SyS_exit_group+0x17/0x20
[ 318.355754] [<ffffffff8444b7d0>] tracesys+0xdd/0xe2
[ 318.355754] Code: 00 00 eb 16 0f 1f 44 00 00 48 8b 8b 68 03 00 00 48 85 c9 74 24 ba 02 00 00 00
48 89 de 48 c7 c7 10 16 68 85 31 c0 e8 a2 d2 2f 03 <0f> 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 48 89
de 48 8b 3d 1e
[ 318.355754] RIP [<ffffffff8113ca4a>] __mmdrop+0x9a/0xc0
[ 318.355754] RSP <ffff8801e6aefe68>
[ 318.363991] ---[ end trace 7d85aceb881be62b ]---


Thanks,
Sasha


2014-01-23 01:39:30

by David Rientjes

[permalink] [raw]
Subject: Re: mm: BUG: Bad rss-counter state

On Wed, 22 Jan 2014, Sasha Levin wrote:

> Hi all,
>
> While fuzzing with trinity running inside a KVM tools guest using latest -next
> kernel,
> I've stumbled on a "mm: BUG: Bad rss-counter state" error which was pretty
> non-obvious
> in the mix of the kernel spew (why?).
>

It's not a fatal condition and there's only a few possible stack traces
that could be emitted during the exit() path. I don't see how we could
make it more visible other than its log-level which is already KERN_ALERT.

> I've added a small BUG() after the printk() in check_mm(), and here's the full
> output:
>

Worst place to add it :) At line 562 of kernel/fork.c in linux-next
you're going to hit BUG() when there may be other counters that are also
bad and they don't get printed.

> [ 318.334905] BUG: Bad rss-counter state mm:ffff8801e6dec000 idx:0 val:1

So our mm has a non-zero MM_FILEPAGES count, but there's nothing that was
cited that would tell us what that is so there's not much to go on, unless
someone already recognizes this as another issue. Is this reproducible on
3.13 or only on linux-next?

2014-01-23 01:53:04

by Dave Jones

[permalink] [raw]
Subject: Re: mm: BUG: Bad rss-counter state

On Wed, Jan 22, 2014 at 05:39:25PM -0800, David Rientjes wrote:

> > While fuzzing with trinity running inside a KVM tools guest using latest -next
> > kernel,
> > I've stumbled on a "mm: BUG: Bad rss-counter state" error which was pretty
> > non-obvious
> > in the mix of the kernel spew (why?).
> >
>
> It's not a fatal condition and there's only a few possible stack traces
> that could be emitted during the exit() path. I don't see how we could
> make it more visible other than its log-level which is already KERN_ALERT.
>
> > I've added a small BUG() after the printk() in check_mm(), and here's the full
> > output:
> >
>
> Worst place to add it :) At line 562 of kernel/fork.c in linux-next
> you're going to hit BUG() when there may be other counters that are also
> bad and they don't get printed.
>
> > [ 318.334905] BUG: Bad rss-counter state mm:ffff8801e6dec000 idx:0 val:1
>
> So our mm has a non-zero MM_FILEPAGES count, but there's nothing that was
> cited that would tell us what that is so there's not much to go on, unless
> someone already recognizes this as another issue. Is this reproducible on
> 3.13 or only on linux-next?

Sasha, is this the current git tree version of Trinity ?
(I'm wondering if yesterdays munmap changes might be tickling this bug).

Dave

2014-01-23 02:15:36

by Sasha Levin

[permalink] [raw]
Subject: Re: mm: BUG: Bad rss-counter state

On 01/22/2014 08:39 PM, David Rientjes wrote:
> On Wed, 22 Jan 2014, Sasha Levin wrote:
>
>> Hi all,
>>
>> While fuzzing with trinity running inside a KVM tools guest using latest -next
>> kernel,
>> I've stumbled on a "mm: BUG: Bad rss-counter state" error which was pretty
>> non-obvious
>> in the mix of the kernel spew (why?).
>>
>
> It's not a fatal condition and there's only a few possible stack traces
> that could be emitted during the exit() path. I don't see how we could
> make it more visible other than its log-level which is already KERN_ALERT.

Would it make sense to add a VM_BUG_ON() to make it more obvious when we have
CONFIG_VM_DEBUG enabled? Many of the VM_BUG_ON test cases are non-fatal either,
and it would make it easier spotting this issue.

>> I've added a small BUG() after the printk() in check_mm(), and here's the full
>> output:
>>
>
> Worst place to add it :) At line 562 of kernel/fork.c in linux-next
> you're going to hit BUG() when there may be other counters that are also
> bad and they don't get printed.

I gave the condition before curly braces :)

if (unlikely(x)) {
printk(KERN_ALERT "BUG: Bad rss-counter state "
"mm:%p idx:%d val:%ld\n", mm, i, x);
BUG();
}

>> [ 318.334905] BUG: Bad rss-counter state mm:ffff8801e6dec000 idx:0 val:1
>
> So our mm has a non-zero MM_FILEPAGES count, but there's nothing that was
> cited that would tell us what that is so there's not much to go on, unless
> someone already recognizes this as another issue. Is this reproducible on
> 3.13 or only on linux-next?

Yup, I see it in v3.13 too, which is odd.


Thanks,
Sasha

2014-01-23 02:16:16

by Sasha Levin

[permalink] [raw]
Subject: Re: mm: BUG: Bad rss-counter state

On 01/22/2014 08:52 PM, Dave Jones wrote:
> Sasha, is this the current git tree version of Trinity ?
> (I'm wondering if yesterdays munmap changes might be tickling this bug).

Ah yes, my tree has the munmap patch from yesterday, which would explain why we
started seeing this issue just now.


Thanks,
Sasha

2014-01-23 02:22:06

by Dave Jones

[permalink] [raw]
Subject: Re: mm: BUG: Bad rss-counter state

On Wed, Jan 22, 2014 at 09:16:03PM -0500, Sasha Levin wrote:
> On 01/22/2014 08:52 PM, Dave Jones wrote:
> > Sasha, is this the current git tree version of Trinity ?
> > (I'm wondering if yesterdays munmap changes might be tickling this bug).
>
> Ah yes, my tree has the munmap patch from yesterday, which would explain why we
> started seeing this issue just now.

So that change is basically allowing trinity to munmap just part of a prior mmap.
So it may do things like..

mmap |--------------|

munmap |----XXX-------|

munmap |------XXX-----|

ie, it might try unmapping some pages more than once, and may even overlap prior munmaps.

until yesterdays change, it would only munmap the entire mmap.

There's no easy way to tell exactly what happened without a trinity log of course.

Dave

2014-01-23 03:14:22

by Sasha Levin

[permalink] [raw]
Subject: Re: mm: BUG: Bad rss-counter state

On 01/22/2014 09:21 PM, Dave Jones wrote:
> On Wed, Jan 22, 2014 at 09:16:03PM -0500, Sasha Levin wrote:
> > On 01/22/2014 08:52 PM, Dave Jones wrote:
> > > Sasha, is this the current git tree version of Trinity ?
> > > (I'm wondering if yesterdays munmap changes might be tickling this bug).
> >
> > Ah yes, my tree has the munmap patch from yesterday, which would explain why we
> > started seeing this issue just now.
>
> So that change is basically allowing trinity to munmap just part of a prior mmap.
> So it may do things like..
>
> mmap |--------------|
>
> munmap |----XXX-------|
>
> munmap |------XXX-----|
>
> ie, it might try unmapping some pages more than once, and may even overlap prior munmaps.
>
> until yesterdays change, it would only munmap the entire mmap.
>
> There's no easy way to tell exactly what happened without a trinity log of course.

I've attached the trinity log of the child that triggered the bug. Odd thing is that I
don't see any munmaps in it.


Thanks,
Sasha


Attachments:
trinity-child234.log (26.36 kB)