2014-11-07 04:27:45

by Sasha Levin

[permalink] [raw]
Subject: mm: shmem: freeing mlocked page

Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel, I've stumbled on the following spew:

[ 1441.564471] BUG: Bad page state in process trinity-c612 pfn:12593a
[ 1441.564476] page:ffffea0006e175c0 count:0 mapcount:0 mapping: (null) index:
0x49
[ 1441.564488] flags: 0xafffff8028000c(referenced|uptodate|swapbacked|mlocked)
[ 1441.564491] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 1441.564493] bad because of flags:
[ 1441.564498] flags: 0x200000(mlocked)
[ 1441.564503] Modules linked in:
[ 1441.564511] CPU: 2 PID: 11657 Comm: trinity-c612 Not tainted 3.18.0-rc3-next-20141106-sasha-00054-g09b7ccf-dirty #1447
[ 1441.564519] 0000000000000000 0000000000000000 1ffffffff3b44e48 ffff8805c969b868
[ 1441.564526] ffffffff9c085024 0000000000000000 ffffea0006e175c0 ffff8805c969b898
[ 1441.564532] ffffffff925fd0a1 ffffea0006e17628 dfffe90000000000 0000000000000000
[ 1441.564534] Call Trace:
[ 1441.568496] dump_stack (lib/dump_stack.c:52)
[ 1441.568516] bad_page (mm/page_alloc.c:338)
[ 1441.568523] free_pages_prepare (mm/page_alloc.c:649 mm/page_alloc.c:755)
[ 1441.568531] free_hot_cold_page (mm/page_alloc.c:1436)
[ 1441.568541] free_hot_cold_page_list (mm/page_alloc.c:1482 (discriminator 3))
[ 1441.568555] release_pages (mm/swap.c:961)
[ 1441.568566] __pagevec_release (include/linux/pagevec.h:44 mm/swap.c:978)
[ 1441.568579] shmem_undo_range (include/linux/pagevec.h:69 mm/shmem.c:451)
[ 1441.568591] shmem_truncate_range (mm/shmem.c:546)
[ 1441.568599] shmem_fallocate (include/linux/spinlock.h:309 mm/shmem.c:2092)
[ 1441.568612] ? __sb_start_write (fs/super.c:1208)
[ 1441.568622] ? __sb_start_write (fs/super.c:1208)
[ 1441.568633] do_fallocate (fs/open.c:297)
[ 1441.568648] SyS_madvise (mm/madvise.c:332 mm/madvise.c:381 mm/madvise.c:531 mm/madvise.c:462)
[ 1441.568660] ? syscall_trace_enter_phase1 (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1486)
[ 1441.568672] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)

I'm slightly confused here, because the page is mapcount==0, not LOCKED but still MLOCKED...

Thanks,
Sasha


2014-11-14 14:39:49

by Sasha Levin

[permalink] [raw]
Subject: Re: mm: shmem: freeing mlocked page

On 11/06/2014 11:27 PM, Sasha Levin wrote:
> Hi all,
>
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel, I've stumbled on the following spew:
>
> [ 1441.564471] BUG: Bad page state in process trinity-c612 pfn:12593a
> [ 1441.564476] page:ffffea0006e175c0 count:0 mapcount:0 mapping: (null) index:
> 0x49
> [ 1441.564488] flags: 0xafffff8028000c(referenced|uptodate|swapbacked|mlocked)
> [ 1441.564491] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [ 1441.564493] bad because of flags:
> [ 1441.564498] flags: 0x200000(mlocked)
> [ 1441.564503] Modules linked in:
> [ 1441.564511] CPU: 2 PID: 11657 Comm: trinity-c612 Not tainted 3.18.0-rc3-next-20141106-sasha-00054-g09b7ccf-dirty #1447
> [ 1441.564519] 0000000000000000 0000000000000000 1ffffffff3b44e48 ffff8805c969b868
> [ 1441.564526] ffffffff9c085024 0000000000000000 ffffea0006e175c0 ffff8805c969b898
> [ 1441.564532] ffffffff925fd0a1 ffffea0006e17628 dfffe90000000000 0000000000000000
> [ 1441.564534] Call Trace:
> [ 1441.568496] dump_stack (lib/dump_stack.c:52)
> [ 1441.568516] bad_page (mm/page_alloc.c:338)
> [ 1441.568523] free_pages_prepare (mm/page_alloc.c:649 mm/page_alloc.c:755)
> [ 1441.568531] free_hot_cold_page (mm/page_alloc.c:1436)
> [ 1441.568541] free_hot_cold_page_list (mm/page_alloc.c:1482 (discriminator 3))
> [ 1441.568555] release_pages (mm/swap.c:961)
> [ 1441.568566] __pagevec_release (include/linux/pagevec.h:44 mm/swap.c:978)
> [ 1441.568579] shmem_undo_range (include/linux/pagevec.h:69 mm/shmem.c:451)
> [ 1441.568591] shmem_truncate_range (mm/shmem.c:546)
> [ 1441.568599] shmem_fallocate (include/linux/spinlock.h:309 mm/shmem.c:2092)
> [ 1441.568612] ? __sb_start_write (fs/super.c:1208)
> [ 1441.568622] ? __sb_start_write (fs/super.c:1208)
> [ 1441.568633] do_fallocate (fs/open.c:297)
> [ 1441.568648] SyS_madvise (mm/madvise.c:332 mm/madvise.c:381 mm/madvise.c:531 mm/madvise.c:462)
> [ 1441.568660] ? syscall_trace_enter_phase1 (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1486)
> [ 1441.568672] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>
> I'm slightly confused here, because the page is mapcount==0, not LOCKED but still MLOCKED...

So I got this as well:

[ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70
[ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b
[ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked)
[ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 1026.993479] bad because of flags:
[ 1026.994125] flags: 0x200000(mlocked)
[ 1026.994816] Modules linked in:
[ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
[ 1026.996123] FAULT_INJECTION: forcing a failure.
[ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
[ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
[ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
[ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
[ 1027.003359] Call Trace:
[ 1027.003831] dump_stack (lib/dump_stack.c:52)
[ 1027.004725] bad_page (mm/page_alloc.c:338)
[ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
[ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
[ 1027.007772] ? __page_cache_release (mm/swap.c:66)
[ 1027.008815] put_page (mm/swap.c:270)
[ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
[ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
[ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
[ 1027.012856] ? pipe_lock (fs/pipe.c:69)
[ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
[ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
[ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
[ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
[ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
[ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)

Which makes me suspect I blamed shmem for nothing.


Thanks,
Sasha

2014-11-18 21:58:46

by Andrew Morton

[permalink] [raw]
Subject: Re: mm: shmem: freeing mlocked page

On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin <[email protected]> wrote:

>
> [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70
> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b
> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked)
> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [ 1026.993479] bad because of flags:
> [ 1026.994125] flags: 0x200000(mlocked)

Gee that new page dumping code is nice!

> [ 1026.994816] Modules linked in:
> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
> [ 1026.996123] FAULT_INJECTION: forcing a failure.
> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
> [ 1027.003359] Call Trace:
> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
> [ 1027.004725] bad_page (mm/page_alloc.c:338)
> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
> [ 1027.008815] put_page (mm/swap.c:270)
> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>

So what happened here? Userspace fed some mlocked memory into splice()
and then, while splice() was running, userspace dropped its reference
to the memory, leaving splice() with the last reference. Yet somehow,
that page was still marked as being mlocked. I wouldn't expect the
kernel to permit userspace to drop its reference to the memory without
first clearing the mlocked state.

Is it possible to work out from trinity sources what the exact sequence
was? Which syscalls are being used, for example?

2014-11-19 03:44:12

by Sasha Levin

[permalink] [raw]
Subject: Re: mm: shmem: freeing mlocked page

On 11/18/2014 04:58 PM, Andrew Morton wrote:
> On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin <[email protected]> wrote:
>
>>
>> [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70
>> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b
>> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked)
>> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> [ 1026.993479] bad because of flags:
>> [ 1026.994125] flags: 0x200000(mlocked)
>
> Gee that new page dumping code is nice!
>
>> [ 1026.994816] Modules linked in:
>> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
>> [ 1026.996123] FAULT_INJECTION: forcing a failure.
>> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
>> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
>> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
>> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
>> [ 1027.003359] Call Trace:
>> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
>> [ 1027.004725] bad_page (mm/page_alloc.c:338)
>> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
>> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
>> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
>> [ 1027.008815] put_page (mm/swap.c:270)
>> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
>> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
>> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
>> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
>> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
>> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
>> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
>> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
>> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
>> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>>
>
> So what happened here? Userspace fed some mlocked memory into splice()
> and then, while splice() was running, userspace dropped its reference
> to the memory, leaving splice() with the last reference. Yet somehow,
> that page was still marked as being mlocked. I wouldn't expect the
> kernel to permit userspace to drop its reference to the memory without
> first clearing the mlocked state.
>
> Is it possible to work out from trinity sources what the exact sequence
> was? Which syscalls are being used, for example?

Trinity can't really log anything because attempts to log syscalls slow everything
down to a crawl to the point nothing reproduces.

I've just looked at that trace above, and got a bit more confused. I didn't think
that you can mlock page cache. How would a user do that exactly?


Thanks,
Sasha

2014-11-19 03:56:27

by Dave Jones

[permalink] [raw]
Subject: Re: mm: shmem: freeing mlocked page

On Tue, Nov 18, 2014 at 10:44:02PM -0500, Sasha Levin wrote:
> On 11/18/2014 04:58 PM, Andrew Morton wrote:

> >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
> >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
> >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
> >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
> >> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
> >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
> >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
> >
> > So what happened here? Userspace fed some mlocked memory into splice()
> > and then, while splice() was running, userspace dropped its reference
> > to the memory, leaving splice() with the last reference. Yet somehow,
> > that page was still marked as being mlocked. I wouldn't expect the
> > kernel to permit userspace to drop its reference to the memory without
> > first clearing the mlocked state.
> >
> > Is it possible to work out from trinity sources what the exact sequence
> > was? Which syscalls are being used, for example?
>
> Trinity can't really log anything because attempts to log syscalls slow everything
> down to a crawl to the point nothing reproduces.

If the machine is still alive after /proc/sys/kernel/tainted changes,
trinity will dump a trinity-post-mortem.log somewhere[*] that should
contain the last two syscalls each process did. (Even if logging
is disabled).

It's not perfect however, and knowing that we passed a pointer to
a syscall isn't always useful unless we also dump the data that pointer
pointed at. It's a work in progress. I don't know if I'm going to
get time to improve it any time soon though.

Dave

[*] wherever cwd happened to be when the main process exited.

2014-11-19 03:56:33

by Andrew Morton

[permalink] [raw]
Subject: Re: mm: shmem: freeing mlocked page

On Tue, 18 Nov 2014 22:44:02 -0500 Sasha Levin <[email protected]> wrote:

> On 11/18/2014 04:58 PM, Andrew Morton wrote:
> > On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin <[email protected]> wrote:
> >
> >>
> >> [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70
> >> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b
> >> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked)
> >> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> >> [ 1026.993479] bad because of flags:
> >> [ 1026.994125] flags: 0x200000(mlocked)
> >
> > Gee that new page dumping code is nice!
> >
> >> [ 1026.994816] Modules linked in:
> >> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
> >> [ 1026.996123] FAULT_INJECTION: forcing a failure.
> >> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
> >> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
> >> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
> >> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
> >> [ 1027.003359] Call Trace:
> >> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
> >> [ 1027.004725] bad_page (mm/page_alloc.c:338)
> >> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
> >> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
> >> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
> >> [ 1027.008815] put_page (mm/swap.c:270)
> >> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
> >> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
> >> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
> >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
> >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
> >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
> >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
> >> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
> >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
> >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
> >>
> >
> > So what happened here? Userspace fed some mlocked memory into splice()
> > and then, while splice() was running, userspace dropped its reference
> > to the memory, leaving splice() with the last reference. Yet somehow,
> > that page was still marked as being mlocked. I wouldn't expect the
> > kernel to permit userspace to drop its reference to the memory without
> > first clearing the mlocked state.
> >
> > Is it possible to work out from trinity sources what the exact sequence
> > was? Which syscalls are being used, for example?
>
> Trinity can't really log anything because attempts to log syscalls slow everything
> down to a crawl to the point nothing reproduces.

Ah. I was thinking that it could be worked out by looking at the
trinity source around where it calls splice(). But I suspect that
doesn't make sense if trinity just creates a zillion threads each of
which sprays semi-random syscalls at the kernel(?).


> I've just looked at that trace above, and got a bit more confused. I didn't think
> that you can mlock page cache. How would a user do that exactly?

mmap it then mlock it! The kernel will fault everything in for you
then pin it down.

2014-11-19 04:13:09

by Sasha Levin

[permalink] [raw]
Subject: Re: mm: shmem: freeing mlocked page

On 11/18/2014 10:56 PM, Andrew Morton wrote:
>> Trinity can't really log anything because attempts to log syscalls slow everything
>> > down to a crawl to the point nothing reproduces.
> Ah. I was thinking that it could be worked out by looking at the
> trinity source around where it calls splice(). But I suspect that
> doesn't make sense if trinity just creates a zillion threads each of
> which sprays semi-random syscalls at the kernel(?).

I think Dave would agree here that this is a rather accurate description
of Trinity :)

>> > I've just looked at that trace above, and got a bit more confused. I didn't think
>> > that you can mlock page cache. How would a user do that exactly?
> mmap it then mlock it! The kernel will fault everything in for you
> then pin it down.

But that's a pipe buffer, I didn't think userspace can mmap pipes? I have
some reading to do.


Thanks,
Sasha

2014-11-19 13:38:26

by Vlastimil Babka

[permalink] [raw]
Subject: Re: mm: shmem: freeing mlocked page

On 11/18/2014 10:58 PM, Andrew Morton wrote:
> On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin <[email protected]> wrote:
>
>>
>> [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70
>> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b
>> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked)
>> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> [ 1026.993479] bad because of flags:
>> [ 1026.994125] flags: 0x200000(mlocked)
>
> Gee that new page dumping code is nice!
>
>> [ 1026.994816] Modules linked in:
>> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
>> [ 1026.996123] FAULT_INJECTION: forcing a failure.
>> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
>> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
>> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
>> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
>> [ 1027.003359] Call Trace:
>> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
>> [ 1027.004725] bad_page (mm/page_alloc.c:338)
>> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
>> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
>> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
>> [ 1027.008815] put_page (mm/swap.c:270)
>> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
>> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
>> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
>> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
>> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
>> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
>> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
>> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
>> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
>> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>>
>
> So what happened here? Userspace fed some mlocked memory into splice()
> and then, while splice() was running, userspace dropped its reference
> to the memory, leaving splice() with the last reference. Yet somehow,
> that page was still marked as being mlocked. I wouldn't expect the
> kernel to permit userspace to drop its reference to the memory without
> first clearing the mlocked state.

I did check a bit and something caught my eye. Both page_remove_rmap() and
page_remove_file_rmap() contain this:

if (unlikely(PageMlocked(page)))
clear_page_mlock(page);

So could maybe something mlock the page between the check and clear?

I find lru_cache_add_active_or_unevictable somewhat suspicious. But checking if
these two could race will take some time.

> Is it possible to work out from trinity sources what the exact sequence
> was? Which syscalls are being used, for example?
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>

2014-12-10 02:16:12

by Sasha Levin

[permalink] [raw]
Subject: Re: mm: shmem: freeing mlocked page

On 11/18/2014 04:58 PM, Andrew Morton wrote:
>> [ 1026.994816] Modules linked in:
>> > [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
>> > [ 1026.996123] FAULT_INJECTION: forcing a failure.
>> > [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
>> > [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
>> > [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
>> > [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
>> > [ 1027.003359] Call Trace:
>> > [ 1027.003831] dump_stack (lib/dump_stack.c:52)
>> > [ 1027.004725] bad_page (mm/page_alloc.c:338)
>> > [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
>> > [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
>> > [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
>> > [ 1027.008815] put_page (mm/swap.c:270)
>> > [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
>> > [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
>> > [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
>> > [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
>> > [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
>> > [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
>> > [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
>> > [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
>> > [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
>> > [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>> >
> So what happened here? Userspace fed some mlocked memory into splice()
> and then, while splice() was running, userspace dropped its reference
> to the memory, leaving splice() with the last reference. Yet somehow,
> that page was still marked as being mlocked. I wouldn't expect the
> kernel to permit userspace to drop its reference to the memory without
> first clearing the mlocked state.
>
> Is it possible to work out from trinity sources what the exact sequence
> was? Which syscalls are being used, for example?

Phew, this took a long while but I've bisected it (with good confidence) down
to:

commit a38246260912ba4a0f8b563704a965a7a97cf3c3
Author: Davidlohr Bueso <[email protected]>
Date: Wed Dec 3 18:54:27 2014 +1100

mm/memory.c: share the i_mmap_rwsem

The unmap_mapping_range family of functions do the unmapping of user pages
(ultimately via zap_page_range_single) without touching the actual
interval tree, thus share the lock.

Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>


Thanks,
Sasha

2014-12-10 02:23:48

by Sasha Levin

[permalink] [raw]
Subject: Re: mm: shmem: freeing mlocked page

(Apologies for spam, I've Cc'ed a few outdated emails in the previous mail)

On 12/09/2014 09:15 PM, Sasha Levin wrote:
> On 11/18/2014 04:58 PM, Andrew Morton wrote:
>>> [ 1026.994816] Modules linked in:
>>>> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
>>>> [ 1026.996123] FAULT_INJECTION: forcing a failure.
>>>> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
>>>> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
>>>> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
>>>> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
>>>> [ 1027.003359] Call Trace:
>>>> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
>>>> [ 1027.004725] bad_page (mm/page_alloc.c:338)
>>>> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
>>>> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
>>>> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
>>>> [ 1027.008815] put_page (mm/swap.c:270)
>>>> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
>>>> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
>>>> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
>>>> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
>>>> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
>>>> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
>>>> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
>>>> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
>>>> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
>>>> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>>>>
>> So what happened here? Userspace fed some mlocked memory into splice()
>> and then, while splice() was running, userspace dropped its reference
>> to the memory, leaving splice() with the last reference. Yet somehow,
>> that page was still marked as being mlocked. I wouldn't expect the
>> kernel to permit userspace to drop its reference to the memory without
>> first clearing the mlocked state.
>>
>> Is it possible to work out from trinity sources what the exact sequence
>> was? Which syscalls are being used, for example?
>
> Phew, this took a long while but I've bisected it (with good confidence) down
> to:
>
> commit a38246260912ba4a0f8b563704a965a7a97cf3c3
> Author: Davidlohr Bueso <[email protected]>
> Date: Wed Dec 3 18:54:27 2014 +1100
>
> mm/memory.c: share the i_mmap_rwsem
>
> The unmap_mapping_range family of functions do the unmapping of user pages
> (ultimately via zap_page_range_single) without touching the actual
> interval tree, thus share the lock.
>
> Signed-off-by: Davidlohr Bueso <[email protected]>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Acked-by: Hugh Dickins <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Peter Zijlstra (Intel) <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Srikar Dronamraju <[email protected]>
> Acked-by: Mel Gorman <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
>
>
> Thanks,
> Sasha
>