LinuxLists.cc - 3.15-rc8 oops in copy_page

2014-06-06 17:43:29

by Dave Jones

Subject: 3.15-rc8 oops in copy_page_rep after page fault.

Not much to go on here. It rebooted right after dumping this.

Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: fuse sctp tun hidp rfcomm llc2 af_key nfnetlink ipt_ULOG scsi_transport_iscsi bnep can_raw nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc can_bcm can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm xfs libcrc32c crct10dif_pclmul crc32c_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec btusb snd_hwdep bluetooth e1000e snd_seq 6lowpan_iphc ghash_clmulni_intel snd_seq_device snd_pcm snd_timer shpchp microcode snd rfkill usb_debug ptp serio_raw pcspkr pps_core soundcore
CPU: 3 PID: 7553 Comm: trinity-c196 Not tainted 3.15.0-rc8+ #229
task: ffff880095966390 ti: ffff880002084000 task.ti: ffff880002084000
RIP: 0010:[<ffffffff8b3287b5>] [<ffffffff8b3287b5>] copy_page_rep+0x5/0x10
RSP: 0000:ffff880002087d08 EFLAGS: 00010286
RAX: ffff880000000000 RBX: ffffea000053bf80 RCX: 0000000000000200
RDX: 0000000000000000 RSI: ffff880052766000 RDI: ffff880014efe000
RBP: ffff880002087d80 R08: 000000024e558000 R09: ffff880000000000
R10: 0000000000002c2a R11: 0000000000016ae0 R12: 000000000149d980
R13: ffff8800020a9000 R14: 00000000014a0000 R15: ffff880070d63f08
FS: 00007f3700519780(0000) GS:ffff88024d180000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff880052766000 CR3: 0000000002068000 CR4: 00000000001407e0
DR0: 0000627000019000 DR1: 0000000000a94000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 00000000000b0602
Stack:
ffffffff8b1be8db ffff88011cec5000 80000000526008c5 ffff880070d633d8
ffff880070d633d8 ffff8800020a9000 ffff880002090000 ffffea0001498000
00000c1080000000 0000160000000000 ffff8800020a9000 00000c10800033e4
Call Trace:
[<ffffffff8b1be8db>] ? do_huge_pmd_wp_page+0x5cb/0x850
[<ffffffff8b187010>] handle_mm_fault+0x1e0/0xc50
[<ffffffff8b1b4662>] ? kmem_cache_free+0x1c2/0x200
[<ffffffff8b7472d9>] __do_page_fault+0x1c9/0x630
[<ffffffff8b010a98>] ? perf_trace_sys_enter+0x38/0x180
[<ffffffff8b11897b>] ? __acct_update_integrals+0x8b/0x120
[<ffffffff8b747bfb>] ? preempt_count_sub+0xab/0x100
[<ffffffff8b74775e>] do_page_fault+0x1e/0x70
[<ffffffff8b7441b2>] page_fault+0x22/0x30
Code: 90 90 90 90 90 90 9c fa 65 48 3b 06 75 14 65 48 3b 56 08 75 0d 65 48 89 1e 65 48 89 4e 08 9d b0 01 c3 9d 30 c0 c3 b9 00 02 00 00 <f3> 48 a5 c3 0f 1f 80 00 00 00 00 eb ee 0f 1f 84 00 00 00 00 00
RIP [<ffffffff8b3287b5>] copy_page_rep+0x5/0x10
RSP <ffff880002087d08>
CR2: ffff880052766000

2014-06-06 17:51:36

by Dave Jones

Subject: Re: 3.15-rc8 oops in copy_page_rep after page fault.

On Fri, Jun 06, 2014 at 01:43:17PM -0400, Dave Jones wrote:
> Not much to go on here. It rebooted right after dumping this.
>
> RIP: 0010:[<ffffffff8b3287b5>] [<ffffffff8b3287b5>] copy_page_rep+0x5/0x10
> Call Trace:
> [<ffffffff8b1be8db>] ? do_huge_pmd_wp_page+0x5cb/0x850
> [<ffffffff8b187010>] handle_mm_fault+0x1e0/0xc50
> [<ffffffff8b1b4662>] ? kmem_cache_free+0x1c2/0x200
> [<ffffffff8b7472d9>] __do_page_fault+0x1c9/0x630
> [<ffffffff8b010a98>] ? perf_trace_sys_enter+0x38/0x180
> [<ffffffff8b11897b>] ? __acct_update_integrals+0x8b/0x120
> [<ffffffff8b747bfb>] ? preempt_count_sub+0xab/0x100
> [<ffffffff8b74775e>] do_page_fault+0x1e/0x70
> [<ffffffff8b7441b2>] page_fault+0x22/0x30

Ok, I can reproduce this fairly easily.

The only prerequisite seems to be that before I start the fuzzer I do..

echo 65536 > /proc/sys/vm/mmap_min_addr

If I don't do that, then it seems to survive, so maybe that's a clue ?

Dave

2014-06-06 18:26:16

by Linus Torvalds

Subject: Re: 3.15-rc8 oops in copy_page_rep after page fault.

On Fri, Jun 6, 2014 at 10:43 AM, Dave Jones <[email protected]> wrote:
>
> RIP: 0010:[<ffffffff8b3287b5>] [<ffffffff8b3287b5>] copy_page_rep+0x5/0x10

Ok, it's the first iteration of "rep movsq" (%rcx is still 0x200) for
copying a page, and the pages are

RSI: ffff880052766000
RDI: ffff880014efe000

which both look like reasonable kernel addresses. So I'm assuming it's
DEBUG_PAGEALLOC that makes this trigger, and since the error code is
0, and the CR2 value matches RSI, it's the source page that seems to
have been freed.

And I see absolutely _zero_ reason for why your 64k mmap_min_addr
should make any difference what-so-ever. That's just odd.

Anyway, can you try to figure out _which_ copy_user_highpage() it is
(by looking at what is around the call-site at
"handle_mm_fault+0x1e0"). The fact that we have a stale
do_huge_pmd_wp_page() on the stack makes me suspect that we have hit
that VM_FAULT_FALLBACK case and this is related to splitting. Adding a
few more people explicitly to the cc in case anybody sees anything
(original email on lkml and linux-mm for context, guys).

Linus
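
The register reading above can be checked mechanically. What follows is a minimal sketch, not kernel code: the PF_* masks follow the architectural x86 page-fault error-code bits, while struct oops_regs and the helper names are hypothetical and exist only for illustration. The values are the ones from the oops at the top of the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (architectural). */
#define PF_PROT  0x1u	/* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u	/* 0: read access,      1: write access */
#define PF_USER  0x4u	/* 0: kernel mode,      1: user mode */

/* Hypothetical container for the values quoted from the oops. */
struct oops_regs {
	uint64_t rsi, rdi, rcx, cr2;
	unsigned int error_code;
};

/* Error code 0 means: kernel mode, read access, page not present. */
static bool kernel_read_of_unmapped_page(unsigned int error_code)
{
	return (error_code & (PF_PROT | PF_WRITE | PF_USER)) == 0;
}

/* "rep movsq" moves 8 bytes per iteration; %rcx counts iterations left. */
static uint64_t movsq_bytes_left(uint64_t rcx)
{
	return rcx * 8;
}

static void check_thread_oops(void)
{
	const struct oops_regs r = {
		.rsi = 0xffff880052766000ULL,
		.rdi = 0xffff880014efe000ULL,
		.rcx = 0x200,
		.cr2 = 0xffff880052766000ULL,
		.error_code = 0,	/* "Oops: 0000" */
	};

	assert(kernel_read_of_unmapped_page(r.error_code));
	assert(movsq_bytes_left(r.rcx) == 4096);	/* a full page still to copy */
	assert(r.cr2 == r.rsi);				/* fault address is the source */
	assert(r.cr2 != r.rdi);				/* ... not the destination */
}
```

Calling check_thread_oops() passes silently: a whole 4096-byte page is still to be copied, and the faulting address is the source pointer, which is what one would expect if DEBUG_PAGEALLOC had unmapped a freed source page.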

2014-06-06 18:39:47

by Dave Jones

Subject: Re: 3.15-rc8 oops in copy_page_rep after page fault.

On Fri, Jun 06, 2014 at 11:26:14AM -0700, Linus Torvalds wrote:
> On Fri, Jun 6, 2014 at 10:43 AM, Dave Jones <[email protected]> wrote:
> >
> > RIP: 0010:[<ffffffff8b3287b5>] [<ffffffff8b3287b5>] copy_page_rep+0x5/0x10
>
> Ok, it's the first iteration of "rep movsq" (%rcx is still 0x200) for
> copying a page, and the pages are
>
> RSI: ffff880052766000
> RDI: ffff880014efe000
>
> which both look like reasonable kernel addresses. So I'm assuming it's
> DEBUG_PAGEALLOC that makes this trigger, and since the error code is
> 0, and the CR2 value matches RSI, it's the source page that seems to
> have been freed.
>
> And I see absolutely _zero_ reason for why your 64k mmap_min_addr
> should make any difference what-so-ever. That's just odd.

I did some further experimenting. With it set to 4k it ran for a while
until I got bored. With it set to 8k I saw the crash above, but it took
longer to happen. With 64k it takes seconds to reproduce.
It might just be a coincidence of which of the mmaps that trinity tries
happen to succeed or fail, but it is curious.

> Anyway, can you try to figure out _which_ copy_user_highpage() it is
> (by looking at what is around the call-site at
> "handle_mm_fault+0x1e0"). The fact that we have a stale
> do_huge_pmd_wp_page() on the stack makes me suspect that we have hit
> that VM_FAULT_FALLBACK case and this is related to splitting. Adding a
> few more people explicitly to the cc in case anybody sees anything
> (original email on lkml and linux-mm for context, guys).

full disasm at http://codemonkey.org.uk/junk/memory.S.txt

handle_mm_fault+0x1e0 looks to be 0x49f0 which is..

	if (dirty && !pmd_write(orig_pmd)) {
		ret = do_huge_pmd_wp_page(mm, vma, address, pmd,
    49d8:  4d 89 f8              mov    %r15,%r8
    49db:  48 89 d9              mov    %rbx,%rcx
    49de:  4c 89 e2              mov    %r12,%rdx
    49e1:  44 89 55 d0           mov    %r10d,-0x30(%rbp)
    49e5:  4c 89 ee              mov    %r13,%rsi
    49e8:  4c 89 f7              mov    %r14,%rdi
    49eb:  e8 00 00 00 00        callq  49f0 <handle_mm_fault+0x1e0>
					  orig_pmd);
		if (!(ret & VM_FAULT_FALLBACK))
    49f0:  44 8b 55 d0           mov    -0x30(%rbp),%r10d
    49f4:  f6 c4 08              test   $0x8,%ah
    49f7:  41 89 c3              mov    %eax,%r11d
    49fa:  0f 84 5e ff ff ff     je     495e <handle_mm_fault+0x14e>
    4a00:  48 8b 03              mov    (%rbx),%rax

which seems to concur with your VM_FAULT_FALLBACK theory.

Dave
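
Two details of the disassembly above can be verified with plain arithmetic: the address in the backtrace is the return address of the 5-byte callq at 0x49eb, and "test $0x8,%ah" inspects bit 11 of %eax, i.e. the mask 0x0800. The sketch below assumes that 0x0800 is the value of VM_FAULT_FALLBACK in the 3.15-era include/linux/mm.h (stated from memory, so check your own tree); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: value of VM_FAULT_FALLBACK in 3.15-era include/linux/mm.h. */
#define VM_FAULT_FALLBACK 0x0800u

/* Emulates "test $imm,%ah": %ah is bits 8..15 of %eax. Returns ZF. */
static bool test_ah_sets_zf(uint32_t eax, uint8_t imm)
{
	uint8_t ah = (uint8_t)(eax >> 8);

	return (ah & imm) == 0;
}

/* The "je" after the test is taken when ZF is set, i.e. the bit is clear. */
static bool branch_taken(uint32_t ret)
{
	return test_ah_sets_zf(ret, 0x08);
}

/* A 5-byte "callq rel32" at call_site returns to call_site + 5. */
static uint32_t return_address(uint32_t call_site)
{
	return call_site + 5;
}

static void check_disassembly(void)
{
	/* The call at 0x49eb returns to 0x49f0 == handle_mm_fault+0x1e0. */
	assert(return_address(0x49eb) == 0x49f0);
	/* So handle_mm_fault starts at 0x4810 in this object file ... */
	assert(0x49f0 - 0x1e0 == 0x4810);
	/* ... and the "je" target 0x495e is handle_mm_fault+0x14e. */
	assert(0x4810 + 0x14e == 0x495e);

	/* The %ah test agrees with !(ret & VM_FAULT_FALLBACK) for all 16-bit values. */
	for (uint32_t ret = 0; ret <= 0xffff; ret++)
		assert(branch_taken(ret) == !(ret & VM_FAULT_FALLBACK));
}
```

Under that assumption the compiled branch is exactly the C condition shown in the listing, so the call returning to handle_mm_fault+0x1e0 is the do_huge_pmd_wp_page() call whose result is checked for VM_FAULT_FALLBACK.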

2014-06-06 18:42:14

by Hugh Dickins

Subject: Re: 3.15-rc8 oops in copy_page_rep after page fault.

On Fri, 6 Jun 2014, Linus Torvalds wrote:
> On Fri, Jun 6, 2014 at 10:43 AM, Dave Jones <[email protected]> wrote:
> >
> > RIP: 0010:[<ffffffff8b3287b5>] [<ffffffff8b3287b5>] copy_page_rep+0x5/0x10
>
> Ok, it's the first iteration of "rep movsq" (%rcx is still 0x200) for
> copying a page, and the pages are
>
> RSI: ffff880052766000
> RDI: ffff880014efe000
>
> which both look like reasonable kernel addresses. So I'm assuming it's
> DEBUG_PAGEALLOC that makes this trigger, and since the error code is
> 0, and the CR2 value matches RSI, it's the source page that seems to
> have been freed.
>
> And I see absolutely _zero_ reason for why your 64k mmap_min_addr
> should make any difference what-so-ever. That's just odd.
>
> Anyway, can you try to figure out _which_ copy_user_highpage() it is
> (by looking at what is around the call-site at
> "handle_mm_fault+0x1e0"). The fact that we have a stale
> do_huge_pmd_wp_page() on the stack makes me suspect that we have hit
> that VM_FAULT_FALLBACK case and this is related to splitting. Adding a
> few more people explicitly to the cc in case anybody sees anything
> (original email on lkml and linux-mm for context, guys).

It's a familiar one, that Sasha first reported over a year ago:
see https://lkml.org/lkml/2013/3/29/103

Somewhere in that thread I suggest that it's due to the source THPage
being split, and a tail page freed, while copy is in progress; and
not a problem without DEBUG_PAGEALLOC, since the pmd_same check
will prevent a miscopy from being made visible.

It's not a v3.15 regression, and it's no worry without DEBUG_PAGEALLOC.

If it's becoming easier to trigger and thus interfering with trinity,
then I guess we shall have to do something about it. Kirill tried one
approach that didn't work out, and we have so far both felt reluctant
to make the code uglier just to satisfy DEBUG_PAGEALLOC.

Hugh

2014-06-06 18:49:44

by Kirill A. Shutemov

Subject: Re: 3.15-rc8 oops in copy_page_rep after page fault.

On Fri, Jun 06, 2014 at 11:26:14AM -0700, Linus Torvalds wrote:
> On Fri, Jun 6, 2014 at 10:43 AM, Dave Jones <[email protected]> wrote:
> >
> > RIP: 0010:[<ffffffff8b3287b5>] [<ffffffff8b3287b5>] copy_page_rep+0x5/0x10
>
> Ok, it's the first iteration of "rep movsq" (%rcx is still 0x200) for
> copying a page, and the pages are
>
> RSI: ffff880052766000
> RDI: ffff880014efe000
>
> which both look like reasonable kernel addresses. So I'm assuming it's
> DEBUG_PAGEALLOC that makes this trigger, and since the error code is
> 0, and the CR2 value matches RSI, it's the source page that seems to
> have been freed.
>
> And I see absolutely _zero_ reason for why your 64k mmap_min_addr
> should make any difference what-so-ever. That's just odd.
>
> Anyway, can you try to figure out _which_ copy_user_highpage() it is
> (by looking at what is around the call-site at
> "handle_mm_fault+0x1e0"). The fact that we have a stale
> do_huge_pmd_wp_page() on the stack makes me suspect that we have hit
> that VM_FAULT_FALLBACK case and this is related to splitting. Adding a
> few more people explicitly to the cc in case anybody sees anything
> (original email on lkml and linux-mm for context, guys).

Looks like a known false positive from DEBUG_PAGEALLOC:

https://lkml.org/lkml/2013/3/29/103

We copy the huge page in do_huge_pmd_wp_page() without the ptl taken, so the
page can be split and freed under us. Once the page is copied we take the ptl
again and recheck that the PMD has not changed. If it has, we don't use the
new page. Not a bug, never triggered with DEBUG_PAGEALLOC disabled.

It would be nice to have a way to mark this kind of speculative access.

--
Kirill A. Shutemov
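
The copy-then-recheck pattern described above can be illustrated in user space. This is a single-threaded sketch under stated assumptions, not the kernel's code: struct mapping, its "pmd" field, the spinlock built on atomic_flag and the "racer" hook (which stands in for another CPU splitting the page) are all hypothetical. In the kernel the unlocked copy is where DEBUG_PAGEALLOC faults; ordinary user-space memory has no equivalent trap, so only the discard logic is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define DATA_SIZE 64

/* Hypothetical stand-in for a mapping; "pmd" plays the role of the PMD value. */
struct mapping {
	atomic_flag lock;		/* plays the role of the ptl */
	unsigned long pmd;
	unsigned char data[DATA_SIZE];
};

static void mapping_lock(struct mapping *m)
{
	while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
		;	/* spin */
}

static void mapping_unlock(struct mapping *m)
{
	atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

/* Simulated concurrent split: changes the "PMD" under the lock. */
static void simulate_split(struct mapping *m)
{
	mapping_lock(m);
	m->pmd++;
	mapping_unlock(m);
}

/*
 * Copy without the lock, then revalidate under it.
 * Returns true if the copy may be used, false if it must be discarded.
 */
static bool speculative_copy(struct mapping *m, unsigned char *dst,
			     void (*racer)(struct mapping *))
{
	unsigned long orig_pmd;
	bool unchanged;

	assert(m && dst);

	mapping_lock(m);
	orig_pmd = m->pmd;			/* snapshot taken under the lock */
	mapping_unlock(m);

	memcpy(dst, m->data, DATA_SIZE);	/* speculative: lock not held */
	if (racer)
		racer(m);			/* stands in for another CPU */

	mapping_lock(m);
	unchanged = (m->pmd == orig_pmd);	/* analogous to pmd_same() */
	mapping_unlock(m);

	return unchanged;
}
```

With racer set to NULL the function returns true and the copy is kept; with racer set to simulate_split it returns false, so whatever was copied is thrown away and never becomes visible. That mirrors why the race is harmless unless something traps on the stale read itself.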

2014-06-06 19:04:01

by Sasha Levin

Subject: Re: 3.15-rc8 oops in copy_page_rep after page fault.

On 06/06/2014 02:49 PM, Kirill A. Shutemov wrote:
> On Fri, Jun 06, 2014 at 11:26:14AM -0700, Linus Torvalds wrote:
>> > On Fri, Jun 6, 2014 at 10:43 AM, Dave Jones <[email protected]> wrote:
>>> > >
>>> > > RIP: 0010:[<ffffffff8b3287b5>] [<ffffffff8b3287b5>] copy_page_rep+0x5/0x10
>> >
>> > Ok, it's the first iteration of "rep movsq" (%rcx is still 0x200) for
>> > copying a page, and the pages are
>> >
>> > RSI: ffff880052766000
>> > RDI: ffff880014efe000
>> >
>> > which both look like reasonable kernel addresses. So I'm assuming it's
>> > DEBUG_PAGEALLOC that makes this trigger, and since the error code is
>> > 0, and the CR2 value matches RSI, it's the source page that seems to
>> > have been freed.
>> >
>> > And I see absolutely _zero_ reason for why your 64k mmap_min_addr
>> > should make any difference what-so-ever. That's just odd.
>> >
>> > Anyway, can you try to figure out _which_ copy_user_highpage() it is
>> > (by looking at what is around the call-site at
>> > "handle_mm_fault+0x1e0"). The fact that we have a stale
>> > do_huge_pmd_wp_page() on the stack makes me suspect that we have hit
>> > that VM_FAULT_FALLBACK case and this is related to splitting. Adding a
>> > few more people explicitly to the cc in case anybody sees anything
>> > (original email on lkml and linux-mm for context, guys).
> Looks like a known false positive from DEBUG_PAGEALLOC:
>
> https://lkml.org/lkml/2013/3/29/103
>
> We copy the huge page in do_huge_pmd_wp_page() without the ptl taken, so the
> page can be split and freed under us. Once the page is copied we take the ptl
> again and recheck that the PMD has not changed. If it has, we don't use the
> new page. Not a bug, never triggered with DEBUG_PAGEALLOC disabled.
>
> It would be nice to have a way to mark this kind of speculative access.

FWIW, this issue makes fuzzing with DEBUG_PAGEALLOC nearly impossible since
this thing is so common we never get to do anything "fun" before this issue
triggers.

A fix would be more than welcome.

Thanks,
Sasha

2014-06-16 03:03:36

by Hugh Dickins

Subject: Re: 3.15-rc8 oops in copy_page_rep after page fault.

On Fri, 6 Jun 2014, Sasha Levin wrote:
> On 06/06/2014 02:49 PM, Kirill A. Shutemov wrote:
> > On Fri, Jun 06, 2014 at 11:26:14AM -0700, Linus Torvalds wrote:
> >> > On Fri, Jun 6, 2014 at 10:43 AM, Dave Jones <[email protected]> wrote:
> >>> > >
> >>> > > RIP: 0010:[<ffffffff8b3287b5>] [<ffffffff8b3287b5>] copy_page_rep+0x5/0x10
> >> >
> >> > Ok, it's the first iteration of "rep movsq" (%rcx is still 0x200) for
> >> > copying a page, and the pages are
> >> >
> >> > RSI: ffff880052766000
> >> > RDI: ffff880014efe000
> >> >
> >> > which both look like reasonable kernel addresses. So I'm assuming it's
> >> > DEBUG_PAGEALLOC that makes this trigger, and since the error code is
> >> > 0, and the CR2 value matches RSI, it's the source page that seems to
> >> > have been freed.
> >> >
> >> > And I see absolutely _zero_ reason for why your 64k mmap_min_addr
> >> > should make any difference what-so-ever. That's just odd.
> >> >
> >> > Anyway, can you try to figure out _which_ copy_user_highpage() it is
> >> > (by looking at what is around the call-site at
> >> > "handle_mm_fault+0x1e0"). The fact that we have a stale
> >> > do_huge_pmd_wp_page() on the stack makes me suspect that we have hit
> >> > that VM_FAULT_FALLBACK case and this is related to splitting. Adding a
> >> > few more people explicitly to the cc in case anybody sees anything
> >> > (original email on lkml and linux-mm for context, guys).
> > Looks like a known false positive from DEBUG_PAGEALLOC:
> >
> > https://lkml.org/lkml/2013/3/29/103
> >
> > We copy the huge page in do_huge_pmd_wp_page() without the ptl taken, so the
> > page can be split and freed under us. Once the page is copied we take the ptl
> > again and recheck that the PMD has not changed. If it has, we don't use the
> > new page. Not a bug, never triggered with DEBUG_PAGEALLOC disabled.
> >
> > It would be nice to have a way to mark this kind of speculative access.
>
> FWIW, this issue makes fuzzing with DEBUG_PAGEALLOC nearly impossible since
> this thing is so common we never get to do anything "fun" before this issue
> triggers.
>
> A fix would be more than welcome.

Please give this a try: I think it's right, but I could easily be wrong.

[PATCH] thp: fix DEBUG_PAGEALLOC oops in copy_page_rep

Trinity has for over a year been reporting a CONFIG_DEBUG_PAGEALLOC
oops in copy_page_rep() called from copy_user_huge_page() called from
do_huge_pmd_wp_page().

I believe this is a DEBUG_PAGEALLOC false positive, due to the source
page being split, and a tail page freed, while copy is in progress; and
not a problem without DEBUG_PAGEALLOC, since the pmd_same() check will
prevent a miscopy from being made visible.

Fix by adding get_user_huge_page() and put_user_huge_page(): reducing
to the usual get_page() and put_page() on head page in the usual config;
but get and put references to all of the tail pages when DEBUG_PAGEALLOC.

Signed-off-by: Hugh Dickins <[email protected]>
---

mm/huge_memory.c | 35 +++++++++++++++++++++++++++++++----
1 file changed, 31 insertions(+), 4 deletions(-)

--- 3.16-rc1/mm/huge_memory.c 2014-06-08 18:09:10.544479312 -0700
+++ linux/mm/huge_memory.c 2014-06-15 19:32:58.993126929 -0700
@@ -941,6 +941,33 @@ unlock:
 	spin_unlock(ptl);
 }

+/*
+ * Save CONFIG_DEBUG_PAGEALLOC from faulting falsely on tail pages
+ * during copy_user_huge_page()'s copy_page_rep(): in the case when
+ * the source page gets split and a tail freed before copy completes.
+ * Called under pmd_lock of checked pmd, so safe from splitting itself.
+ */
+static void get_user_huge_page(struct page *page)
+{
+	if (IS_ENABLED(CONFIG_DEBUG_PAGEALLOC)) {
+		struct page *endpage = page + HPAGE_PMD_NR;
+		atomic_add(HPAGE_PMD_NR, &page->_count);
+		while (++page < endpage)
+			get_huge_page_tail(page);
+	} else
+		get_page(page);
+}
+
+static void put_user_huge_page(struct page *page)
+{
+	if (IS_ENABLED(CONFIG_DEBUG_PAGEALLOC)) {
+		struct page *endpage = page + HPAGE_PMD_NR;
+		while (page < endpage)
+			put_page(page++);
+	} else
+		put_page(page);
+}
+
 static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
 					struct vm_area_struct *vma,
 					unsigned long address,
@@ -1074,7 +1101,7 @@ int do_huge_pmd_wp_page(struct mm_struct
 		ret |= VM_FAULT_WRITE;
 		goto out_unlock;
 	}
-	get_page(page);
+	get_user_huge_page(page);
 	spin_unlock(ptl);
 alloc:
 	if (transparent_hugepage_enabled(vma) &&
@@ -1095,7 +1122,7 @@ alloc:
 				split_huge_page(page);
 				ret |= VM_FAULT_FALLBACK;
 			}
-			put_page(page);
+			put_user_huge_page(page);
 		}
 		count_vm_event(THP_FAULT_FALLBACK);
 		goto out;
@@ -1105,7 +1132,7 @@ alloc:
 		put_page(new_page);
 		if (page) {
 			split_huge_page(page);
-			put_page(page);
+			put_user_huge_page(page);
 		} else
 			split_huge_page_pmd(vma, address, pmd);
 		ret |= VM_FAULT_FALLBACK;
@@ -1127,7 +1154,7 @@ alloc:

 	spin_lock(ptl);
 	if (page)
-		put_page(page);
+		put_user_huge_page(page);
 	if (unlikely(!pmd_same(*pmd, orig_pmd))) {
 		spin_unlock(ptl);
 		mem_cgroup_uncharge_page(new_page);
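
The reference arithmetic in the DEBUG_PAGEALLOC branches of the patch can be followed with a toy model. This is only a sketch of one reading of the patch, not kernel code. It ASSUMES the THP refcounting of that era, in which a pin on a tail page is recorded on the tail and also counted in the head page's _count, so adding HPAGE_PMD_NR to the head covers one reference for the head plus one per tail. struct toy_page, its fields and the toy_* functions are hypothetical, and HPAGE_PMD_NR is taken as 512 (2 MiB huge page, 4 KiB base pages on x86-64). Splitting is not modelled.

```c
#include <assert.h>

#define HPAGE_PMD_NR 512	/* assumed: 2 MiB huge page / 4 KiB base pages */

/* Toy page: "count" models _count, "tail_pins" models a tail page's pins. */
struct toy_page {
	int count;
	int tail_pins;
};

/* Mirrors the DEBUG_PAGEALLOC branch of get_user_huge_page(). */
static void toy_get_user_huge_page(struct toy_page *head)
{
	head->count += HPAGE_PMD_NR;	/* 1 for the head + 511 tail pins */
	for (int i = 1; i < HPAGE_PMD_NR; i++)
		head[i].tail_pins++;	/* like get_huge_page_tail() */
}

/* Mirrors put_user_huge_page() for a huge page that was NOT split. */
static void toy_put_user_huge_page(struct toy_page *head)
{
	head->count--;			/* put_page() on the head */
	for (int i = 1; i < HPAGE_PMD_NR; i++) {
		assert(head[i].tail_pins > 0);
		head[i].tail_pins--;	/* put_page() on a tail ... */
		head->count--;		/* ... also drops the head's count */
	}
}
```

Starting from a head count of 1, the get raises it to 513 and pins each of the 511 tails once; the put returns everything to the starting state. The point of the extra pins is that every tail carries its own reference while the copy runs, so a concurrent split should not be able to free a tail out from under copy_page_rep().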

2014-06-16 13:26:37

by Kirill A. Shutemov

Subject: Re: 3.15-rc8 oops in copy_page_rep after page fault.

On Sun, Jun 15, 2014 at 08:01:27PM -0700, Hugh Dickins wrote:
> On Fri, 6 Jun 2014, Sasha Levin wrote:
> > On 06/06/2014 02:49 PM, Kirill A. Shutemov wrote:
> > > On Fri, Jun 06, 2014 at 11:26:14AM -0700, Linus Torvalds wrote:
> > >> > On Fri, Jun 6, 2014 at 10:43 AM, Dave Jones <[email protected]> wrote:
> > >>> > >
> > >>> > > RIP: 0010:[<ffffffff8b3287b5>] [<ffffffff8b3287b5>] copy_page_rep+0x5/0x10
> > >> >
> > >> > Ok, it's the first iteration of "rep movsq" (%rcx is still 0x200) for
> > >> > copying a page, and the pages are
> > >> >
> > >> > RSI: ffff880052766000
> > >> > RDI: ffff880014efe000
> > >> >
> > >> > which both look like reasonable kernel addresses. So I'm assuming it's
> > >> > DEBUG_PAGEALLOC that makes this trigger, and since the error code is
> > >> > 0, and the CR2 value matches RSI, it's the source page that seems to
> > >> > have been freed.
> > >> >
> > >> > And I see absolutely _zero_ reason for why your 64k mmap_min_addr
> > >> > should make any difference what-so-ever. That's just odd.
> > >> >
> > >> > Anyway, can you try to figure out _which_ copy_user_highpage() it is
> > >> > (by looking at what is around the call-site at
> > >> > "handle_mm_fault+0x1e0"). The fact that we have a stale
> > >> > do_huge_pmd_wp_page() on the stack makes me suspect that we have hit
> > >> > that VM_FAULT_FALLBACK case and this is related to splitting. Adding a
> > >> > few more people explicitly to the cc in case anybody sees anything
> > >> > (original email on lkml and linux-mm for context, guys).
> > > Looks like a known false positive from DEBUG_PAGEALLOC:
> > >
> > > https://lkml.org/lkml/2013/3/29/103
> > >
> > > We copy the huge page in do_huge_pmd_wp_page() without the ptl taken, so the
> > > page can be split and freed under us. Once the page is copied we take the ptl
> > > again and recheck that the PMD has not changed. If it has, we don't use the
> > > new page. Not a bug, never triggered with DEBUG_PAGEALLOC disabled.
> > >
> > > It would be nice to have a way to mark this kind of speculative access.
> >
> > FWIW, this issue makes fuzzing with DEBUG_PAGEALLOC nearly impossible since
> > this thing is so common we never get to do anything "fun" before this issue
> > triggers.
> >
> > A fix would be more than welcome.
>
> Please give this a try: I think it's right, but I could easily be wrong.
>
>
> [PATCH] thp: fix DEBUG_PAGEALLOC oops in copy_page_rep
>
> Trinity has for over a year been reporting a CONFIG_DEBUG_PAGEALLOC
> oops in copy_page_rep() called from copy_user_huge_page() called from
> do_huge_pmd_wp_page().
>
> I believe this is a DEBUG_PAGEALLOC false positive, due to the source
> page being split, and a tail page freed, while copy is in progress; and
> not a problem without DEBUG_PAGEALLOC, since the pmd_same() check will
> prevent a miscopy from being made visible.
>
> Fix by adding get_user_huge_page() and put_user_huge_page(): reducing
> to the usual get_page() and put_page() on head page in the usual config;
> but get and put references to all of the tail pages when DEBUG_PAGEALLOC.
>
> Signed-off-by: Hugh Dickins <[email protected]>

Ugly, but should do the job:

Acked-by: Kirill A. Shutemov <[email protected]>

BTW, we will not need this with new THP refcounting I'm playing with:
reference on the page will be enough to protect against splitting.

--
Kirill A. Shutemov

2014-06-17 20:32:22

by Sasha Levin

Subject: Re: 3.15-rc8 oops in copy_page_rep after page fault.

On 06/15/2014 11:01 PM, Hugh Dickins wrote:
> On Fri, 6 Jun 2014, Sasha Levin wrote:
>> > On 06/06/2014 02:49 PM, Kirill A. Shutemov wrote:
>>> > > On Fri, Jun 06, 2014 at 11:26:14AM -0700, Linus Torvalds wrote:
>>>>> > >> > On Fri, Jun 6, 2014 at 10:43 AM, Dave Jones <[email protected]> wrote:
>>>>>>> > >>> > >
>>>>>>> > >>> > > RIP: 0010:[<ffffffff8b3287b5>] [<ffffffff8b3287b5>] copy_page_rep+0x5/0x10
>>>>> > >> >
>>>>> > >> > Ok, it's the first iteration of "rep movsq" (%rcx is still 0x200) for
>>>>> > >> > copying a page, and the pages are
>>>>> > >> >
>>>>> > >> > RSI: ffff880052766000
>>>>> > >> > RDI: ffff880014efe000
>>>>> > >> >
>>>>> > >> > which both look like reasonable kernel addresses. So I'm assuming it's
>>>>> > >> > DEBUG_PAGEALLOC that makes this trigger, and since the error code is
>>>>> > >> > 0, and the CR2 value matches RSI, it's the source page that seems to
>>>>> > >> > have been freed.
>>>>> > >> >
>>>>> > >> > And I see absolutely _zero_ reason for why your 64k mmap_min_addr
>>>>> > >> > should make any difference what-so-ever. That's just odd.
>>>>> > >> >
>>>>> > >> > Anyway, can you try to figure out _which_ copy_user_highpage() it is
>>>>> > >> > (by looking at what is around the call-site at
>>>>> > >> > "handle_mm_fault+0x1e0"). The fact that we have a stale
>>>>> > >> > do_huge_pmd_wp_page() on the stack makes me suspect that we have hit
>>>>> > >> > that VM_FAULT_FALLBACK case and this is related to splitting. Adding a
>>>>> > >> > few more people explicitly to the cc in case anybody sees anything
>>>>> > >> > (original email on lkml and linux-mm for context, guys).
>>> > > Looks like a known false positive from DEBUG_PAGEALLOC:
>>> > >
>>> > > https://lkml.org/lkml/2013/3/29/103
>>> > >
>>> > > We copy the huge page in do_huge_pmd_wp_page() without the ptl taken, so the
>>> > > page can be split and freed under us. Once the page is copied we take the ptl
>>> > > again and recheck that the PMD has not changed. If it has, we don't use the
>>> > > new page. Not a bug, never triggered with DEBUG_PAGEALLOC disabled.
>>> > >
>>> > > It would be nice to have a way to mark this kind of speculative access.
>> >
>> > FWIW, this issue makes fuzzing with DEBUG_PAGEALLOC nearly impossible since
>> > this thing is so common we never get to do anything "fun" before this issue
>> > triggers.
>> >
>> > A fix would be more than welcome.
> Please give this a try: I think it's right, but I could easily be wrong.
>
>
> [PATCH] thp: fix DEBUG_PAGEALLOC oops in copy_page_rep
>
> Trinity has for over a year been reporting a CONFIG_DEBUG_PAGEALLOC
> oops in copy_page_rep() called from copy_user_huge_page() called from
> do_huge_pmd_wp_page().
>
> I believe this is a DEBUG_PAGEALLOC false positive, due to the source
> page being split, and a tail page freed, while copy is in progress; and
> not a problem without DEBUG_PAGEALLOC, since the pmd_same() check will
> prevent a miscopy from being made visible.
>
> Fix by adding get_user_huge_page() and put_user_huge_page(): reducing
> to the usual get_page() and put_page() on head page in the usual config;
> but get and put references to all of the tail pages when DEBUG_PAGEALLOC.
>
> Signed-off-by: Hugh Dickins <[email protected]>

Works great, thanks Hugh!

Thanks,
Sasha