2018-07-07 01:21:18

by syzbot

Subject: kernel BUG at mm/shmem.c:LINE!

Hello,

syzbot found the following crash on:

HEAD commit: 526674536360 Add linux-next specific files for 20180706
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=116d16fc400000
kernel config: https://syzkaller.appspot.com/x/.config?x=c8d1cfc0cb798e48
dashboard link: https://syzkaller.appspot.com/bug?extid=b8e0dfee3fd8c9012771
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro: https://syzkaller.appspot.com/x/repro.syz?x=170e462c400000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15f1ba2c400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

raw: 02fffc0000001028 ffffea0007011dc8 ffffea0007058b48 ffff8801a7576ab8
raw: 000000000000016e ffff8801a7588930 00000003ffffffff ffff8801d9a44c80
page dumped because: VM_BUG_ON_PAGE(page_to_pgoff(page) != index)
page->mem_cgroup:ffff8801d9a44c80
------------[ cut here ]------------
kernel BUG at mm/shmem.c:815!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 0 PID: 4429 Comm: syz-executor697 Not tainted
4.18.0-rc3-next-20180706+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:shmem_undo_range+0xdaa/0x29a0 mm/shmem.c:815
Code: 00 0f 85 bd 19 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8
a5 f0 d6 ff 48 c7 c6 e0 32 f1 87 4c 89 e7 e8 16 10 05 00 <0f> 0b e8 8f f0
d6 ff 49 8d 7c 24 20 48 89 f8 48 c1 e8 03 80 3c 18
RSP: 0018:ffff8801ab88e158 EFLAGS: 00010246
RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff81aaab95 RDI: ffffed0035711c18
RBP: ffff8801ab88e8d0 R08: ffff8801a7af04c0 R09: ffffed003b5c4fc0
R10: ffffed003b5c4fc0 R11: ffff8801dae27e07 R12: ffffea0007058b00
R13: ffff8801ab88e8a8 R14: 0000000000000001 R15: 000000000000016e
FS: 0000000000000000(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004b625c CR3: 0000000008e6a000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
shmem_truncate_range+0x27/0xa0 mm/shmem.c:971
shmem_evict_inode+0x3b2/0xcb0 mm/shmem.c:1071
evict+0x4ae/0x990 fs/inode.c:558
iput_final fs/inode.c:1508 [inline]
iput+0x635/0xaa0 fs/inode.c:1534
dentry_unlink_inode+0x4ae/0x640 fs/dcache.c:377
__dentry_kill+0x44c/0x7a0 fs/dcache.c:569
dentry_kill+0xc9/0x5a0 fs/dcache.c:688
dput.part.26+0x66b/0x7a0 fs/dcache.c:849
dput+0x15/0x20 fs/dcache.c:831
__fput+0x558/0x930 fs/file_table.c:235
____fput+0x15/0x20 fs/file_table.c:251
task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x1b08/0x2750 kernel/exit.c:869
do_group_exit+0x177/0x440 kernel/exit.c:972
get_signal+0x88e/0x1970 kernel/signal.c:2467
do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
exit_to_usermode_loop+0x2e0/0x370 arch/x86/entry/common.c:162
prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441c29
Code: Bad RIP value.
RSP: 002b:00007fff6e973338 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffe0 RBX: 0000000000000000 RCX: 0000000000441c29
RDX: 0000000020000180 RSI: 0000000000000004 RDI: 0000000000000003
RBP: 00007fff6e973350 R08: 0000000000000001 R09: 0000000000000000
R10: 0a00004000000002 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000000
Modules linked in:
Dumping ftrace buffer:
(ftrace buffer empty)
---[ end trace 68c2f261fd3bbf54 ]---
RIP: 0010:shmem_undo_range+0xdaa/0x29a0 mm/shmem.c:815
Code: 00 0f 85 bd 19 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8
a5 f0 d6 ff 48 c7 c6 e0 32 f1 87 4c 89 e7 e8 16 10 05 00 <0f> 0b e8 8f f0
d6 ff 49 8d 7c 24 20 48 89 f8 48 c1 e8 03 80 3c 18
RSP: 0018:ffff8801ab88e158 EFLAGS: 00010246
RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff81aaab95 RDI: ffffed0035711c18
RBP: ffff8801ab88e8d0 R08: ffff8801a7af04c0 R09: ffffed003b5c4fc0
R10: ffffed003b5c4fc0 R11: ffff8801dae27e07 R12: ffffea0007058b00
R13: ffff8801ab88e8a8 R14: 0000000000000001 R15: 000000000000016e
FS: 0000000000000000(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000441bff CR3: 0000000008e6a000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


2018-07-07 02:58:46

by Matthew Wilcox

Subject: Re: kernel BUG at mm/shmem.c:LINE!

On Fri, Jul 06, 2018 at 06:19:02PM -0700, syzbot wrote:
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: [email protected]
>
> raw: 02fffc0000001028 ffffea0007011dc8 ffffea0007058b48 ffff8801a7576ab8
> raw: 000000000000016e ffff8801a7588930 00000003ffffffff ffff8801d9a44c80
> page dumped because: VM_BUG_ON_PAGE(page_to_pgoff(page) != index)
> page->mem_cgroup:ffff8801d9a44c80
> ------------[ cut here ]------------
> kernel BUG at mm/shmem.c:815!
> invalid opcode: 0000 [#1] SMP KASAN
> CPU: 0 PID: 4429 Comm: syz-executor697 Not tainted 4.18.0-rc3-next-20180706+
> #1
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:shmem_undo_range+0xdaa/0x29a0 mm/shmem.c:815

Pretty sure this one's mine. At least I spotted a codepath earlier
today which could lead to it. I'll fix it in the morning.

2018-07-09 14:37:47

by Matthew Wilcox

Subject: Re: kernel BUG at mm/shmem.c:LINE!

On Fri, Jul 06, 2018 at 06:19:02PM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 526674536360 Add linux-next specific files for 20180706
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=116d16fc400000
> kernel config: https://syzkaller.appspot.com/x/.config?x=c8d1cfc0cb798e48
> dashboard link: https://syzkaller.appspot.com/bug?extid=b8e0dfee3fd8c9012771
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=170e462c400000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15f1ba2c400000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: [email protected]

#syz fix: shmem: Convert shmem_add_to_page_cache to XArray

2018-07-23 02:29:16

by Hugh Dickins

Subject: Re: kernel BUG at mm/shmem.c:LINE!

On Mon, 9 Jul 2018, Matthew Wilcox wrote:
> #syz fix: shmem: Convert shmem_add_to_page_cache to XArray

I don't see the patch, but I do see a diff in shmem_add_to_page_cache()
between mmotm 4.18.0-rc3-mm1 and current mmotm 4.18.0-rc5-mm1,
relating to use of xas_create_range().

Whether or not that fixed syzbot's kernel BUG at mm/shmem.c:815!
I don't know, but I'm afraid it has not fixed linux-next breakage of
huge tmpfs: I get a similar page_to_pgoff BUG at mm/filemap.c:1466!

Please try something like
mount -o remount,huge=always /dev/shm
cp /dev/zero /dev/shm

Writing soon crashes in find_lock_entry(), looking up offset 0x201
but getting the page for offset 0x3c1 instead.

I've spent a while on it, but better turn over to you, Matthew:
my guess is that xas_create_range() does not create the layout
you expect from it.

Thanks,
Hugh

2018-07-23 14:04:51

by Matthew Wilcox

Subject: Re: kernel BUG at mm/shmem.c:LINE!

On Sun, Jul 22, 2018 at 07:28:01PM -0700, Hugh Dickins wrote:
> Whether or not that fixed syzbot's kernel BUG at mm/shmem.c:815!
> I don't know, but I'm afraid it has not fixed linux-next breakage of
> huge tmpfs: I get a similar page_to_pgoff BUG at mm/filemap.c:1466!
>
> Please try something like
> mount -o remount,huge=always /dev/shm
> cp /dev/zero /dev/shm
>
> Writing soon crashes in find_lock_entry(), looking up offset 0x201
> but getting the page for offset 0x3c1 instead.

Hmm. I don't see a crash while running that command, but I do see an RCU
stall in find_get_entries() called from shmem_undo_range() when running
'cp' the second time -- i.e. while truncating the /dev/shm/zero file.
Maybe I'm seeing the same bug as you, and maybe I'm seeing a different
one. Do we have a shmem test suite somewhere?

> I've spent a while on it, but better turn over to you, Matthew:
> my guess is that xas_create_range() does not create the layout
> you expect from it.

I've dumped the XArray tree on my machine and it actually looks fine
*except* that the pages pointed to are free! That indicates to me I
screwed up somebody's reference count somewhere.

2018-07-23 19:15:55

by Hugh Dickins

Subject: Re: kernel BUG at mm/shmem.c:LINE!

On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> On Sun, Jul 22, 2018 at 07:28:01PM -0700, Hugh Dickins wrote:
> > Whether or not that fixed syzbot's kernel BUG at mm/shmem.c:815!
> > I don't know, but I'm afraid it has not fixed linux-next breakage of
> > huge tmpfs: I get a similar page_to_pgoff BUG at mm/filemap.c:1466!
> >
> > Please try something like
> > mount -o remount,huge=always /dev/shm
> > cp /dev/zero /dev/shm
> >
> > Writing soon crashes in find_lock_entry(), looking up offset 0x201
> > but getting the page for offset 0x3c1 instead.
>
> Hmm. I don't see a crash while running that command,

Thanks for looking.

It is the VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page)
in find_lock_entry(). Perhaps you didn't have CONFIG_DEBUG_VM=y
on this occasion? Or you don't think of an oops as a kernel crash,
and didn't notice it in dmesg? I see now that I've arranged for an oops
to crash the machine, since I don't like to miss them myself; but it is a very
clean oops, with no locks held, so the kernel can just kill the process and continue.

I recommend CONFIG_DEBUG_VM=y (for developers, not for distros), but
if you'd prefer to avoid it for now, just edit that VM_BUG_ON_PAGE()
in find_lock_entry() to a BUG_ON().
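Why the check can vanish: assuming the definitions in mainline's include/linux/mmdebug.h, VM_BUG_ON_PAGE() is a real BUG_ON() only under CONFIG_DEBUG_VM; otherwise it goes through BUILD_BUG_ON_INVALID(), which type-checks the condition inside sizeof without ever evaluating it. The userspace sketch below models the two builds with hypothetical MODEL_VM_BUG_ON_* macros and a hypothetical check_lookup() helper that count hits instead of oopsing; it is an illustration, not kernel code.

```c
#include <assert.h>

/* Hypothetical model, not the kernel's mmdebug.h: count would-be BUGs
 * instead of oopsing, so both configurations can be compared. */
static int bug_hits;

/* CONFIG_DEBUG_VM=y: VM_BUG_ON_PAGE() behaves like BUG_ON(), so the
 * condition is evaluated and a violation "fires". */
#define MODEL_VM_BUG_ON_DEBUG(cond) \
	do { if (cond) bug_hits++; } while (0)

/* CONFIG_DEBUG_VM unset: only type-check the condition (inside sizeof),
 * so it is never evaluated and nothing fires. */
#define MODEL_VM_BUG_ON_NODEBUG(cond) ((void)sizeof(cond))

/* Run the find_lock_entry()-style check once: 'wanted' is the offset
 * looked up, 'got' is page_to_pgoff() of the page returned.
 * Returns how many times the check fired (0 or 1). */
static int check_lookup(int debug_vm, unsigned long wanted, unsigned long got)
{
	bug_hits = 0;
	if (debug_vm)
		MODEL_VM_BUG_ON_DEBUG(got != wanted);
	else
		MODEL_VM_BUG_ON_NODEBUG(got != wanted);
	assert(bug_hits == 0 || debug_vm);	/* only the debug build can fire */
	return bug_hits;
}
```

With the 0x201/0x3c1 mismatch from this report, check_lookup(0, 0x201, 0x3c1) returns 0 and check_lookup(1, 0x201, 0x3c1) returns 1: the wrong page is handed back either way, but only the debug build says so.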

Or is there something more mysterious stopping it from showing up for
you? It's repeatable for me. When not crashing, that "cp" should fill
up about half of RAM before it hits the implicit tmpfs volume limit;
but I am assuming a not entirely fragmented machine - it does need
to allocate two 2MB pages before hitting the VM_BUG_ON_PAGE().

If you still can't see the crash, look to see how long /dev/shm/zero
is after the "cp": mine crashes a page or two over 2MB (I'm being
vague because I'm typing from the laptop I'd prefer not to reproduce
it on at the moment: I think it would be 1 page over, i_size not yet
updated for the page of index 0x201). But the xarray should by that
stage have been populated for two 2MB pages (by your "goto next" loop
in shmem_add_to_page_cache()).

> but I do see an RCU
> stall in find_get_entries() called from shmem_undo_range() when running
> 'cp' the second time -- ie while truncating the /dev/shm/zero file.

When I stopped oops crashing, I did indeed hang on that second attempt:
no "RCU stall" seen, but I've probably missed the relevant config option.

I wouldn't like to predict what happens if find_get_entry() returns the
wrong page when that VM_BUG_ON_PAGE() is compiled out: very confusing.
If it's compiled in, but just killed the process and dmesg was missed,
then a page lock is left held (never unlocked), which will indeed hang a
subsequent truncate (if the xarray yields the same wrong page again),
though I don't know if that would amount to an RCU stall.

> Maybe I'm seeing the same bug as you, and maybe I'm seeing a different
> one. Do we have a shmem test suite somewhere?

Not as such. xfstests works on tmpfs, huge or not, but I'd have to write
up a few instructions, note one or two "-g auto" tests to patch out since
they take forever on tmpfs, and the few failures expected; and update my
snapshot of the tree to check that over first (I pulled it last mid-May).

I'd rather not get into that at present: a working "cp" will be a great
step forward, then I can easily run xfstests on the fixed kernel.

>
> > I've spent a while on it, but better turn over to you, Matthew:
> > my guess is that xas_create_range() does not create the layout
> > you expect from it.
>
> I've dumped the XArray tree on my machine and it actually looks fine
> *except* that the pages pointed to are free! That indicates to me I
> screwed up somebody's reference count somewhere.

I don't actually know what a good xarray for two 2MB pages should look
like, since the best I can find seems to be a bad one!

Are you sure that those pages are free, rather than most of them being tails
of one of the two compound pages involved? I think it's the same in your
rewrite of struct page: the compound_head field (lru.next), with its low
bit set, is how to recognize a tail page.

Hugh

2018-07-23 20:38:13

by Matthew Wilcox

Subject: Re: kernel BUG at mm/shmem.c:LINE!

On Mon, Jul 23, 2018 at 12:14:41PM -0700, Hugh Dickins wrote:
> On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> > On Sun, Jul 22, 2018 at 07:28:01PM -0700, Hugh Dickins wrote:
> > > Whether or not that fixed syzbot's kernel BUG at mm/shmem.c:815!
> > > I don't know, but I'm afraid it has not fixed linux-next breakage of
> > > huge tmpfs: I get a similar page_to_pgoff BUG at mm/filemap.c:1466!
> > >
> > > Please try something like
> > > mount -o remount,huge=always /dev/shm
> > > cp /dev/zero /dev/shm
> > >
> > > Writing soon crashes in find_lock_entry(), looking up offset 0x201
> > > but getting the page for offset 0x3c1 instead.
> >
> > Hmm. I don't see a crash while running that command,
>
> Thanks for looking.
>
> It is the VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page)
> in find_lock_entry(). Perhaps you didn't have CONFIG_DEBUG_VM=y
> on this occasion? Or you don't think of an oops as a kernel crash,
> and didn't notice it in dmesg? I see now that I've arranged for oops
> to crash, since I don't like to miss them myself; but it is a very
> clean oops, no locks held, so can just kill the process and continue.

Usually I run with that turned on, but somehow in my recent messing
with my test system, that got turned off. Once I turned it back on,
it spots the bug instantly.

> Or is there something more mysterious stopping it from showing up for
> you? It's repeatable for me. When not crashing, that "cp" should fill
> up about half of RAM before it hits the implicit tmpfs volume limit;
> but I am assuming a not entirely fragmented machine - it does need
> to allocate two 2MB pages before hitting the VM_BUG_ON_PAGE().

I tried that too before noticing that DEBUG_VM was off: I raised my test
VM's memory from 2GB to 8GB.

> Are you sure that those pages are free, rather than most of them tails
> of one of the two compound pages involved? I think it's the same in your
> rewrite of struct page, the compound_head field (lru.next), with its low
> bit set, were how to recognize a tail page.

Yes, PageTail was set, and so was TAIL_MAPPING (0xdead000000000400).
What was going on was the first 2MB page was being stored at indices
0-511, then the second 2MB page was being stored at indices 64-575
instead of 512-1023.
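This layout appears to account for the numbers in Hugh's report. For a tail page, page_to_pgoff() is the head page's index plus the page's position inside the compound page. If the second 2MB page (head index 512) lands in slots 64-575, slot 0x201 (513) holds its subpage 513 - 64 = 449, which reports pgoff 512 + 449 = 961 = 0x3c1. That is exactly the mismatch find_lock_entry() tripped over. The sketch below is a hypothetical userspace model (a flat array standing in for the XArray) that checks this arithmetic; struct slot, store_huge() and lookup_after_two() are invented for illustration.

```c
#include <assert.h>

#define HPAGE_NR 512UL			/* 2MB huge page / 4KB base pages */
#define NSLOTS   (2 * HPAGE_NR)		/* room for two huge pages */

/* Hypothetical model of one page-cache slot: which compound page it
 * points into (by the head page's index) and which subpage. */
struct slot {
	unsigned long head_index;
	unsigned long subpage;
};

/* What page_to_pgoff() works out for a tail page: the head page's
 * index plus the page's position within the compound page. */
static unsigned long slot_pgoff(const struct slot *s)
{
	return s->head_index + s->subpage;
}

/* Fill HPAGE_NR consecutive slots, starting at 'first', with the
 * subpages of the huge page whose head index is 'head_index'.
 * A correct layout has first == head_index. */
static void store_huge(struct slot *cache, unsigned long head_index,
		       unsigned long first)
{
	assert(first + HPAGE_NR <= NSLOTS);
	for (unsigned long k = 0; k < HPAGE_NR; k++) {
		cache[first + k].head_index = head_index;
		cache[first + k].subpage = k;
	}
}

/* Store two huge pages (head indices 0 and 512), placing the second
 * at 'second_first', then return the pgoff seen by a lookup of 'index'. */
static unsigned long lookup_after_two(unsigned long index,
				      unsigned long second_first)
{
	struct slot cache[NSLOTS] = {{0, 0}};

	assert(index < NSLOTS);
	store_huge(cache, 0, 0);
	store_huge(cache, HPAGE_NR, second_first);
	return slot_pgoff(&cache[index]);
}
```

lookup_after_two(0x201, 64) gives 0x3c1 (the broken layout), while lookup_after_two(0x201, 512) gives 0x201 (the intended one).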

I figured out a fix and pushed it to the 'ida' branch in
git://git.infradead.org/users/willy/linux-dax.git

It won't be in linux-next tomorrow because the nvdimm people have
just dumped a pile of patches into their tree that conflict with
the XArray-DAX rewrite, so Stephen has pulled the XArray tree out
of linux-next temporarily. I didn't have time to sort out the merge
conflict today because I judged your bug report more important.

2018-07-23 22:43:38

by Hugh Dickins

Subject: Re: kernel BUG at mm/shmem.c:LINE!

On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> I figured out a fix and pushed it to the 'ida' branch in
> git://git.infradead.org/users/willy/linux-dax.git

Great, thanks a lot for sorting that out so quickly. But I've cloned
the tree and don't see today's patch, so assume you've folded the fix
into an existing commit? If possible, please append the diff of today's
fix to this thread so that we can try it out. Or if that's difficult,
please at least tell which files were modified, then I can probably
work it out from the diff of those files against mmotm.

Thanks,
Hugh

>
> It won't be in linux-next tomorrow because the nvdimm people have
> just dumped a pile of patches into their tree that conflict with
> the XArray-DAX rewrite, so Stephen has pulled the XArray tree out
> of linux-next temporarily. I didn't have time to sort out the merge
> conflict today because I judged your bug report more important.

2018-07-23 22:57:07

by Matthew Wilcox

Subject: Re: kernel BUG at mm/shmem.c:LINE!

On Mon, Jul 23, 2018 at 03:42:22PM -0700, Hugh Dickins wrote:
> On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> > I figured out a fix and pushed it to the 'ida' branch in
> > git://git.infradead.org/users/willy/linux-dax.git
>
> Great, thanks a lot for sorting that out so quickly. But I've cloned
> the tree and don't see today's patch, so assume you've folded the fix
> into an existing commit? If possible, please append the diff of today's
> fix to this thread so that we can try it out. Or if that's difficult,
> please at least tell which files were modified, then I can probably
> work it out from the diff of those files against mmotm.

Sure! It's just this:

diff --git a/lib/xarray.c b/lib/xarray.c
index 32a9c2a6a9e9..383c410997eb 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -660,6 +660,8 @@ void xas_create_range(struct xa_state *xas)
unsigned char sibs = xas->xa_sibs;

xas->xa_index |= ((sibs + 1) << shift) - 1;
+ if (!xas_top(xas->xa_node) && xas->xa_node->shift == xas->xa_shift)
+ xas->xa_offset |= sibs;
xas->xa_shift = 0;
xas->xa_sibs = 0;
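
One reading of the two added lines, worked through for the case in this thread: an order-9 (2MB) entry at index 512. This assumes 64-slot nodes (XA_CHUNK_SHIFT == 6) and that XA_STATE_ORDER() splits an order the way mainline's definition does. Under those assumptions, order 9 becomes xa_shift 6 plus 7 sibling slots, so the entry covers offsets 8-15 of a shift-6 node. The existing line raises xa_index to 1023, the last index of the range. When the state is already sitting on that shift-6 node, xa_offset presumably has to be raised to match (8 | 7 == 15). Otherwise it still names the first slot while xa_index names the last. The userspace sketch below checks only that arithmetic. struct mini_xas and the mini_* helpers are hypothetical stand-ins, not the XArray API, and the !xas_top() part of the condition is assumed to hold.

```c
#include <assert.h>

#define XA_CHUNK_SHIFT 6UL	/* assumed: 64 slots per node */
#define XA_CHUNK_MASK  ((1UL << XA_CHUNK_SHIFT) - 1)

/* Hypothetical mirror of the xa_state fields the patch touches. */
struct mini_xas {
	unsigned long index;	/* xa_index */
	unsigned char shift;	/* xa_shift */
	unsigned char sibs;	/* xa_sibs */
	unsigned char offset;	/* xa_offset in the node at 'shift' */
};

/* Split an order into a node-level shift plus extra sibling slots,
 * following mainline's XA_STATE_ORDER() arithmetic. */
static struct mini_xas mini_state_order(unsigned long index, unsigned order)
{
	struct mini_xas xas;

	xas.index = (index >> order) << order;
	xas.shift = order - order % XA_CHUNK_SHIFT;
	xas.sibs = (1U << (order % XA_CHUNK_SHIFT)) - 1;
	xas.offset = (xas.index >> xas.shift) & XA_CHUNK_MASK;
	/* The multi-slot entry must fit inside one node. */
	assert(xas.offset + xas.sibs <= XA_CHUNK_MASK);
	return xas;
}

/* The first lines of xas_create_range() as patched: move the index to
 * the last slot of the range, and (with the fix) the offset with it. */
static void mini_raise_to_last(struct mini_xas *xas, int with_fix)
{
	xas->index |= ((xas->sibs + 1UL) << xas->shift) - 1;
	if (with_fix)
		xas->offset |= xas->sibs;
}

/* Does the cached offset still agree with the index? */
static int mini_offset_consistent(const struct mini_xas *xas)
{
	return xas->offset == ((xas->index >> xas->shift) & XA_CHUNK_MASK);
}
```

With the fix, index 512 / order 9 ends at index 1023 and offset 15, which agree. Without it, the offset stays at 8 while the index says 1023.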


The only other things changed are the test suite, and removing an
unnecessary change, so they can be ignored:

diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 8a67d4bb1788..ec06c3ca19e9 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -695,19 +695,20 @@ static noinline void check_move(struct xarray *xa)
check_move_small(xa, (1UL << i) - 1);
}

-static noinline void check_create_range_1(struct xarray *xa,
+static noinline void xa_store_many_order(struct xarray *xa,
unsigned long index, unsigned order)
{
XA_STATE_ORDER(xas, xa, index, order);
- unsigned int i;
+ unsigned int i = 0;

do {
xas_lock(&xas);
+ XA_BUG_ON(xa, xas_find_conflict(&xas));
xas_create_range(&xas);
if (xas_error(&xas))
goto unlock;
for (i = 0; i < (1U << order); i++) {
- xas_store(&xas, xa + i);
+ XA_BUG_ON(xa, xas_store(&xas, xa_mk_value(index + i)));
xas_next(&xas);
}
unlock:
@@ -715,7 +716,29 @@ static noinline void check_create_range_1(struct xarray *xa,
} while (xas_nomem(&xas, GFP_KERNEL));

XA_BUG_ON(xa, xas_error(&xas));
- xa_destroy(xa);
+}
+
+static noinline void check_create_range_1(struct xarray *xa,
+ unsigned long index, unsigned order)
+{
+ unsigned long i;
+
+ xa_store_many_order(xa, index, order);
+ for (i = index; i < index + (1UL << order); i++)
+ xa_erase_value(xa, i);
+ XA_BUG_ON(xa, !xa_empty(xa));
+}
+
+static noinline void check_create_range_2(struct xarray *xa, unsigned order)
+{
+ unsigned long i;
+ unsigned long nr = 1UL << order;
+
+ for (i = 0; i < nr * nr; i += nr)
+ xa_store_many_order(xa, i, order);
+ for (i = 0; i < nr * nr; i++)
+ xa_erase_value(xa, i);
+ XA_BUG_ON(xa, !xa_empty(xa));
}

static noinline void check_create_range(struct xarray *xa)
@@ -729,6 +752,8 @@ static noinline void check_create_range(struct xarray *xa)
check_create_range_1(xa, 2U << order, order);
check_create_range_1(xa, 3U << order, order);
check_create_range_1(xa, 1U << 24, order);
+ if (order < 10)
+ check_create_range_2(xa, order);
}
}

diff --git a/mm/shmem.c b/mm/shmem.c
index af2d7fa05af7..3ac507803787 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -589,8 +589,8 @@ static int shmem_add_to_page_cache(struct page *page,
VM_BUG_ON(expected && PageTransHuge(page));

page_ref_add(page, nr);
- page->index = index;
page->mapping = mapping;
+ page->index = index;

do {
void *entry;

2018-07-24 09:14:53

by Hugh Dickins

Subject: Re: kernel BUG at mm/shmem.c:LINE!

On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> Sure! It's just this:
>
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 32a9c2a6a9e9..383c410997eb 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -660,6 +660,8 @@ void xas_create_range(struct xa_state *xas)
> unsigned char sibs = xas->xa_sibs;
>
> xas->xa_index |= ((sibs + 1) << shift) - 1;
> + if (!xas_top(xas->xa_node) && xas->xa_node->shift == xas->xa_shift)
> + xas->xa_offset |= sibs;
> xas->xa_shift = 0;
> xas->xa_sibs = 0;

Yes, that's a big improvement, the huge "cp" is now fine, thank you.

I've updated my xfstests tree, and tried that on mmotm with this patch.
The few failures are exactly the same as on 4.18-rc6, whether mounting
tmpfs as huge or not. But four of the tests, generic/{340,345,346,354}
crash (oops) on 4.18-rc5-mm1 + your patch above, but pass on 4.18-rc6.

That was simply with non-huge tmpfs: I just patched them out and didn't
check whether they crash with huge tmpfs too: probably they do, but
that won't be very interesting until the non-huge crashes are fixed.

I paid no attention to where the crashes were, I was just pressing on
to skip the problem tests to get as full a run as possible, with that
list of what's problematic and needs further investigation.

To test non-huge tmpfs (as root), I wrap xfstests' check script as
follows (you'll want to mkdir or substitute somewhere else for /xft):

export FSTYP=tmpfs
export DISABLE_UDF_TEST=1
export TEST_DEV=tmpfs1:
export TEST_DIR=/xft
export SCRATCH_DEV=tmpfs2:
export SCRATCH_MNT=/mnt
mount -t $FSTYP -o size=1088M $TEST_DEV $TEST_DIR || exit $?
./check "$@" # typically "-g auto"
umount /xft /mnt 2>/dev/null

But don't bother with "-g auto" for the moment: I have workarounds in
for a few of them, generic/{027,213,449}, which we need not get into
right now (without them, two of those tests can take close to forever).

To test huge tmpfs (as root), I wrap xfstests' check script as:

export FSTYP=tmpfs
export DISABLE_UDF_TEST=1
export TEST_DEV=tmpfs1:
export TEST_DIR=/xft
export SCRATCH_DEV=tmpfs2:
export SCRATCH_MNT=/mnt
export TMPFS_MOUNT_OPTIONS="-o size=1088M,huge=always"
mount -t $FSTYP $TMPFS_MOUNT_OPTIONS $TEST_DEV $TEST_DIR || exit $?
./check "$@" # typically "-g auto"
umount /xft /mnt 2>/dev/null

Hugh

2018-07-26 06:54:33

by Hugh Dickins

Subject: Re: kernel BUG at mm/shmem.c:LINE!

On Tue, 24 Jul 2018, Hugh Dickins wrote:
> Yes, that's a big improvement, the huge "cp" is now fine, thank you.
>
> I've updated my xfstests tree, and tried that on mmotm with this patch.
> The few failures are exactly the same as on 4.18-rc6, whether mounting
> tmpfs as huge or not. But four of the tests, generic/{340,345,346,354}
> crash (oops) on 4.18-rc5-mm1 + your patch above, but pass on 4.18-rc6.

Now that I've learnt that an oops on 0xffffffffffffffbe points to EEXIST,
not to EREMOTE, it's easy: the patch below fixes those four xfstests
(and no doubt a similar oops I've seen occasionally under swapping
load), so it gives clean xfstests runs for non-huge and huge tmpfs.
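The arithmetic behind that address works as follows, assuming the encoding used by mainline XArray. There, XA_ERROR() stores a negative errno in xa_node shifted left by two and tagged with 2, the internal-entry marker. So -EEXIST (-17) becomes -17 * 4 | 2 = -66, and -66 is also -EREMOTE on Linux, which is why the faulting value looked like EREMOTE. The sketch below is a hypothetical userspace model (model_xa_error(), model_xa_err() and model_eexist_entry() are invented names, not the kernel header). It assumes a 64-bit unsigned long and gcc's arithmetic right shift of negative values.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in modeled on mainline XA_ERROR(): shift the
 * negative errno left by two and tag it with 2 (an internal entry). */
static unsigned long model_xa_error(long err)
{
	assert(err < 0 && err >= -4095);	/* a valid -errno (MAX_ERRNO is 4095) */
	return ((unsigned long)err << 2) | 2UL;
}

/* Modeled on xa_err(): arithmetic right shift recovers the errno. */
static long model_xa_err(unsigned long entry)
{
	return (long)entry >> 2;
}

/* The value xas_set_err(&xas, -EEXIST) would leave in xa_node. */
static unsigned long model_eexist_entry(void)
{
	return model_xa_error(-EEXIST);
}
```

model_eexist_entry() yields 0xffffffffffffffbe, which reads as -66 (-EREMOTE) when taken at face value, while model_xa_err() decodes it back to -EEXIST.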

I can reproduce a kernel BUG at mm/khugepaged.c:1358! - that's the
VM_BUG_ON(index != xas.xa_index) in collapse_shmem() - but it will
take too long to describe how to reproduce that one, so I'm running
it past you just in case you have a quick idea on it, otherwise I'll
try harder. I did just try an xas_set(&xas, index) before the loop,
in case the xas_create_range(&xas) had interfered with initial state;
but if that made any difference at all, it only delayed the crash.

Hugh

--- mmotm/mm/shmem.c 2018-07-20 17:54:42.002805461 -0700
+++ linux/mm/shmem.c 2018-07-25 23:32:39.170892551 -0700
@@ -597,8 +597,10 @@ static int shmem_add_to_page_cache(struc
void *entry;
xas_lock_irq(&xas);
entry = xas_find_conflict(&xas);
- if (entry != expected)
+ if (entry != expected) {
xas_set_err(&xas, -EEXIST);
+ goto unlock;
+ }
xas_create_range(&xas);
if (xas_error(&xas))
goto unlock;

2018-07-26 14:36:09

by Matthew Wilcox

Subject: Re: kernel BUG at mm/shmem.c:LINE!

On Wed, Jul 25, 2018 at 11:53:15PM -0700, Hugh Dickins wrote:
> Now I've learnt that an oops on 0xffffffffffffffbe points to EEXIST,
> not to EREMOTE, it's easy: patch below fixes those four xfstests
> (and no doubt a similar oops I've seen occasionally under swapping
> load): so gives clean xfstests runs for non-huge and huge tmpfs.

Excellent!

I'm adding this:

+++ b/lib/test_xarray.c
@@ -741,6 +741,13 @@ static noinline void check_create_range_2(struct xarray *xa, unsigned order)
XA_BUG_ON(xa, !xa_empty(xa));
}

+static noinline void check_create_range_3(void)
+{
+ XA_STATE(xas, NULL, 0);
+ xas_set_err(&xas, -EEXIST);
+ xas_create_range(&xas);
+}
+
static noinline void check_create_range(struct xarray *xa)
{
unsigned int order;
@@ -755,6 +762,8 @@ static noinline void check_create_range(struct xarray *xa)
if (order < 10)
check_create_range_2(xa, order);
}
+
+ check_create_range_3();
}

static LIST_HEAD(shadow_nodes);

and fixing the bug differently ;-) But many thanks for spotting it!

I'll look into the next bug you reported ...

2018-07-26 17:35:28

by Hugh Dickins

Subject: Re: kernel BUG at mm/shmem.c:LINE!

On Thu, 26 Jul 2018, Matthew Wilcox wrote:
> On Wed, Jul 25, 2018 at 11:53:15PM -0700, Hugh Dickins wrote:
>
> and fixing the bug differently ;-) But many thanks for spotting it!

I thought you might :)

>
> I'll look into the next bug you reported ...

No need: that idea now works a lot better when I use the initialized
"start", instead of the uninitialized "index".

Hugh

--- mmotm/mm/khugepaged.c 2018-07-20 17:54:41.978805312 -0700
+++ linux/mm/khugepaged.c 2018-07-26 09:20:22.416949014 -0700
@@ -1352,6 +1352,7 @@ static void collapse_shmem(struct mm_str
goto out;
} while (1);

+ xas_set(&xas, start);
for (index = start; index < end; index++) {
struct page *page = xas_next(&xas);


2018-07-26 19:33:53

by Matthew Wilcox

Subject: Re: kernel BUG at mm/shmem.c:LINE!

On Thu, Jul 26, 2018 at 09:40:20AM -0700, Hugh Dickins wrote:
> On Thu, 26 Jul 2018, Matthew Wilcox wrote:
> > On Wed, Jul 25, 2018 at 11:53:15PM -0700, Hugh Dickins wrote:
> >
> > and fixing the bug differently ;-) But many thanks for spotting it!
>
> I thought you might :)

The xas_* functions are all _expected_ to behave the same way when
passed an XA_STATE containing an error -- do nothing. xas_create_range()
behaved that way initially, then I fixed a bug and broke that invariant.
Now the test suite checks it so I won't break it again.
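The invariant can be illustrated with a small model. An error recorded by one step turns every later step into a no-op, so a caller such as shmem_add_to_page_cache() may set -EEXIST and still call the range-creation step, checking for errors once at the end. That is the situation check_create_range_3() now exercises. The sketch below uses a hypothetical struct cursor and cursor_* helpers, not the XArray API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cursor in the style of struct xa_state: once 'err' is
 * set, every operation must leave the state untouched. */
struct cursor {
	unsigned long index;
	int err;		/* 0, or a negative errno */
	size_t nodes_created;	/* stands in for real side effects */
};

static void cursor_set_err(struct cursor *c, int err)
{
	c->err = err;
}

/* Honour the invariant: bail out first if an error is already
 * recorded, so callers need not check before calling. */
static void cursor_create_range(struct cursor *c, unsigned long nr)
{
	if (c->err)
		return;
	c->nodes_created += nr;
}

/* The shape of the shmem_add_to_page_cache() sequence: a conflict
 * records -EEXIST, then range creation is attempted anyway. */
static struct cursor add_with_conflict(int conflict)
{
	struct cursor c = { .index = 0, .err = 0, .nodes_created = 0 };

	if (conflict)
		cursor_set_err(&c, -EEXIST);
	cursor_create_range(&c, 8);
	assert(!c.err || c.nodes_created == 0);	/* error implies no side effects */
	return c;
}
```

Without the early return in cursor_create_range(), the conflicting case would go on to act on a state it should have left alone, which is the class of bug Hugh hit.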

> > I'll look into the next bug you reported ...
>
> No need: that idea now works a lot better when I use the initialized
> "start", instead of the uninitialized "index".

Ugh. xas_create_range() is _supposed_ to return with xas pointing to
the first index in the range. I wonder what I messed up. I've had a
go at producing a test-case for this and haven't provoked a bug yet.
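The collapse_shmem() loop relies on that contract: xas_create_range() returns with the state at the first index of the range. Hugh's xas_set(&xas, start) re-establishes it explicitly, whatever the range setup left behind. The sketch below is a simplified, hypothetical model: struct walk and the walk_* helpers are invented. The failure mode of leaving the cursor on the last index is only an assumed example, since the thread does not say what actually went wrong. The cursor step is also a simplification, not the real xas_next() semantics.

```c
#include <assert.h>

/* Hypothetical cursor: only the index an xa_state-like walk sits at. */
struct walk {
	unsigned long index;
};

static void walk_set(struct walk *w, unsigned long index)
{
	w->index = index;
}

/* Hypothetical misbehaving range setup: instead of returning with the
 * cursor at the start of [start, end), it leaves it on the last index. */
static void walk_create_range_buggy(struct walk *w, unsigned long end)
{
	w->index = end - 1;
}

/* Simplified collapse_shmem()-style loop: the cursor must stay in
 * lockstep with 'index'. Returns how many iterations would have hit
 * VM_BUG_ON(index != xas.xa_index). */
static int walk_range(unsigned long start, unsigned long end, int reset)
{
	struct walk w;
	int mismatches = 0;

	assert(start < end);
	walk_set(&w, start);
	walk_create_range_buggy(&w, end);
	if (reset)
		walk_set(&w, start);	/* the xas_set(&xas, start) workaround */
	for (unsigned long index = start; index < end; index++) {
		if (w.index != index)
			mismatches++;
		w.index++;	/* simplified: one cursor step per iteration */
	}
	return mismatches;
}
```

For a 512-page range starting at 512, walk_range() reports a mismatch on all 512 iterations without the reset and none with it.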

Still, I don't want to keep xas_create_range() around long-term.
I want to transition all the places that currently use it to use
multi-index entries. So I'm going to put your workaround in and then
work on deleting xas_create_range() altogether.

Thanks so much for all your work on this!