2010-11-15 14:45:45

by Robert Święcki

[permalink] [raw]
Subject: kernel BUG at /build/buildd/linux-2.6.35/mm/filemap.c:128!

Hi,

I was doing some fuzzing with http://code.google.com/p/iknowthis/ and
my system pretty quickly crashes with the BUG() below.

- Even if it's just BUG() it renders my system unusable (I'm able to
type a few characters on the virtual terminal at most)
- Judging from the stacktrace it's sys_madvise(..., ..., MADV_REMOVE)
- I'm testing with ubuntu's 2.6.35-22-server#35 but I got similar
results with 2.6.32 some time ago
- I'm posting this cause diving into linux mm spaghetti code might be
not a trivial task, but if nobody can see anything obvious in a day or
so, I'll try to debug it mysel
- I'm unable to provide a testcase by now, nor any usable state of the
crashing process, cause the system becomes unusable
- It crashes both linux-kernel working on a physical machine as well
as on the VirtualBox emulator
- I'm usually waiting from 0.5h to 12h for this crash to appear, I
think it could be speed up greatly by disabling any irrelevant
syscalls in the fuzzer

[25142.286531] kernel BUG at /build/buildd/linux-2.6.35/mm/filemap.c:128!
[25142.290627] invalid opcode: 0000 [#2] SMP
[25142.290627] last sysfs file: /sys/fs/ext4/sda1/lifetime_write_kbytes
[25142.301039] CPU 1
[25142.301039] Modules linked in: dccp_ipv6 decnet llc2 x25 pppoe
pppox irda crc_ccitt dccp_ipv4 dccp ipx p8023 rds af_key netrom econet
can atm l2cap bluetooth appletalk rose ax25 nouveau snd_hda_codec_idt
ttm snd_hda_intel drm_kms_helper snd_hda_codec led_class drm psmouse
snd_hwdep ppdev snd_pcm snd_timer i2c_algo_bit serio_raw i82975x_edac
snd parport_pc lp soundcore edac_core snd_page_alloc parport dcdbas
floppy ahci firewire_ohci firewire_core crc_itu_t tg3 libahci
[25142.301039]
[25142.301039] Pid: 22612, comm: iknowthis Tainted: G D W
2.6.35-22-server #35-Ubuntu 0GH911/Precision WorkStation 390
[25142.301039] RIP: 0010:[<ffffffff81101645>] [<ffffffff81101645>]
__remove_from_page_cache+0xd5/0xe0
[25142.367617] RSP: 0000:ffff8800cb8a3cf8 EFLAGS: 00010002
[25142.367617] RAX: 0100000000100029 RBX: ffffea0002159670 RCX: 0000000000000000
[25142.367617] RDX: 0000000000000001 RSI: 0000000000000016 RDI: ffff880100000700
[25142.367617] RBP: ffff8800cb8a3d08 R08: 0000000000000018 R09: 0000000000000000
[25142.367617] R10: ffffea0002159678 R11: 0000000000000000 R12: ffff8800cb865188
[25142.367617] R13: ffff8800cb865188 R14: ffff8800cb8a3d88 R15: 0000000000000000
[25142.367617] FS: 0000000000000000(0000) GS:ffff880001e40000(0063)
knlGS:00000000f75276c0
[25142.367617] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[25142.367617] CR2: 00000000080ba1fc CR3: 000000012091d000 CR4: 00000000000006e0
[25142.367617] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[25142.367617] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[25142.367617] Process iknowthis (pid: 22612, threadinfo
ffff8800cb8a2000, task ffff8800cb9516e0)
[25142.367617] Stack:
[25142.367617] ffffea0002159670 ffff8800cb8651a0 ffff8800cb8a3d28
ffffffff81101686
[25142.367617] <0> ffffea0002159670 ffff8800cb865188 ffff8800cb8a3d48
ffffffff8110c6d9
[25142.367617] <0> 0000000000000001 0000000000000008 ffff8800cb8a3e38
ffffffff8110cce0
[25142.367617] Call Trace:
[25142.367617] [<ffffffff81101686>] remove_from_page_cache+0x36/0x60
[25142.367617] [<ffffffff8110c6d9>] truncate_inode_page+0x79/0xb0
[25142.367617] [<ffffffff8110cce0>] truncate_inode_pages_range+0x160/0x460
[25142.367617] [<ffffffff81121250>] ? unmap_mapping_range+0xb0/0x160
[25142.367617] [<ffffffff81121398>] vmtruncate_range+0x98/0x100
[25142.367617] [<ffffffff8111ae5a>] madvise_vma+0x17a/0x210
[25142.367617] [<ffffffff8111b055>] sys_madvise+0x165/0x290
[25142.367617] [<ffffffff810467c3>] ia32_sysret+0x0/0x5
[25142.367617] Code: 00 00 e8 1f 8a 1c 00 48 89 df 57 9d 0f 1f 44 00
00 5b 41 5c c9 c3 be 16 00 00 00 48 89 df e8 a3 68 01 00 48 8b 03 e9
71 ff ff ff <0f> 0b eb fe eb 05 90 90 90 90 90 55 48 89 e5 48 83 ec 10
48 89
[25142.367617] RIP [<ffffffff81101645>] __remove_from_page_cache+0xd5/0xe0
[25142.367617] RSP <ffff8800cb8a3cf8>
[25142.367617] ---[ end trace a262aa785b417723 ]---


--
Robert Święcki


2010-11-22 23:48:00

by Andrew Morton

[permalink] [raw]
Subject: Re: kernel BUG at /build/buildd/linux-2.6.35/mm/filemap.c:128!

(cc linux-mm)

On Mon, 15 Nov 2010 15:45:42 +0100
Robert __wi__cki <[email protected]> wrote:

> Hi,
>
> I was doing some fuzzing with http://code.google.com/p/iknowthis/ and
> my system pretty quickly crashes with the BUG() below.

So it is a repeatable crash?

> - Even if it's just BUG() it renders my system unusable (I'm able to
> type a few characters on the virtual terminal at most)
> - Judging from the stacktrace it's sys_madvise(..., ..., MADV_REMOVE)
> - I'm testing with ubuntu's 2.6.35-22-server#35 but I got similar
> results with 2.6.32 some time ago

It is.

> - I'm posting this cause diving into linux mm spaghetti code might be
> not a trivial task, but if nobody can see anything obvious in a day or
> so, I'll try to debug it mysel
> - I'm unable to provide a testcase by now, nor any usable state of the
> crashing process, cause the system becomes unusable
> - It crashes both linux-kernel working on a physical machine as well
> as on the VirtualBox emulator
> - I'm usually waiting from 0.5h to 12h for this crash to appear, I
> think it could be speed up greatly by disabling any irrelevant
> syscalls in the fuzzer
>
> [25142.286531] kernel BUG at /build/buildd/linux-2.6.35/mm/filemap.c:128!

That's

BUG_ON(page_mapped(page));

in __remove_from_page_cache(). That state is worth a BUG().

> [25142.290627] invalid opcode: 0000 [#2] SMP
> [25142.290627] last sysfs file: /sys/fs/ext4/sda1/lifetime_write_kbytes
> [25142.301039] CPU 1
> [25142.301039] Modules linked in: dccp_ipv6 decnet llc2 x25 pppoe
> pppox irda crc_ccitt dccp_ipv4 dccp ipx p8023 rds af_key netrom econet
> can atm l2cap bluetooth appletalk rose ax25 nouveau snd_hda_codec_idt
> ttm snd_hda_intel drm_kms_helper snd_hda_codec led_class drm psmouse
> snd_hwdep ppdev snd_pcm snd_timer i2c_algo_bit serio_raw i82975x_edac
> snd parport_pc lp soundcore edac_core snd_page_alloc parport dcdbas
> floppy ahci firewire_ohci firewire_core crc_itu_t tg3 libahci
> [25142.301039]
> [25142.301039] Pid: 22612, comm: iknowthis Tainted: G D W
> 2.6.35-22-server #35-Ubuntu 0GH911/Precision WorkStation 390
> [25142.301039] RIP: 0010:[<ffffffff81101645>] [<ffffffff81101645>]
> __remove_from_page_cache+0xd5/0xe0
> [25142.367617] RSP: 0000:ffff8800cb8a3cf8 EFLAGS: 00010002
> [25142.367617] RAX: 0100000000100029 RBX: ffffea0002159670 RCX: 0000000000000000
> [25142.367617] RDX: 0000000000000001 RSI: 0000000000000016 RDI: ffff880100000700
> [25142.367617] RBP: ffff8800cb8a3d08 R08: 0000000000000018 R09: 0000000000000000
> [25142.367617] R10: ffffea0002159678 R11: 0000000000000000 R12: ffff8800cb865188
> [25142.367617] R13: ffff8800cb865188 R14: ffff8800cb8a3d88 R15: 0000000000000000
> [25142.367617] FS: 0000000000000000(0000) GS:ffff880001e40000(0063)
> knlGS:00000000f75276c0
> [25142.367617] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> [25142.367617] CR2: 00000000080ba1fc CR3: 000000012091d000 CR4: 00000000000006e0
> [25142.367617] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [25142.367617] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [25142.367617] Process iknowthis (pid: 22612, threadinfo
> ffff8800cb8a2000, task ffff8800cb9516e0)
> [25142.367617] Stack:
> [25142.367617] ffffea0002159670 ffff8800cb8651a0 ffff8800cb8a3d28
> ffffffff81101686
> [25142.367617] <0> ffffea0002159670 ffff8800cb865188 ffff8800cb8a3d48
> ffffffff8110c6d9
> [25142.367617] <0> 0000000000000001 0000000000000008 ffff8800cb8a3e38
> ffffffff8110cce0
> [25142.367617] Call Trace:
> [25142.367617] [<ffffffff81101686>] remove_from_page_cache+0x36/0x60
> [25142.367617] [<ffffffff8110c6d9>] truncate_inode_page+0x79/0xb0
> [25142.367617] [<ffffffff8110cce0>] truncate_inode_pages_range+0x160/0x460
> [25142.367617] [<ffffffff81121250>] ? unmap_mapping_range+0xb0/0x160
> [25142.367617] [<ffffffff81121398>] vmtruncate_range+0x98/0x100
> [25142.367617] [<ffffffff8111ae5a>] madvise_vma+0x17a/0x210
> [25142.367617] [<ffffffff8111b055>] sys_madvise+0x165/0x290
> [25142.367617] [<ffffffff810467c3>] ia32_sysret+0x0/0x5
> [25142.367617] Code: 00 00 e8 1f 8a 1c 00 48 89 df 57 9d 0f 1f 44 00
> 00 5b 41 5c c9 c3 be 16 00 00 00 48 89 df e8 a3 68 01 00 48 8b 03 e9
> 71 ff ff ff <0f> 0b eb fe eb 05 90 90 90 90 90 55 48 89 e5 48 83 ec 10
> 48 89
> [25142.367617] RIP [<ffffffff81101645>] __remove_from_page_cache+0xd5/0xe0
> [25142.367617] RSP <ffff8800cb8a3cf8>
> [25142.367617] ---[ end trace a262aa785b417723 ]---

2010-11-23 14:55:34

by Robert Święcki

[permalink] [raw]
Subject: Re: kernel BUG at /build/buildd/linux-2.6.35/mm/filemap.c:128!

>> Hi,
>>
>> I was doing some fuzzing with http://code.google.com/p/iknowthis/ and
>> my system pretty quickly crashes with the BUG() below.
>
> So it is a repeatable crash?

Not in a sense that I can provide you with a sequence of syscalls that
led to this state. Generally it repeats after some time (<12 hours on
2 intel-core) of running the
http://code.google.com/p/iknowthis/source/browse/#svn/trunk

>> - Even if it's just BUG() it renders my system unusable (I'm able to
>> type a few characters on the virtual terminal at most)
>> - Judging from the stacktrace it's sys_madvise(..., ..., MADV_REMOVE)
>> - I'm testing with ubuntu's 2.6.35-22-server#35 but I got similar
>> results with 2.6.32 some time ago
>
> It is.
>
>> - I'm posting this cause diving into linux mm spaghetti code might be
>> not a trivial task, but if nobody can see anything obvious in a day or
>> so, I'll try to debug it mysel
>> - I'm unable to provide a testcase by now, nor any usable state of the
>> crashing process, cause the system becomes unusable
>> - It crashes both linux-kernel working on a physical machine as well
>> as on the VirtualBox emulator
>> - I'm usually waiting from 0.5h to 12h for this crash to appear, I
>> think it could be speed up greatly by disabling any irrelevant
>> syscalls in the fuzzer
>>
>> [25142.286531] kernel BUG at /build/buildd/linux-2.6.35/mm/filemap.c:128!
>
> That's
>
>        BUG_ON(page_mapped(page));
>
> in __remove_from_page_cache().  That state is worth a BUG().
>
>> [25142.290627] invalid opcode: 0000 [#2] SMP
>> [25142.290627] last sysfs file: /sys/fs/ext4/sda1/lifetime_write_kbytes
>> [25142.301039] CPU 1
>> [25142.301039] Modules linked in: dccp_ipv6 decnet llc2 x25 pppoe
>> pppox irda crc_ccitt dccp_ipv4 dccp ipx p8023 rds af_key netrom econet
>> can atm l2cap bluetooth appletalk rose ax25 nouveau snd_hda_codec_idt
>> ttm snd_hda_intel drm_kms_helper snd_hda_codec led_class drm psmouse
>> snd_hwdep ppdev snd_pcm snd_timer i2c_algo_bit serio_raw i82975x_edac
>> snd parport_pc lp soundcore edac_core snd_page_alloc parport dcdbas
>> floppy ahci firewire_ohci firewire_core crc_itu_t tg3 libahci
>> [25142.301039]
>> [25142.301039] Pid: 22612, comm: iknowthis Tainted: G      D W
>> 2.6.35-22-server #35-Ubuntu 0GH911/Precision WorkStation 390
>> [25142.301039] RIP: 0010:[<ffffffff81101645>]  [<ffffffff81101645>]
>> __remove_from_page_cache+0xd5/0xe0
>> [25142.367617] RSP: 0000:ffff8800cb8a3cf8  EFLAGS: 00010002
>> [25142.367617] RAX: 0100000000100029 RBX: ffffea0002159670 RCX: 0000000000000000
>> [25142.367617] RDX: 0000000000000001 RSI: 0000000000000016 RDI: ffff880100000700
>> [25142.367617] RBP: ffff8800cb8a3d08 R08: 0000000000000018 R09: 0000000000000000
>> [25142.367617] R10: ffffea0002159678 R11: 0000000000000000 R12: ffff8800cb865188
>> [25142.367617] R13: ffff8800cb865188 R14: ffff8800cb8a3d88 R15: 0000000000000000
>> [25142.367617] FS:  0000000000000000(0000) GS:ffff880001e40000(0063)
>> knlGS:00000000f75276c0
>> [25142.367617] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
>> [25142.367617] CR2: 00000000080ba1fc CR3: 000000012091d000 CR4: 00000000000006e0
>> [25142.367617] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [25142.367617] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [25142.367617] Process iknowthis (pid: 22612, threadinfo
>> ffff8800cb8a2000, task ffff8800cb9516e0)
>> [25142.367617] Stack:
>> [25142.367617]  ffffea0002159670 ffff8800cb8651a0 ffff8800cb8a3d28
>> ffffffff81101686
>> [25142.367617] <0> ffffea0002159670 ffff8800cb865188 ffff8800cb8a3d48
>> ffffffff8110c6d9
>> [25142.367617] <0> 0000000000000001 0000000000000008 ffff8800cb8a3e38
>> ffffffff8110cce0
>> [25142.367617] Call Trace:
>> [25142.367617]  [<ffffffff81101686>] remove_from_page_cache+0x36/0x60
>> [25142.367617]  [<ffffffff8110c6d9>] truncate_inode_page+0x79/0xb0
>> [25142.367617]  [<ffffffff8110cce0>] truncate_inode_pages_range+0x160/0x460
>> [25142.367617]  [<ffffffff81121250>] ? unmap_mapping_range+0xb0/0x160
>> [25142.367617]  [<ffffffff81121398>] vmtruncate_range+0x98/0x100
>> [25142.367617]  [<ffffffff8111ae5a>] madvise_vma+0x17a/0x210
>> [25142.367617]  [<ffffffff8111b055>] sys_madvise+0x165/0x290
>> [25142.367617]  [<ffffffff810467c3>] ia32_sysret+0x0/0x5
>> [25142.367617] Code: 00 00 e8 1f 8a 1c 00 48 89 df 57 9d 0f 1f 44 00
>> 00 5b 41 5c c9 c3 be 16 00 00 00 48 89 df e8 a3 68 01 00 48 8b 03 e9
>> 71 ff ff ff <0f> 0b eb fe eb 05 90 90 90 90 90 55 48 89 e5 48 83 ec 10
>> 48 89
>> [25142.367617] RIP  [<ffffffff81101645>] __remove_from_page_cache+0xd5/0xe0
>> [25142.367617]  RSP <ffff8800cb8a3cf8>
>> [25142.367617] ---[ end trace a262aa785b417723 ]---
>
>



--
Robert Święcki

2010-11-29 23:25:11

by Andrew Morton

[permalink] [raw]
Subject: Re: kernel BUG at /build/buildd/linux-2.6.35/mm/filemap.c:128!

On Tue, 23 Nov 2010 15:55:31 +0100
Robert wi cki <[email protected]> wrote:

> >> Hi,
> >>
> >> I was doing some fuzzing with http://code.google.com/p/iknowthis/ and
> >> my system pretty quickly crashes with the BUG() below.
> >
> > So it is a repeatable crash?
>
> Not in a sense that I can provide you with a sequence of syscalls that
> led to this state. Generally it repeats after some time (<12 hours on
> 2 intel-core) of running the
> http://code.google.com/p/iknowthis/source/browse/#svn/trunk
>
> >> - Even if it's just BUG() it renders my system unusable (I'm able to
> >> type a few characters on the virtual terminal at most)
> >> - Judging from the stacktrace it's sys_madvise(..., ..., MADV_REMOVE)
> >> - I'm testing with ubuntu's 2.6.35-22-server#35 but I got similar
> >> results with 2.6.32 some time ago
> >
> > It is.
> >
> >> - I'm posting this cause diving into linux mm spaghetti code might be
> >> not a trivial task, but if nobody can see anything obvious in a day or
> >> so, I'll try to debug it mysel
> >> - I'm unable to provide a testcase by now, nor any usable state of the
> >> crashing process, cause the system becomes unusable
> >> - It crashes both linux-kernel working on a physical machine as well
> >> as on the VirtualBox emulator
> >> - I'm usually waiting from 0.5h to 12h for this crash to appear, I
> >> think it could be speed up greatly by disabling any irrelevant
> >> syscalls in the fuzzer
> >>
> >> [25142.286531] kernel BUG at /build/buildd/linux-2.6.35/mm/filemap.c:128!
> >
> > That's
> >
> > BUG_ON(page_mapped(page));
> >
> > in remove_from_page_cache(). That state is worth a BUG().

At a guess I'd say that another thread came in and established a
mapping against a page in the to-be-truncated range while
vmtruncate_range() was working on it. In fact I'd be suspecting that
the mapping was established after truncate_inode_page() ran its
page_mapped() test.

Let's take a look at vmtruncate_range():

int vmtruncate_range(struct inode *inode, loff_t offset, loff_t end)
{
struct address_space *mapping = inode->i_mapping;

/*
* If the underlying filesystem is not going to provide
* a way to truncate a range of blocks (punch a hole) -
* we should return failure right now.
*/
if (!inode->i_op->truncate_range)
return -ENOSYS;

mutex_lock(&inode->i_mutex);
down_write(&inode->i_alloc_sem);
unmap_mapping_range(mapping, offset, (end - offset), 1);
truncate_inode_pages_range(mapping, offset, end);
unmap_mapping_range(mapping, offset, (end - offset), 1);
inode->i_op->truncate_range(inode, offset, end);
up_write(&inode->i_alloc_sem);
mutex_unlock(&inode->i_mutex);

return 0;
}

Now, why does it call unmap_mapping_range() twice?

Nick's original 2007 patch d00806b183152af6d2 ("mm: fix fault vs
invalidate race for linear mappings") added the second
unmap_mapping_range() call, along with this nice comment, which
explains it all:


+ /*
+ * unmap_mapping_range is called twice, first simply for efficiency
+ * so that truncate_inode_pages does fewer single-page unmaps. However
+ * after this first call, and before truncate_inode_pages finishes,
+ * it is possible for private pages to be COWed, which remain after
+ * truncate_inode_pages finishes, hence the second unmap_mapping_range
+ * call must be made for correctness.
+ /*

Later, some twirp deleted the damn comment. Why'd we do that? It
still seems to be valid.

If this _is_ still valid, and the first call to unmap_mapping_range() is
really just a best-effort performance thing which won't reliably clear
all the mappings then perhaps the BUG_ON(page_mapped(page)) assertion
in __remove_from_page_cache() is simply bogus.

We don't appear to have mmap_sem coverage around here, perhaps for
lock-ordering reasons. I suspect we'll be struggling to plug all holes
here without that coverage.

Fortunately the comment over madvise_remove() says it's tmpfs-only, so
we can blame Hugh :)


hm, I found the lost comment. It somehow wandered over into
truncate_pagecache(), but is still relevant at the vmtruncate_range()
site.

2010-11-30 18:23:11

by Hugh Dickins

[permalink] [raw]
Subject: Re: kernel BUG at /build/buildd/linux-2.6.35/mm/filemap.c:128!

On Mon, 29 Nov 2010, Andrew Morton wrote:
> On Tue, 23 Nov 2010 15:55:31 +0100
> Robert wi cki <[email protected]> wrote:
> > >> [25142.286531] kernel BUG at /build/buildd/linux-2.6.35/mm/filemap.c:128!
> > >
> > > That's
> > >
> > > BUG_ON(page_mapped(page));
> > >
> > > in remove_from_page_cache(). That state is worth a BUG().
>
> At a guess I'd say that another thread came in and established a
> mapping against a page in the to-be-truncated range while
> vmtruncate_range() was working on it. In fact I'd be suspecting that
> the mapping was established after truncate_inode_page() ran its
> page_mapped() test.

It looks that way, but I don't see how it can be: the page is locked
before calling truncate_inode_page() and unlocked after it: and the
page (certainly in this tmpfs case, perhaps not for every filesystem)
cannot be faulted into an address space without holding its page lock.

Either we've made a change somewhere, and are now dropping and retaking
page lock in a way which exposes this bug? Or truncate_inode_page()'s
unmap_mapping_range() call is somehow missing the page it's called for?

I guess the latter is the more likely: maybe the truncate_count/restart
logic isn't working properly. I'll try to check over that again later -
but will be happy if someone else beats me to it.

> + /*
> + * unmap_mapping_range is called twice, first simply for efficiency
> + * so that truncate_inode_pages does fewer single-page unmaps. However
> + * after this first call, and before truncate_inode_pages finishes,
> + * it is possible for private pages to be COWed, which remain after
> + * truncate_inode_pages finishes, hence the second unmap_mapping_range
> + * call must be made for correctness.
> + /*
>
> Later, some twirp deleted the damn comment. Why'd we do that? It
> still seems to be valid.
>
> If this _is_ still valid, and the first call to unmap_mapping_range() is
> really just a best-effort performance thing which won't reliably clear
> all the mappings then perhaps the BUG_ON(page_mapped(page)) assertion
> in __remove_from_page_cache() is simply bogus.

No, I believe the first call to unmap_mapping_range() is sufficient to
deal correctly with all page cache pages (and Robert's issue is certainly
with a page cache page). The second call to unmap_mapping_range() is to
mop up private copied-on-write copies of page cache pages: being separate
pages, the page locking is not adequate to deal with them as thoroughly,
but standards still require them to be removed (SIGBUS beyond EOF).

>
> We don't appear to have mmap_sem coverage around here, perhaps for
> lock-ordering reasons. I suspect we'll be struggling to plug all holes
> here without that coverage.

I think any help from mmap_sem here would be deceptive: another mm,
with another mmap_sem, could equally operate on the pages of this file,
and presumably introduce the same condition. i_mmap_lock and page lock
should already be handling it.

>
> Fortunately the comment over madvise_remove() says it's tmpfs-only, so
> we can blame Hugh :)

Glad to be of service! But wish I could work out what's happening.

>
> hm, I found the lost comment. It somehow wandered over into
> truncate_pagecache(), but is still relevant at the vmtruncate_range()
> site.

Yes, it is indeed a helpful comment.

Hugh

2010-12-29 20:54:39

by Hugh Dickins

[permalink] [raw]
Subject: Re: kernel BUG at /build/buildd/linux-2.6.35/mm/filemap.c:128!

On Tue, 30 Nov 2010, Hugh Dickins wrote:
> On Mon, 29 Nov 2010, Andrew Morton wrote:
> > On Tue, 23 Nov 2010 15:55:31 +0100
> > Robert wi cki <[email protected]> wrote:
> > > >> [25142.286531] kernel BUG at /build/buildd/linux-2.6.35/mm/filemap.c:128!
> > > >
> > > > That's
> > > >
> > > > BUG_ON(page_mapped(page));
> > > >
> > > > in remove_from_page_cache(). That state is worth a BUG().
> >
> > At a guess I'd say that another thread came in and established a
> > mapping against a page in the to-be-truncated range while
> > vmtruncate_range() was working on it. In fact I'd be suspecting that
> > the mapping was established after truncate_inode_page() ran its
> > page_mapped() test.
>
> It looks that way, but I don't see how it can be: the page is locked
> before calling truncate_inode_page() and unlocked after it: and the
> page (certainly in this tmpfs case, perhaps not for every filesystem)
> cannot be faulted into an address space without holding its page lock.
>
> Either we've made a change somewhere, and are now dropping and retaking
> page lock in a way which exposes this bug? Or truncate_inode_page()'s
> unmap_mapping_range() call is somehow missing the page it's called for?
>
> I guess the latter is the more likely: maybe the truncate_count/restart
> logic isn't working properly. I'll try to check over that again later -
> but will be happy if someone else beats me to it.

I have since found an omission in the restart_addr logic: looking back
at the October 2004 history of vm_truncate_count, I see that originally
I designed it to work one way, but hurriedly added a 7/6 redesign when
vma splitting turned out to leave an ambiguity. I should have updated
the protection in mremap move at that time, but missed it.

Robert, please try out the patch below (should apply fine to 2.6.35):
I'm hoping this will fix what the fuzzer found, but it's still quite
possible that it found something else wrong that I've not yet noticed.
The patch could probably be cleverer (if we exported the notion of
restart_addr out of mm/memory.c), but I'm more in the mood for being
safe than clever at the moment.

I didn't hear whether you'd managed to try out Miklos's patch; but
this one is a better bet to be the fix for your particular issue.

Thanks,
Hugh

--- 2.6.37-rc8/mm/mremap.c 2010-11-01 13:01:32.000000000 -0700
+++ linux/mm/mremap.c 2010-12-29 12:25:46.000000000 -0800
@@ -91,9 +91,7 @@ static void move_ptes(struct vm_area_str
*/
mapping = vma->vm_file->f_mapping;
spin_lock(&mapping->i_mmap_lock);
- if (new_vma->vm_truncate_count &&
- new_vma->vm_truncate_count != vma->vm_truncate_count)
- new_vma->vm_truncate_count = 0;
+ new_vma->vm_truncate_count = 0;
}

/*

2010-12-30 21:25:27

by Robert Święcki

[permalink] [raw]
Subject: Re: kernel BUG at /build/buildd/linux-2.6.35/mm/filemap.c:128!

>> I guess the latter is the more likely: maybe the truncate_count/restart
>> logic isn't working properly.  I'll try to check over that again later -
>> but will be happy if someone else beats me to it.
>
> I have since found an omission in the restart_addr logic: looking back
> at the October 2004 history of vm_truncate_count, I see that originally
> I designed it to work one way, but hurriedly added a 7/6 redesign when
> vma splitting turned out to leave an ambiguity.  I should have updated
> the protection in mremap move at that time, but missed it.
>
> Robert, please try out the patch below (should apply fine to 2.6.35):

In the beginning of Jan (3-4) at earliest I'm afraid, i.e. when I
manage to get to my console-over-rs232 setup.

> I'm hoping this will fix what the fuzzer found, but it's still quite
> possible that it found something else wrong that I've not yet noticed.
> The patch could probably be cleverer (if we exported the notion of
> restart_addr out of mm/memory.c), but I'm more in the mood for being
> safe than clever at the moment.

--
Robert Święcki