2019-03-15 12:20:59

by Oscar Salvador

[permalink] [raw]
Subject: [PATCH] mm: Fix __dump_page when mapping->host is not set

While debugging something, I added a dump_page() into do_swap_page(),
and I got the splat from below.
The issue happens when dereferencing mapping->host in __dump_page():

...
else if (mapping) {
pr_warn("%ps ", mapping->a_ops);
if (mapping->host->i_dentry.first) {
struct dentry *dentry;
dentry = container_of(mapping->host->i_dentry.first, struct dentry, d_u.d_alias);
pr_warn("name:\"%pd\" ", dentry);
}
}
...

Swap address space does not contain an inode information, and so mapping->host
equals NULL.

Although the dump_page() call was added artificially into do_swap_page(),
I am not sure if we can hit this from any other path, so it looks worth
fixing it.
We can easily do that by cheking mapping->host first.

Splat:

kernel: page:ffffea0000630180 count:3 mapcount:0 mapping:0000000000000000 index:0x0
kernel: swap_aops
kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
kernel: #PF error: [normal kernel read fault]
kernel: PGD 800000001eaea067 P4D 800000001eaea067 PUD 1eae9067 PMD 0
kernel: Oops: 0000 [#1] SMP PTI
kernel: CPU: 0 PID: 1522 Comm: __mremap Tainted: G E 5.0.0-rc8-mm1-1-default+ #43
kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
kernel: RIP: 0010:__dump_page+0x2d6/0x380
kernel: Code: ff 48 c7 c7 4d ac e3 81 31 c0 e8 e9 03 ef ff e9 27 fe ff ff 49 8b 75 70 31 c0 48 c7 c7 55 ac e3 81 e8 d2 03 ef ff 49 8b 45 00 <48> 8b 80 38 01 00 00 48 85 c0 0f 84 01 fe ff ff 48 8d b0 50 ff ff
kernel: RSP: 0000:ffffc900004c3ae0 EFLAGS: 00010296
kernel: RAX: 0000000000000000 RBX: ffffea0000630180 RCX: 0000000000000000
kernel: RDX: 000000000000000a RSI: ffffffff8276700c RDI: 0000000000000246
kernel: RBP: 0000000000000000 R08: ffffffff82767002 R09: 000000000000000a
kernel: R10: 00000000000005f2 R11: 0000000000012047 R12: ffffffff81e3b2d8
kernel: R13: ffff8880184e60a0 R14: ffff888013095100 R15: ffffea0000630180
kernel: FS: 00007f141813d4c0(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000138 CR3: 000000001eaf2005 CR4: 00000000003606b0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel: dump_page+0xe/0x20
kernel: do_swap_page+0x6a0/0xa10
kernel: __handle_mm_fault+0xa7f/0xc00
kernel: handle_mm_fault+0xfa/0x210
kernel: __do_page_fault+0x1f4/0x490
kernel: do_page_fault+0x32/0x140
kernel: async_page_fault+0x1e/0x30
kernel: RIP: 0010:copy_user_generic_unrolled+0xa0/0xc0
kernel: Code: 7f 40 ff c9 75 b6 89 d1 83 e2 07 c1 e9 03 74 12 4c 8b 06 4c 89 07 48 8d 76 08 48 8d 7f 08 ff c9 75 ee 21 d2 74 10 89 d1 8a 06 <88> 07 48 ff c6 48 ff c7 ff c9 75 f2 31 c0 0f 01 ca c3 66 66 2e 0f
kernel: RSP: 0018:ffffc900004c3d90 EFLAGS: 00050202
kernel: RAX: 00000000013a620a RBX: 0000000000000001 RCX: 0000000000000001
kernel: RDX: 0000000000000001 RSI: ffffc9000025f07a RDI: 00000000013a6260
kernel: RBP: ffff8880195fa400 R08: ffffc9000025f000 R09: 000000000000001c
kernel: R10: 0000000000000001 R11: 0000000000000fe4 R12: 7fffffffffffffff
kernel: R13: 0000000000000000 R14: 00000000013a6260 R15: 0000000000000000
kernel: _copy_to_user+0x22/0x30
kernel: n_tty_read+0x725/0x8d0
kernel: ? do_wait_intr_irq+0xa0/0xa0
kernel: tty_read+0x90/0xf0
kernel: vfs_read+0x89/0x140
kernel: ksys_read+0x42/0x90
kernel: do_syscall_64+0x5b/0x180
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: RIP: 0033:0x7f1417c53b41
kernel: Code: Bad RIP value.
kernel: RSP: 002b:00007fffa7377d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
kernel: RAX: ffffffffffffffda RBX: 00007f1417f1e9e0 RCX: 00007f1417c53b41
kernel: RDX: 0000000000000400 RSI: 00000000013a6260 RDI: 0000000000000000
kernel: RBP: 0000000000000d68 R08: 00007f1417f20880 R09: 00007f141813d4c0
kernel: R10: 000000000000019b R11: 0000000000000246 R12: 00007f1417f1a8e0
kernel: R13: 00007f1417f1b420 R14: 00007f1417f1b420 R15: 0000000000000000
kernel: Modules linked in: parport_pc(E) af_packet(E) xt_tcpudp(E) ipt_REJECT(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv4(E) ip_set(E) nfnetlink(E) ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ebtable_filter(E) ebtables(E) iptable_filter(E) ip_tables(E) x_tables(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) bochs_drm(E) aes_x86_64(E) crypto_simd(E) ttm(E) cryptd(E) glue_helper(E) drm_kms_helper(E) virtio_net(E) drm(E) net_failover(E) pcspkr(E) failover(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) i2c_piix4(E) parport(E) button(E) btrfs(E) libcrc32c(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E) raid6_pq(E) sd_mod(E) ata_generic(E) ata_piix(E) crc32c_intel(E) serio_raw(E) ahci(E) libahci(E) virtio_pci(E) virtio_ring(E) virtio(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
kernel: CR2: 0000000000000138
kernel: ---[ end trace b061d02f3cb1a1d1 ]---
kernel: RIP: 0010:__dump_page+0x2d6/0x380
kernel: Code: ff 48 c7 c7 4d ac e3 81 31 c0 e8 e9 03 ef ff e9 27 fe ff ff 49 8b 75 70 31 c0 48 c7 c7 55 ac e3 81 e8 d2 03 ef ff 49 8b 45 00 <48> 8b 80 38 01 00 00 48 85 c0 0f 84 01 fe ff ff 48 8d b0 50 ff ff
kernel: RSP: 0000:ffffc900004c3ae0 EFLAGS: 00010296
kernel: RAX: 0000000000000000 RBX: ffffea0000630180 RCX: 0000000000000000
kernel: RDX: 000000000000000a RSI: ffffffff8276700c RDI: 0000000000000246
kernel: RBP: 0000000000000000 R08: ffffffff82767002 R09: 000000000000000a
kernel: R10: 00000000000005f2 R11: 0000000000012047 R12: ffffffff81e3b2d8
kernel: R13: ffff8880184e60a0 R14: ffff888013095100 R15: ffffea0000630180
kernel: FS: 00007f141813d4c0(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f1417c53b17 CR3: 000000001eaf2005 CR4: 00000000003606b0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 1c6fb1d89e73c ("mm: print more information about mapping in __dump_page")
Signed-off-by: Oscar Salvador <[email protected]>
---
mm/debug.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/debug.c b/mm/debug.c
index c0b31b6c3877..7759f12a8fbb 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -79,7 +79,7 @@ void __dump_page(struct page *page, const char *reason)
pr_warn("ksm ");
else if (mapping) {
pr_warn("%ps ", mapping->a_ops);
- if (mapping->host->i_dentry.first) {
+ if (mapping->host && mapping->host->i_dentry.first) {
struct dentry *dentry;
dentry = container_of(mapping->host->i_dentry.first, struct dentry, d_u.d_alias);
pr_warn("name:\"%pd\" ", dentry);
--
2.13.7



2019-03-15 12:49:46

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] mm: Fix __dump_page when mapping->host is not set

[Cc Jack and Hugh - the full patch is http://lkml.kernel.org/r/[email protected]]

On Fri 15-03-19 13:18:26, Oscar Salvador wrote:
> While debugging something, I added a dump_page() into do_swap_page(),
> and I got the splat from below.
> The issue happens when dereferencing mapping->host in __dump_page():
>
> ...
> else if (mapping) {
> pr_warn("%ps ", mapping->a_ops);
> if (mapping->host->i_dentry.first) {
> struct dentry *dentry;
> dentry = container_of(mapping->host->i_dentry.first, struct dentry, d_u.d_alias);
> pr_warn("name:\"%pd\" ", dentry);
> }
> }
> ...
>
> Swap address space does not contain an inode information, and so mapping->host
> equals NULL.
>
> Although the dump_page() call was added artificially into do_swap_page(),
> I am not sure if we can hit this from any other path, so it looks worth
> fixing it.

It is certainly worth fixing. We cannot assume anything about the
calling context for __dump_page

> We can easily do that by cheking mapping->host first.
[...]

The splat is still surprising to me because I thought that all file
backed mappings have a host. Swap file/partition certainly has a
mapping but swapcache mapping is special because the underlying swap
storage is hidden in the swap_info_struct. I am wondering whether we
should do that special casing for PageSwapCache in __dump_page rather
than hid the mapping details instead


diff --git a/mm/debug.c b/mm/debug.c
index 1611cf00a137..499c26d5ebe5 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -78,6 +78,9 @@ void __dump_page(struct page *page, const char *reason)
else if (PageKsm(page))
pr_warn("ksm ");
else if (mapping) {
+ if (PageSwapCache(page))
+ mapping = page_swap_info(page)->swap_file->f_mapping;
+
pr_warn("%ps ", mapping->a_ops);
if (mapping->host->i_dentry.first) {
struct dentry *dentry;

But I am not really sure this will work for all swap cases.

Thanks for reporting this Oscar!

> Fixes: 1c6fb1d89e73c ("mm: print more information about mapping in __dump_page")
> Signed-off-by: Oscar Salvador <[email protected]>
> ---
> mm/debug.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/debug.c b/mm/debug.c
> index c0b31b6c3877..7759f12a8fbb 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -79,7 +79,7 @@ void __dump_page(struct page *page, const char *reason)
> pr_warn("ksm ");
> else if (mapping) {
> pr_warn("%ps ", mapping->a_ops);
> - if (mapping->host->i_dentry.first) {
> + if (mapping->host && mapping->host->i_dentry.first) {
> struct dentry *dentry;
> dentry = container_of(mapping->host->i_dentry.first, struct dentry, d_u.d_alias);
> pr_warn("name:\"%pd\" ", dentry);
> --
> 2.13.7

--
Michal Hocko
SUSE Labs

2019-03-15 14:34:19

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH] mm: Fix __dump_page when mapping->host is not set

On Fri, Mar 15, 2019 at 01:47:33PM +0100, Michal Hocko wrote:
> diff --git a/mm/debug.c b/mm/debug.c
> index 1611cf00a137..499c26d5ebe5 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -78,6 +78,9 @@ void __dump_page(struct page *page, const char *reason)
> else if (PageKsm(page))
> pr_warn("ksm ");
> else if (mapping) {
> + if (PageSwapCache(page))
> + mapping = page_swap_info(page)->swap_file->f_mapping;
> +
> pr_warn("%ps ", mapping->a_ops);
> if (mapping->host->i_dentry.first) {
> struct dentry *dentry;

This looks like a much nicer fix, indeed.
I gave it a spin and it works.

Since the mapping is set during the swapon, I would assume that this should
always work for swap.
Although I am not sure if once you start playing with e.g zswap the picture can
change.

Let us wait for Hugh and Jan.

Thanks Michal

--
Oscar Salvador
SUSE L3

2019-03-15 17:23:03

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] mm: Fix __dump_page when mapping->host is not set

On Fri, 15 Mar 2019, Oscar Salvador wrote:
> On Fri, Mar 15, 2019 at 01:47:33PM +0100, Michal Hocko wrote:
> > diff --git a/mm/debug.c b/mm/debug.c
> > index 1611cf00a137..499c26d5ebe5 100644
> > --- a/mm/debug.c
> > +++ b/mm/debug.c
> > @@ -78,6 +78,9 @@ void __dump_page(struct page *page, const char *reason)
> > else if (PageKsm(page))
> > pr_warn("ksm ");
> > else if (mapping) {
> > + if (PageSwapCache(page))
> > + mapping = page_swap_info(page)->swap_file->f_mapping;
> > +
> > pr_warn("%ps ", mapping->a_ops);
> > if (mapping->host->i_dentry.first) {
> > struct dentry *dentry;
>
> This looks like a much nicer fix, indeed.
> I gave it a spin and it works.
>
> Since the mapping is set during the swapon, I would assume that this should
> always work for swap.
> Although I am not sure if once you start playing with e.g zswap the picture can
> change.
>
> Let us wait for Hugh and Jan.
>
> Thanks Michal

Sorry, I don't agree that Michal's more sophisticated patch is nicer:
the appropriate patch was your original, just checking for NULL.

Though, would I be too snarky to suggest that your patch description
would be better at 2 lines than 90? Swap mapping->host is NULL,
so of course __dump_page() needs to be careful about that.

I was a little disturbed to see __dump_page() now getting into dentries,
but admit that it can sometimes be very helpful to see the name of the
file involved; so if that is not in danger of breaking anything, okay.

It is very often useful to see if a page is PageSwapCache (typically
because that should account for 1 of its refcount); I cannot think of
a time when it's been useful to know the name of the underlying swap
device (if that's indeed what f_mapping leads to: it's new to me).
And if you need swp_type and swp_offset, they're in the raw output.

The cleverer __dump_page() tries to get, the more likely that it will
itself crash just when you need it most. Please just keep it simple.

Thanks,
Hugh

2019-03-16 08:25:24

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] mm: Fix __dump_page when mapping->host is not set

On Fri 15-03-19 15:33:07, Oscar Salvador wrote:
> On Fri, Mar 15, 2019 at 01:47:33PM +0100, Michal Hocko wrote:
> > diff --git a/mm/debug.c b/mm/debug.c
> > index 1611cf00a137..499c26d5ebe5 100644
> > --- a/mm/debug.c
> > +++ b/mm/debug.c
> > @@ -78,6 +78,9 @@ void __dump_page(struct page *page, const char *reason)
> > else if (PageKsm(page))
> > pr_warn("ksm ");
> > else if (mapping) {
> > + if (PageSwapCache(page))
> > + mapping = page_swap_info(page)->swap_file->f_mapping;
> > +
> > pr_warn("%ps ", mapping->a_ops);
> > if (mapping->host->i_dentry.first) {
> > struct dentry *dentry;
>
> This looks like a much nicer fix, indeed.

If we go this way then we should swap the order and print the mapping
before we alter it.

> I gave it a spin and it works.

Thanks for testing!

> Since the mapping is set during the swapon, I would assume that this should
> always work for swap.
> Although I am not sure if once you start playing with e.g zswap the picture can
> change.
>
> Let us wait for Hugh and Jan.

Yes, I really cannot tell this is really safe. Maybe we want to do the
check for host anyway. Just to be sure.
--
Michal Hocko
SUSE Labs

2019-03-16 08:35:27

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] mm: Fix __dump_page when mapping->host is not set

[my mbox didn't get synced completely so our emails "crossed"]

On Fri 15-03-19 10:21:18, Hugh Dickins wrote:
> On Fri, 15 Mar 2019, Oscar Salvador wrote:
> > On Fri, Mar 15, 2019 at 01:47:33PM +0100, Michal Hocko wrote:
> > > diff --git a/mm/debug.c b/mm/debug.c
> > > index 1611cf00a137..499c26d5ebe5 100644
> > > --- a/mm/debug.c
> > > +++ b/mm/debug.c
> > > @@ -78,6 +78,9 @@ void __dump_page(struct page *page, const char *reason)
> > > else if (PageKsm(page))
> > > pr_warn("ksm ");
> > > else if (mapping) {
> > > + if (PageSwapCache(page))
> > > + mapping = page_swap_info(page)->swap_file->f_mapping;
> > > +
> > > pr_warn("%ps ", mapping->a_ops);
> > > if (mapping->host->i_dentry.first) {
> > > struct dentry *dentry;
> >
> > This looks like a much nicer fix, indeed.
> > I gave it a spin and it works.
> >
> > Since the mapping is set during the swapon, I would assume that this should
> > always work for swap.
> > Although I am not sure if once you start playing with e.g zswap the picture can
> > change.
> >
> > Let us wait for Hugh and Jan.
> >
> > Thanks Michal
>
> Sorry, I don't agree that Michal's more sophisticated patch is nicer:
> the appropriate patch was your original, just checking for NULL.
>
> Though, would I be too snarky to suggest that your patch description
> would be better at 2 lines than 90? Swap mapping->host is NULL,
> so of course __dump_page() needs to be careful about that.
>
> I was a little disturbed to see __dump_page() now getting into dentries,
> but admit that it can sometimes be very helpful to see the name of the
> file involved; so if that is not in danger of breaking anything, okay.
>
> It is very often useful to see if a page is PageSwapCache (typically
> because that should account for 1 of its refcount); I cannot think of
> a time when it's been useful to know the name of the underlying swap
> device (if that's indeed what f_mapping leads to: it's new to me).
> And if you need swp_type and swp_offset, they're in the raw output.
>
> The cleverer __dump_page() tries to get, the more likely that it will
> itself crash just when you need it most. Please just keep it simple.

OK, fair enough. If we ever have anybody suggesting to follow the swap
lead then we can add it. I do not have a good use case for that right
now. Let's go with Oscar's original patch. Thanks!

Acked-by: Michal Hocko <[email protected]>
--
Michal Hocko
SUSE Labs