2018-11-25 06:17:39

by Anatoly Trosinenko

Subject: NFSd: NULL-dereference when writing to v4_end_grace when server is not yet started

Hello,

While manually exploring the kernel NFSd feature, I stumbled upon
a NULL dereference when writing to v4_end_grace while the server is not
yet started.

How to reproduce with kvm-xfstests:

1) Check out a fresh master branch of Linux (tested with commit e195ca6cb)
2) Copy x86_64-config-4.14 to .config, then enable NFS server v4 and build
3) From `kvm-xfstests shell`:

root@kvm-xfstests:~# mount none /proc/fs/nfsd -t nfsd
root@kvm-xfstests:~# echo Y > /proc/fs/nfsd/v4_end_grace
[ 11.986359] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 11.987187] PGD 800000007af97067 P4D 800000007af97067 PUD 78e9d067 PMD 0
[ 11.987774] Oops: 0000 [#1] SMP PTI
[ 11.988087] CPU: 0 PID: 281 Comm: bash Not tainted 4.20.0-rc3-xfstests-00306-ge195ca6cb6f #1
[ 11.988808] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1ubuntu1 04/01/2014
[ 11.989575] RIP: 0010:__list_del_entry_valid+0x25/0x90
[ 11.990019] Code: c3 0f 1f 40 00 48 b9 00 01 00 00 00 00 ad de 48 8b 07 48 8b 57 08 48 39 c8 74 26 48 b9 00 02 00 00 00 00 ad de 48 39 ca 74 2e <48> 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00
[ 11.991610] RSP: 0018:ffffa7d8c088fde8 EFLAGS: 00010207
[ 11.992066] RAX: 0000000000000000 RBX: ffff9ac7bc10ec28 RCX: dead000000000200
[ 11.992678] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ac7bc10ec28
[ 11.993291] RBP: ffffa7d8c088fe20 R08: 0000000000000000 R09: 0000000000000001
[ 11.993902] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
[ 11.994583] R13: ffff9ac7bd8a9e00 R14: ffff9ac7ba56d008 R15: 0000000000000000
[ 11.995226] FS: 0000000000000000(0000) GS:ffff9ac7bfc00000(0063) knlGS:00000000f7d76700
[ 11.996018] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 11.996514] CR2: 0000000000000000 CR3: 0000000078c94005 CR4: 0000000000360ef0
[ 11.997126] Call Trace:
[ 11.997346] locks_end_grace+0x1d/0x50
[ 11.997675] write_v4_end_grace+0xe7/0x1b0
[ 11.998033] ? nfsctl_transaction_write+0x80/0x80
[ 11.998440] nfsctl_transaction_write+0x45/0x80
[ 11.998835] __vfs_write+0x36/0x1a0
[ 11.999141] ? rcu_read_lock_sched_held+0x6c/0x80
[ 11.999550] ? rcu_sync_lockdep_assert+0x2e/0x60
[ 11.999955] ? __sb_start_write+0x147/0x1b0
[ 12.000320] ? vfs_write+0x161/0x1a0
[ 12.000634] vfs_write+0xba/0x1a0
[ 12.000927] ksys_write+0x52/0xc0
[ 12.001220] do_fast_syscall_32+0x97/0x2d0
[ 12.001578] entry_SYSENTER_compat+0x81/0x93
[ 12.001951] CR2: 0000000000000000
[ 12.002243] ---[ end trace 4137b5fb8d67f6b5 ]---
[ 12.002645] RIP: 0010:__list_del_entry_valid+0x25/0x90
[ 12.003089] Code: c3 0f 1f 40 00 48 b9 00 01 00 00 00 00 ad de 48 8b 07 48 8b 57 08 48 39 c8 74 26 48 b9 00 02 00 00 00 00 ad de 48 39 ca 74 2e <48> 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00
[ 12.004682] RSP: 0018:ffffa7d8c088fde8 EFLAGS: 00010207
[ 12.005133] RAX: 0000000000000000 RBX: ffff9ac7bc10ec28 RCX: dead000000000200
[ 12.005746] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ac7bc10ec28
[ 12.006360] RBP: ffffa7d8c088fe20 R08: 0000000000000000 R09: 0000000000000001
[ 12.006974] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
[ 12.007587] R13: ffff9ac7bd8a9e00 R14: ffff9ac7ba56d008 R15: 0000000000000000
[ 12.008206] FS: 0000000000000000(0000) GS:ffff9ac7bfc00000(0063) knlGS:00000000f7d76700
[ 12.008898] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 12.009394] CR2: 0000000000000000 CR3: 0000000078c94005 CR4: 0000000000360ef0
[ 12.010004] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:34
[ 12.010765] in_atomic(): 1, irqs_disabled(): 1, pid: 281, name: bash
[ 12.011311] INFO: lockdep is turned off.
[ 12.011652] irq event stamp: 19366
[ 12.012025] hardirqs last enabled at (19365): [<ffffffff89189da6>] get_page_from_freelist+0x2c6/0x1660
[ 12.012862] hardirqs last disabled at (19366): [<ffffffff890015f4>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 12.013658] softirqs last enabled at (18228): [<ffffffff89c0032f>] __do_softirq+0x32f/0x440
[ 12.014413] softirqs last disabled at (18221): [<ffffffff8908c806>] irq_exit+0xa6/0xe0
[ 12.015091] CPU: 0 PID: 281 Comm: bash Tainted: G D 4.20.0-rc3-xfstests-00306-ge195ca6cb6f #1
[ 12.015934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1ubuntu1 04/01/2014
[ 12.016751] Call Trace:
[ 12.017056] dump_stack+0x67/0x90
[ 12.017348] ___might_sleep.cold.14+0x9f/0xaf
[ 12.017728] exit_signals+0x1c/0x200
[ 12.018041] do_exit+0xac/0xb00
[ 12.018319] ? ksys_write+0x52/0xc0
[ 12.018626] rewind_stack_do_exit+0x17/0x20
[ 12.019006] note: bash[281] exited with preempt_count 1

Best regards
Anatoly
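
One possible reading of the trace above: RAX and RDX are both zero and the faulting instruction loads through RDX, which would be consistent with locks_end_grace() unlinking a list entry whose next/prev pointers were never set up because the server had not been started. The userspace sketch below models that reading under stated assumptions. `struct list_head` here is a local stand-in for the kernel type, and `checked_list_del_init()` is a hypothetical helper that reports the problem instead of faulting; it is not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialised empty list head points at itself. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/*
 * Hypothetical checked unlink: returns 0 on success, -1 if the entry
 * was never initialised. Without such a check, the unlink would load
 * entry->prev->next through a NULL prev and fault at address 0.
 */
static int checked_list_del_init(struct list_head *entry)
{
	if (entry->next == NULL || entry->prev == NULL)
		return -1;
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
	return 0;
}

/* Small self-check contrasting the two states. */
static void list_model_demo(void)
{
	struct list_head never_started = { NULL, NULL }; /* zeroed, as in the oops */
	struct list_head started;

	init_list_head(&started);
	assert(checked_list_del_init(&never_started) == -1);
	assert(checked_list_del_init(&started) == 0);
}
```

Calling `list_model_demo()` from a `main()` exercises both cases: the zeroed entry is rejected, while the initialised one is unlinked and left pointing at itself.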


2018-11-27 20:58:51

by J. Bruce Fields

Subject: Re: NFSd: NULL-dereference when writing to v4_end_grace when server is not yet started

On Sun, Nov 25, 2018 at 09:17:10AM +0300, Anatoly Trosinenko wrote:
> While manually exploring the kernel NFSd feature, I stumbled upon
> a NULL dereference when writing to v4_end_grace while the server is not
> yet started.

Thanks for the report!

I think this is what we want--it's what a lot of the other nfsctl
methods do.

--b.

commit ad5fdf47b4e3
Author: J. Bruce Fields <[email protected]>
Date: Tue Nov 27 15:54:17 2018 -0500

nfsd4: fix crash on writing v4_end_grace before nfsd startup

Anatoly Trosinenko reports that this:

1) Check out a fresh master branch of Linux (tested with commit e195ca6cb)
2) Copy x86_64-config-4.14 to .config, then enable NFS server v4 and build
3) From `kvm-xfstests shell`:

mount none /proc/fs/nfsd -t nfsd
echo Y > /proc/fs/nfsd/v4_end_grace

results in NULL dereference in locks_end_grace.

Check that nfsd has been started before trying to end the grace period.

Reported-by: Anatoly Trosinenko <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 6384c9b94898..38b223c1378e 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1126,7 +1126,13 @@ static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
 	case 'Y':
 	case 'y':
 	case '1':
+		mutex_lock(&nfsd_mutex);
+		if (!nn->nfsd_serv) {
+			mutex_unlock(&nfsd_mutex);
+			return -EBUSY;
+		}
 		nfsd4_end_grace(nn);
+		mutex_unlock(&nfsd_mutex);
 		break;
 	default:
 		return -EINVAL;
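
As a rough illustration of the intent stated in the commit message (refuse the write until nfsd has been started), here is a userspace model of the handler's decision logic. Everything in it is an assumption made for illustration: `struct model_nfsd_net` and `model_write_end_grace()` are hypothetical stand-ins, not the kernel's `struct nfsd_net` or `write_v4_end_grace()`, and locking is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-net nfsd state (not the kernel struct). */
struct model_nfsd_net {
	void *nfsd_serv;   /* NULL until the server has been started */
	bool grace_ended;  /* set once the grace period has been ended */
};

/* Models the 'Y'/'y'/'1' and default branches of the write handler. */
static int model_write_end_grace(struct model_nfsd_net *nn, char c)
{
	switch (c) {
	case 'Y':
	case 'y':
	case '1':
		if (!nn->nfsd_serv)
			return -EBUSY;  /* server not started: nothing to end */
		nn->grace_ended = true;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Small self-check covering the three outcomes. */
static void end_grace_model_demo(void)
{
	int dummy_serv;
	struct model_nfsd_net stopped = { NULL, false };
	struct model_nfsd_net running = { &dummy_serv, false };

	assert(model_write_end_grace(&stopped, 'Y') == -EBUSY);
	assert(!stopped.grace_ended);
	assert(model_write_end_grace(&running, 'Y') == 0);
	assert(running.grace_ended);
	assert(model_write_end_grace(&running, 'x') == -EINVAL);
}
```

In this model, the reproducer's write of `Y` before startup maps to the `-EBUSY` case rather than reaching the grace-period code at all.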

2018-11-28 01:19:24

by J. Bruce Fields

Subject: Re: NFSd: NULL-dereference when writing to v4_end_grace when server is not yet started

On Tue, Nov 27, 2018 at 03:58:49PM -0500, J. Bruce Fields wrote:
> On Sun, Nov 25, 2018 at 09:17:10AM +0300, Anatoly Trosinenko wrote:
> > While manually exploring the kernel NFSd feature, I stumbled upon
> > a NULL dereference when writing to v4_end_grace while the server is not
> > yet started.
>
> Thanks for the report!
>
> I think this is what we want--it's what a lot of the other nfsctl
> methods do.

Hm, no, I'm getting a hang. It looks like in the nfsd4 state startup we
call a cltrack upcall while holding the nfsd_mutex, then nfsdcltrack
tries to write to end_grace. That's kind of ugly.

--b.
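
The hang described above can be sketched as a lock-ordering problem: startup holds the lock and waits for a helper, and the helper's write needs the same lock. The userspace model below is only an illustration under stated assumptions. `struct model_mutex` is a toy lock standing in for nfsd_mutex, and the `model_*` functions are hypothetical stand-ins for the state startup path and the helper's write to end_grace. A trylock is used so the model can report "would block" instead of actually hanging.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy lock standing in for nfsd_mutex (assumption, not a kernel type). */
struct model_mutex {
	bool held;
};

/* Returns 0 if the lock was taken, -EBUSY if it is already held. */
static int model_trylock(struct model_mutex *m)
{
	if (m->held)
		return -EBUSY;
	m->held = true;
	return 0;
}

static void model_unlock(struct model_mutex *m)
{
	m->held = false;
}

/* Stand-in for the helper writing to end_grace with the patch applied. */
static int model_helper_writes_end_grace(struct model_mutex *nfsd_mutex)
{
	int ret = model_trylock(nfsd_mutex);

	if (ret)
		return ret;  /* a blocking lock would sleep here */
	model_unlock(nfsd_mutex);
	return 0;
}

/* Stand-in for state startup: takes the lock, then waits for the helper. */
static int model_state_startup(struct model_mutex *nfsd_mutex)
{
	int ret;

	if (model_trylock(nfsd_mutex))
		return -EBUSY;
	ret = model_helper_writes_end_grace(nfsd_mutex);  /* synchronous upcall */
	model_unlock(nfsd_mutex);
	return ret;
}

/* Small self-check: the nested write cannot get the lock, a standalone one can. */
static void hang_model_demo(void)
{
	struct model_mutex m = { false };

	assert(model_state_startup(&m) == -EBUSY);
	assert(model_helper_writes_end_grace(&m) == 0);
}
```

In the model, the helper's write succeeds only when it is not nested inside startup. With a blocking lock in place of the trylock, the nested case would wait on a lock that startup cannot release until the helper returns.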

> commit ad5fdf47b4e3
> Author: J. Bruce Fields <[email protected]>
> Date: Tue Nov 27 15:54:17 2018 -0500
>
> nfsd4: fix crash on writing v4_end_grace before nfsd startup
>
> Anatoly Trosinenko reports that this:
>
> 1) Check out a fresh master branch of Linux (tested with commit e195ca6cb)
> 2) Copy x86_64-config-4.14 to .config, then enable NFS server v4 and build
> 3) From `kvm-xfstests shell`:
>
> mount none /proc/fs/nfsd -t nfsd
> echo Y > /proc/fs/nfsd/v4_end_grace
>
> results in NULL dereference in locks_end_grace.
>
> Check that nfsd has been started before trying to end the grace period.
>
> Reported-by: Anatoly Trosinenko <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 6384c9b94898..38b223c1378e 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -1126,7 +1126,13 @@ static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
>  	case 'Y':
>  	case 'y':
>  	case '1':
> +		mutex_lock(&nfsd_mutex);
> +		if (!nn->nfsd_serv) {
> +			mutex_unlock(&nfsd_mutex);
> +			return -EBUSY;
> +		}
>  		nfsd4_end_grace(nn);
> +		mutex_unlock(&nfsd_mutex);
>  		break;
>  	default:
>  		return -EINVAL;