2023-06-27 19:32:17

by Anna Schumaker

Subject: Re: Regression: NULL pointer dereference after NFS_V4_2_READ_PLUS (commit 7fd461c47)

Hi Krzysztof,

On Mon, Jun 26, 2023 at 6:28 AM Krzysztof Kozlowski
<[email protected]> wrote:
>
> On 23/06/2023 19:59, Anna Schumaker wrote:
> >>>>>>>
> >>>>>>> Can you swap out yesterday's patch with this patch? I've adjusted what
> >>>>>>> gets printed out, and added printk()s to xdr_copy_to_scratch(). I'm
> >>>>>>> starting to think that the xdr scratch buffer is fine, and that it's
> >>>>>>> the other pointer passed to memcpy() in that function that's the
> >>>>>>> problem, and the output from this patch will confirm for me.
> >>>>>>
> >>>>>> Oh, and can you add this one on top of the v2 patch as well?
> >>>>>
> >>>>> Sorry about the noise today. Can you use this patch instead of the two
> >>>>> I attached earlier? I cleaned up the output and cut down on extra
> >>>>> output..
> >>>>>
> >>>>
> >>>> Here you have - attached.
> >>>
> >>> This is good, thanks! I was finally able to figure out how to hit the
> >>> bug using a 32bit x86 VM, so hopefully the next thing you hear from me
> >>> is a patch fixing the bug!
> >
> > I'm really hopeful that the attached patch finally fixes the issue.
> > Can you try it and let me know?
>
> Just test it yourself on a 32-bit system... There is absolutely nothing
> special in the system I reproduced it on. Nothing.
>

I have an updated set of patches for you to try out, which should
hopefully fix that last set of warnings from the other day. I gave them
the same amount of testing as my previous patch: connectathon tests,
xfstests, and my own READ_PLUS test, using x86_64 and i686 VMs mounted
with NFS versions 3, 4.0, 4.1, and 4.2 and with sec=sys, sec=krb5,
sec=krb5i, and sec=krb5p. I have not hit any oopses, warnings, or
newly failing tests.
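The test coverage described above is a cross product of four NFS versions and four security flavors, i.e. 16 mount configurations per VM architecture. A minimal sketch of enumerating that matrix in C, assuming the combinations are expressed with the vers= and sec= mount options from nfs(5); the names nfs_versions, sec_flavors, and matrix_mount_opts() are hypothetical and not taken from any test suite mentioned in this mail:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Axes of the test matrix: NFS versions and RPCSEC security flavors. */
static const char *const nfs_versions[] = { "3", "4.0", "4.1", "4.2" };
static const char *const sec_flavors[] = { "sys", "krb5", "krb5i", "krb5p" };

#define NVERS (sizeof(nfs_versions) / sizeof(nfs_versions[0]))
#define NSEC  (sizeof(sec_flavors) / sizeof(sec_flavors[0]))

/*
 * Hypothetical helper: write the mount options for combination idx
 * (0 .. NVERS * NSEC - 1) into buf, e.g. "vers=4.2,sec=krb5p".
 * Versions vary slowest, security flavors fastest.
 * Returns the snprintf() result, or -1 if idx is out of range.
 */
static int matrix_mount_opts(size_t idx, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (idx >= NVERS * NSEC)
		return -1;
	return snprintf(buf, len, "vers=%s,sec=%s",
			nfs_versions[idx / NSEC], sec_flavors[idx % NSEC]);
}
```

A driver would call matrix_mount_opts() for every idx on each of the two VM architectures (x86_64 and i686), which gives 2 * 16 = 32 mount configurations per test suite.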

I would appreciate it if you could try them out on your hardware and
let me know if there is still an issue. Compiling exynos_defconfig for
an i686 VM selects a different set of Kconfig options than what you
have, and your odroid-hc1 has an unusual CPU setup, combining
Cortex-A15 and Cortex-A7 cores, that isn't possible to virtualize with
libvirt.

Thanks.
Anna
>
> IP-Config: eth0 hardware address 00:1e:06:30:bf:ac mtu 1500
> IP-Config: eth0 guessed broadcast address 192.168.1.255
> IP-Config: eth0 complete (from 192.168.1.10):
> address: 192.168.1.12 broadcast: 192.168.1.255 netmask: 255.255.255.0
> gateway: 192.168.1.1 dns0 : 0.0.0.0 dns1 : 0.0.0.0
>
> rootserver: 192.168.1.10 rootpath:
> filename :
> NFS-Mount: 192.168.1.10:/srv/nfs/odroidhc1
> Waiting 10 seconds for device /dev/nfs ...
> ERROR: device '/dev/nfs' not found. Skipping fsck.
> Mount cmd:
> mount.nfs4 -o vers=4,nolock 192.168.1.10:/srv/nfs/odroidhc1 /new_root
> [ 21.800626] ------------[ cut here ]------------
> [ 21.803891] WARNING: CPU: 7 PID: 154 at mm/highmem.c:603 xdr_stream_unmap_current_page+0x18/0x24
> [ 21.812729] Modules linked in:
> [ 21.815642] CPU: 7 PID: 154 Comm: mount.nfs4 Not tainted 6.4.0-00001-gfbb103bb8df0 #8
> [ 21.823444] Hardware name: Samsung Exynos (Flattened Device Tree)
> [ 21.829525] unwind_backtrace from show_stack+0x10/0x14
> [ 21.834698] show_stack from dump_stack_lvl+0x58/0x70
> [ 21.839730] dump_stack_lvl from __warn+0x7c/0x1bc
> [ 21.844491] __warn from warn_slowpath_fmt+0xbc/0x1b8
> [ 21.849518] warn_slowpath_fmt from xdr_stream_unmap_current_page+0x18/0x24
> [ 21.856437] xdr_stream_unmap_current_page from call_decode+0x210/0x2c8
> [ 21.863020] call_decode from __rpc_execute+0xf8/0x764
> [ 21.868134] __rpc_execute from rpc_execute+0xc0/0x1d0
> [ 21.873243] rpc_execute from rpc_run_task+0x148/0x190
> [ 21.878348] rpc_run_task from rpc_create_xprt+0x1a4/0x284
> [ 21.883805] rpc_create_xprt from rpc_create+0xf8/0x254
> [ 21.889004] rpc_create from nfs_create_rpc_client+0x150/0x17c
> [ 21.894812] nfs_create_rpc_client from nfs4_alloc_client+0x360/0x374
> [ 21.901226] nfs4_alloc_client from nfs_get_client+0x16c/0x3e8
> [ 21.907030] nfs_get_client from nfs4_set_client+0xfc/0x1a4
> [ 21.912574] nfs4_set_client from nfs4_create_server+0x11c/0x2fc
> [ 21.918554] nfs4_create_server from nfs4_try_get_tree+0x10/0x50
> [ 21.924534] nfs4_try_get_tree from vfs_get_tree+0x24/0xe4
> [ 21.929993] vfs_get_tree from path_mount+0x3e8/0xb04
> [ 21.935019] path_mount from sys_mount+0x20c/0x254
> [ 21.939784] sys_mount from ret_fast_syscall+0x0/0x1c
> [ 21.944809] Exception stack(0xf0cf9fa8 to 0xf0cf9ff0)
> [ 21.949837] 9fa0: 0047ebe0 00479c64 0047e960 0047e9b8 0047e9c8 00000000
> [ 21.957986] 9fc0: 0047ebe0 00479c64 b6f058c8 00000015 00466c08 00000010 00479c64 00466bfc
> [ 21.966139] 9fe0: 00479e70 befb69b0 0045a708 b6dca610
> [ 21.971245] irq event stamp: 0
> [ 21.974188] hardirqs last enabled at (0): [<00000000>] 0x0
> [ 21.979736] hardirqs last disabled at (0): [<c012357c>] copy_process+0x810/0x1ffc
> [ 21.987227] softirqs last enabled at (0): [<c012357c>] copy_process+0x810/0x1ffc
> [ 21.994679] softirqs last disabled at (0): [<00000000>] 0x0
> [ 22.000264] ---[ end trace 0000000000000000 ]---
> [ 22.004781] BUG: sleeping function called from invalid context at net/sunrpc/sched.c:953
> [ 22.012876] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 154, name: mount.nfs4
> [ 22.020936] preempt_count: 1, expected: 0
> [ 22.024881] RCU nest depth: 0, expected: 0
> [ 22.028955] INFO: lockdep is turned off.
> [ 22.032889] CPU: 7 PID: 154 Comm: mount.nfs4 Tainted: G W 6.4.0-00001-gfbb103bb8df0 #8
> [ 22.042131] Hardware name: Samsung Exynos (Flattened Device Tree)
> [ 22.048196] unwind_backtrace from show_stack+0x10/0x14
> [ 22.053393] show_stack from dump_stack_lvl+0x58/0x70
> [ 22.058417] dump_stack_lvl from __might_resched+0x194/0x260
> [ 22.064054] __might_resched from __rpc_execute+0x118/0x764
> [ 22.069596] __rpc_execute from rpc_execute+0xc0/0x1d0
> [ 22.074708] rpc_execute from rpc_run_task+0x148/0x190
> [ 22.079821] rpc_run_task from rpc_create_xprt+0x1a4/0x284
> [ 22.085281] rpc_create_xprt from rpc_create+0xf8/0x254
> [ 22.090483] rpc_create from nfs_create_rpc_client+0x150/0x17c
> [ 22.096286] nfs_create_rpc_client from nfs4_alloc_client+0x360/0x374
> [ 22.102700] nfs4_alloc_client from nfs_get_client+0x16c/0x3e8
> [ 22.108504] nfs_get_client from nfs4_set_client+0xfc/0x1a4
> [ 22.114050] nfs4_set_client from nfs4_create_server+0x11c/0x2fc
> [ 22.120029] nfs4_create_server from nfs4_try_get_tree+0x10/0x50
> [ 22.126009] nfs4_try_get_tree from vfs_get_tree+0x24/0xe4
> [ 22.131467] vfs_get_tree from path_mount+0x3e8/0xb04
> [ 22.136493] path_mount from sys_mount+0x20c/0x254
> [ 22.141258] sys_mount from ret_fast_syscall+0x0/0x1c
> [ 22.146284] Exception stack(0xf0cf9fa8 to 0xf0cf9ff0)
> [ 22.151322] 9fa0: 0047ebe0 00479c64 0047e960 0047e9b8 0047e9c8 00000000
> [ 22.159461] 9fc0: 0047ebe0 00479c64 b6f058c8 00000015 00466c08 00000010 00479c64 00466bfc
> [ 22.167606] 9fe0: 00479e70 befb69b0 0045a708 b6dca610
> [ 22.172820] BUG: scheduling while atomic: mount.nfs4/154/0x00000002
> [ 22.178871] INFO: lockdep is turned off.
> [ 22.182803] Modules linked in:
> [ 22.185798] CPU: 7 PID: 154 Comm: mount.nfs4 Tainted: G W 6.4.0-00001-gfbb103bb8df0 #8
> [ 22.195076] Hardware name: Samsung Exynos (Flattened Device Tree)
> [ 22.201139] unwind_backtrace from show_stack+0x10/0x14
> [ 22.206337] show_stack from dump_stack_lvl+0x58/0x70
> [ 22.211365] dump_stack_lvl from __schedule_bug+0x70/0x84
> [ 22.216736] __schedule_bug from __schedule+0x9c0/0xc80
> [ 22.221936] __schedule from schedule+0x58/0xf8
> [ 22.226439] schedule from schedule_timeout+0x134/0x200
> [ 22.231641] schedule_timeout from __wait_for_common+0xac/0x1f8
> [ 22.237533] __wait_for_common from wait_for_completion_killable+0x18/0x24
> [ 22.244379] wait_for_completion_killable from __kthread_create_on_node+0xe0/0x168
> [ 22.251923] __kthread_create_on_node from kthread_create_on_node+0x30/0x60
> [ 22.258851] kthread_create_on_node from svc_set_num_threads+0x1c8/0x420
> [ 22.265525] svc_set_num_threads from nfs_callback_up+0x150/0x3c0
> [ 22.271597] nfs_callback_up from nfs4_init_client+0x98/0x144
> [ 22.277306] nfs4_init_client from nfs4_set_client+0xfc/0x1a4
> [ 22.283026] nfs4_set_client from nfs4_create_server+0x11c/0x2fc
> [ 22.289005] nfs4_create_server from nfs4_try_get_tree+0x10/0x50
> [ 22.294985] nfs4_try_get_tree from vfs_get_tree+0x24/0xe4
> [ 22.300444] vfs_get_tree from path_mount+0x3e8/0xb04
> [ 22.305468] path_mount from sys_mount+0x20c/0x254
> [ 22.310249] sys_mount from ret_fast_syscall+0x0/0x1c
> [ 22.315261] Exception stack(0xf0cf9fa8 to 0xf0cf9ff0)
> [ 22.320300] 9fa0: 0047ebe0 00479c64 0047e960 0047e9b8 0047e9c8 00000000
> [ 22.328438] 9fc0: 0047ebe0 00479c64 b6f058c8 00000015 00466c08 00000010 00479c64 00466bfc
> [ 22.336582] 9fe0: 00479e70 befb69b0 0045a708 b6dca610
> :: running cleanup hook [udev]
> [ 26.235349] systemd[1]: System time before build time, advancing clock.
> [ 26.435536] systemd[1]: systemd 253.4-1-arch running in system mode (+PAM +AUDIT -SELINUX -APPARMOR -IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP -SYSVINIT default-hierarchy=unified)
> [ 26.466749] systemd[1]: Detected architecture arm.
>
>
>
> Best regards,
> Krzysztof
>


Attachments:
v4-0001-NFSv4.2-Fix-READ_PLUS-smatch-warnings.patch (1.46 kB)
v4-0002-NFSv4.2-Fix-READ_PLUS-size-calculations.patch (1.98 kB)
v4-0003-NFSv4.2-Rework-scratch-handling-for-READ_PLUS-aga.patch (4.51 kB)
v4-0004-SUNRPC-kmap-the-xdr-pages-during-decode.patch (3.54 kB)
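The earlier hypothesis quoted in this thread concerns xdr_copy_to_scratch(), and the attached v4-0003 patch reworks READ_PLUS scratch handling. As background, the XDR scratch buffer lets a decoder see an item contiguously even when it straddles two non-contiguous receive pages: the bytes are copied into the scratch buffer with two memcpy() calls, one from the tail of the current page and one from the head of the next. The following is a minimal userspace model of that idea under simplifying assumptions (tiny fixed-size pages, a plain array of page pointers, no highmem mapping); struct model_stream and model_inline_decode() are hypothetical names, and this is not the kernel's struct xdr_stream or its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified model, NOT the kernel's struct xdr_stream. */
#define MODEL_PAGE_SIZE 8   /* tiny "page" so boundaries are easy to hit */
#define MODEL_MAX_PAGES 4

struct model_stream {
	unsigned char *pages[MODEL_MAX_PAGES]; /* non-contiguous page buffers */
	size_t npages;            /* valid entries in pages[], <= MODEL_MAX_PAGES */
	size_t page;              /* index of the current page */
	size_t off;               /* read offset within the current page */
	unsigned char *scratch;   /* caller-supplied scratch buffer */
	size_t scratch_len;
};

/* Move on to the next page once the current one is fully consumed. */
static void model_normalize(struct model_stream *s)
{
	if (s->off == MODEL_PAGE_SIZE) {
		s->page++;
		s->off = 0;
	}
}

/*
 * Return a pointer to nbytes of contiguous data, or NULL if the stream
 * or the scratch buffer is too short. The returned pointer is only
 * valid until the next call, because the scratch buffer is reused.
 */
static const unsigned char *model_inline_decode(struct model_stream *s,
						size_t nbytes)
{
	const unsigned char *p;
	size_t avail, rest;

	assert(s != NULL);
	if (s->page >= s->npages)
		return NULL;

	avail = MODEL_PAGE_SIZE - s->off;
	if (nbytes <= avail) {
		/* Fast path: the item fits, so point straight into the page. */
		p = s->pages[s->page] + s->off;
		s->off += nbytes;
		model_normalize(s);
		return p;
	}

	/* Slow path: the item straddles a page boundary. */
	rest = nbytes - avail;
	if (nbytes > s->scratch_len || rest > MODEL_PAGE_SIZE ||
	    s->page + 1 >= s->npages)
		return NULL;

	/* Copy the tail of this page, then the head of the next page. */
	memcpy(s->scratch, s->pages[s->page] + s->off, avail);
	memcpy(s->scratch + avail, s->pages[s->page + 1], rest);
	s->page++;
	s->off = rest;
	model_normalize(s);
	return s->scratch;
}
```

In this model both memcpy() source pointers come from the page array, which is why a bad page pointer (rather than a bad scratch buffer) would be enough to fault inside the copy; whether that matches the kernel bug is for the patches to confirm.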