LinuxLists.cc - Is this nfsd kernel oops known?

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

> On Aug 30, 2022, at 2:22 PM, Olga Kornievskaia <[email protected]> wrote:
>
> On Tue, Aug 30, 2022 at 1:49 PM Jeff Layton <[email protected]> wrote:
>>
>> On Tue, 2022-08-30 at 13:14 -0400, Olga Kornievskaia wrote:
>>> Hi folks,
>>>
>>> Is this a known nfsd kernel oops in 6.0-rc1. Was running xfstests on
>>> pre-rhel-9.1 client against 6.0-rc1 server when it panic-ed.
>>>
>>> [ 5554.769159] BUG: KASAN: null-ptr-deref in kernel_sendpage+0x60/0x220
>>> [ 5554.770526] Read of size 8 at addr 0000000000000008 by task nfsd/2590
>>> [ 5554.771899]
>>
>> No, I haven't seen this one. I'm guessing the page pointer passed to
>> kernel_sendpage was probably NULL, so this may be a case where something
>> walked off the end of the rq_pages array?
>>
>> Beyond that I can't tell much from just this stack trace. It might be
>> nice to see what line of code kernel_sendpage+0x60 refers to on your
>> kernel.
>
> Ok I reproduced it again.
>
> This time I updated to Chuck's 'for-next' branch at commit
> deb33fa8542eaf554e78a725cb8b922ac06978a4 (hopefully that means I'm
> 'current').
>
> client: 5.14.0-148 running generic/138 test.

If generic/138 does not require a scratch device, then I should have
run it many times already. I haven't seen a crash... maybe there's
something in addition that needs to happen to trigger it?

> On the network trace the last thing is a client sending a READ to the server.
>
> Embarrassingly I'm unable to get the line number of kernel_sendpage.
> So I think this must be in vmlinux but when I gdb vmlinux it doesn't
> have any debugging symbols even though my configuration turns on
> debugging. I can get you the line for the svc_tcp_sendmsg+0x206 but I
> know that doesn't help. What Kconfig I need to have debug symbols for
> kernel_sendpage (I have CONFIG_DEBUG_KERNEL but I have
> CONFIG_DEBUG_INFO_NONE=y so is that it should I choose something else
> here)?
>
> Also another piece, when I tested with a server running 5.19-rc6
> (which was based on Trond's tree for 6.0), the server didn't panic.
> Not sure if that helps.

Should be quick to bisect then.

>>> [ 5554.772249] CPU: 1 PID: 2590 Comm: nfsd Not tainted 6.0.0-rc1+ #84
>>> [ 5554.773575] Hardware name: VMware, Inc. VMware Virtual
>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>>> [ 5554.775952] Call Trace:
>>> [ 5554.776500] <TASK>
>>> [ 5554.776977] dump_stack_lvl+0x33/0x46
>>> [ 5554.777792] ? kernel_sendpage+0x60/0x220
>>> [ 5554.778672] print_report.cold.12+0x499/0x6b7
>>> [ 5554.779628] ? tcp_release_cb+0x46/0x200
>>> [ 5554.780577] ? kernel_sendpage+0x60/0x220
>>> [ 5554.781516] kasan_report+0xa3/0x120
>>> [ 5554.782361] ? inet_sendmsg+0xa0/0xa0
>>> [ 5554.783217] ? kernel_sendpage+0x60/0x220
>>> [ 5554.784191] kernel_sendpage+0x60/0x220
>>> [ 5554.785247] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
>>> [ 5554.787188] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
>>> [ 5554.789364] ? refcount_dec_not_one+0xa0/0x120
>>> [ 5554.790402] ? refcount_warn_saturate+0x120/0x120
>>> [ 5554.791495] ? __rcu_read_unlock+0x4e/0x250
>>> [ 5554.792575] ? __mutex_lock_slowpath+0x10/0x10
>>> [ 5554.793571] ? tcp_release_cb+0x46/0x200
>>> [ 5554.794443] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
>>> [ 5554.796182] ? svc_addsock+0x370/0x370 [sunrpc]
>>> [ 5554.797924] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
>>> [ 5554.799848] ? svc_recv+0xab0/0xfa0 [sunrpc]
>>> [ 5554.801434] svc_send+0x9c/0x260 [sunrpc]
>>> [ 5554.802963] nfsd+0x170/0x270 [nfsd]
>>> [ 5554.804140] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
>>> [ 5554.805631] kthread+0x160/0x190
>>> [ 5554.806354] ? kthread_complete_and_exit+0x20/0x20
>>> [ 5554.807401] ret_from_fork+0x1f/0x30
>>> [ 5554.808206] </TASK>
>>> [ 5554.808699] ==================================================================
>>> [ 5554.810486] Disabling lock debugging due to kernel taint
>>> [ 5554.811772] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>> [ 5554.813236] #PF: supervisor read access in kernel mode
>>> [ 5554.814345] #PF: error_code(0x0000) - not-present page
>>> [ 5554.815462] PGD 0 P4D 0
>>> [ 5554.816032] Oops: 0000 [#1] PREEMPT SMP KASAN PTI
>>> [ 5554.817057] CPU: 1 PID: 2590 Comm: nfsd Tainted: G B
>>> 6.0.0-rc1+ #84
>>> [ 5554.818677] Hardware name: VMware, Inc. VMware Virtual
>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>>> [ 5554.821028] RIP: 0010:kernel_sendpage+0x60/0x220
>>> [ 5554.822138] Code: 24 a0 00 00 00 e8 a0 98 83 ff 49 83 bc 24 a0 00
>>> 00 00 00 0f 84 9f 00 00 00 48 8d 43 08 48 89 c7 48 89 44 24 08 e8 80
>>> 98 83 ff <4c> 8b 63 08 41 f6 c4 01 0f 85 ee 00 00 00 0f 1f 44 00 00 48
>>> 89 df
>>> [ 5554.826047] RSP: 0018:ffff888017ef7c38 EFLAGS: 00010296
>>> [ 5554.827192] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffffa3b173b6
>>> [ 5554.828715] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffa6b16260
>>> [ 5554.830237] RBP: ffff8880057ac380 R08: fffffbfff4d62c4d R09: fffffbfff4d62c4d
>>> [ 5554.831757] R10: ffffffffa6b16267 R11: fffffbfff4d62c4c R12: ffffffffa545e6a0
>>> [ 5554.833341] R13: ffff8880057ac3a0 R14: 0000000000001000 R15: 0000000000000000
>>> [ 5554.834881] FS: 0000000000000000(0000) GS:ffff888057c80000(0000)
>>> knlGS:0000000000000000
>>> [ 5554.836590] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 5554.837819] CR2: 0000000000000008 CR3: 000000000677e004 CR4: 00000000001706e0
>>> [ 5554.839374] Call Trace:
>>> [ 5554.839919] <TASK>
>>> [ 5554.840400] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
>>> [ 5554.842066] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
>>> [ 5554.844194] ? refcount_dec_not_one+0xa0/0x120
>>> [ 5554.845239] ? refcount_warn_saturate+0x120/0x120
>>> [ 5554.846275] ? __rcu_read_unlock+0x4e/0x250
>>> [ 5554.847199] ? __mutex_lock_slowpath+0x10/0x10
>>> [ 5554.848171] ? tcp_release_cb+0x46/0x200
>>> [ 5554.849039] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
>>> [ 5554.850667] ? svc_addsock+0x370/0x370 [sunrpc]
>>> [ 5554.852285] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
>>> [ 5554.854420] ? svc_recv+0xab0/0xfa0 [sunrpc]
>>> [ 5554.856187] svc_send+0x9c/0x260 [sunrpc]
>>> [ 5554.857773] nfsd+0x170/0x270 [nfsd]
>>> [ 5554.859009] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
>>> [ 5554.860602] kthread+0x160/0x190
>>> [ 5554.861400] ? kthread_complete_and_exit+0x20/0x20
>>> [ 5554.862452] ret_from_fork+0x1f/0x30
>>> [ 5554.863265] </TASK>
>>> [ 5554.863756] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
>>> iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
>>> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
>>> nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
>>> vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common
>>> snd_seq_midi snd_seq_midi_event crct10dif_pclmul crc32_pclmul
>>> vmw_balloon ghash_clmulni_intel joydev pcspkr snd_ens1371
>>> snd_ac97_codec ac97_bus snd_seq btusb btrtl btbcm btintel snd_pcm
>>> snd_timer snd_rawmidi snd_seq_device bluetooth rfkill snd ecdh_generic
>>> ecc soundcore vmw_vmci i2c_piix4 auth_rpcgss sunrpc ip_tables xfs
>>> libcrc32c sr_mod cdrom sg crc32c_intel nvme serio_raw nvme_core t10_pi
>>> crc64_rocksoft crc64 vmwgfx drm_ttm_helper ttm ahci drm_kms_helper
>>> ata_generic syscopyarea sysfillrect sysimgblt fb_sys_fops vmxnet3
>>> libahci ata_piix drm libata
>>> [ 5554.880681] CR2: 0000000000000008
>>> [ 5554.881539] ---[ end trace 0000000000000000 ]---
>>
>> --
>> Jeff Layton <[email protected]>

--
Chuck Lever

2022-08-30 18:37:01

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On Tue, Aug 30, 2022 at 1:49 PM Jeff Layton <[email protected]> wrote:
>
> On Tue, 2022-08-30 at 13:14 -0400, Olga Kornievskaia wrote:
> > Hi folks,
> >
> > Is this a known nfsd kernel oops in 6.0-rc1. Was running xfstests on
> > pre-rhel-9.1 client against 6.0-rc1 server when it panic-ed.
> >
> > [ 5554.769159] BUG: KASAN: null-ptr-deref in kernel_sendpage+0x60/0x220
> > [ 5554.770526] Read of size 8 at addr 0000000000000008 by task nfsd/2590
> > [ 5554.771899]
>
> No, I haven't seen this one. I'm guessing the page pointer passed to
> kernel_sendpage was probably NULL, so this may be a case where something
> walked off the end of the rq_pages array?
>
> Beyond that I can't tell much from just this stack trace. It might be
> nice to see what line of code kernel_sendpage+0x60 refers to on your
> kernel.

Ok I reproduced it again.

This time I updated to Chuck's 'for-next' branch at commit
deb33fa8542eaf554e78a725cb8b922ac06978a4 (hopefully that means I'm
'current').

client: 5.14.0-148 running generic/138 test.

On the network trace the last thing is a client sending a READ to the server.

Embarrassingly I'm unable to get the line number of kernel_sendpage.
So I think this must be in vmlinux but when I gdb vmlinux it doesn't
have any debugging symbols even though my configuration turns on
debugging. I can get you the line for the svc_tcp_sendmsg+0x206 but I
know that doesn't help. What Kconfig I need to have debug symbols for
kernel_sendpage (I have CONFIG_DEBUG_KERNEL but I have
CONFIG_DEBUG_INFO_NONE=y so is that it should I choose something else
here)?

Also another piece, when I tested with a server running 5.19-rc6
(which was based on Trond's tree for 6.0), the server didn't panic.
Not sure if that helps.

> > [ 5554.772249] CPU: 1 PID: 2590 Comm: nfsd Not tainted 6.0.0-rc1+ #84
> > [ 5554.773575] Hardware name: VMware, Inc. VMware Virtual
> > Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> > [ 5554.775952] Call Trace:
> > [ 5554.776500] <TASK>
> > [ 5554.776977] dump_stack_lvl+0x33/0x46
> > [ 5554.777792] ? kernel_sendpage+0x60/0x220
> > [ 5554.778672] print_report.cold.12+0x499/0x6b7
> > [ 5554.779628] ? tcp_release_cb+0x46/0x200
> > [ 5554.780577] ? kernel_sendpage+0x60/0x220
> > [ 5554.781516] kasan_report+0xa3/0x120
> > [ 5554.782361] ? inet_sendmsg+0xa0/0xa0
> > [ 5554.783217] ? kernel_sendpage+0x60/0x220
> > [ 5554.784191] kernel_sendpage+0x60/0x220
> > [ 5554.785247] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
> > [ 5554.787188] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
> > [ 5554.789364] ? refcount_dec_not_one+0xa0/0x120
> > [ 5554.790402] ? refcount_warn_saturate+0x120/0x120
> > [ 5554.791495] ? __rcu_read_unlock+0x4e/0x250
> > [ 5554.792575] ? __mutex_lock_slowpath+0x10/0x10
> > [ 5554.793571] ? tcp_release_cb+0x46/0x200
> > [ 5554.794443] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
> > [ 5554.796182] ? svc_addsock+0x370/0x370 [sunrpc]
> > [ 5554.797924] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
> > [ 5554.799848] ? svc_recv+0xab0/0xfa0 [sunrpc]
> > [ 5554.801434] svc_send+0x9c/0x260 [sunrpc]
> > [ 5554.802963] nfsd+0x170/0x270 [nfsd]
> > [ 5554.804140] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
> > [ 5554.805631] kthread+0x160/0x190
> > [ 5554.806354] ? kthread_complete_and_exit+0x20/0x20
> > [ 5554.807401] ret_from_fork+0x1f/0x30
> > [ 5554.808206] </TASK>
> > [ 5554.808699] ==================================================================
> > [ 5554.810486] Disabling lock debugging due to kernel taint
> > [ 5554.811772] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > [ 5554.813236] #PF: supervisor read access in kernel mode
> > [ 5554.814345] #PF: error_code(0x0000) - not-present page
> > [ 5554.815462] PGD 0 P4D 0
> > [ 5554.816032] Oops: 0000 [#1] PREEMPT SMP KASAN PTI
> > [ 5554.817057] CPU: 1 PID: 2590 Comm: nfsd Tainted: G B
> > 6.0.0-rc1+ #84
> > [ 5554.818677] Hardware name: VMware, Inc. VMware Virtual
> > Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> > [ 5554.821028] RIP: 0010:kernel_sendpage+0x60/0x220
> > [ 5554.822138] Code: 24 a0 00 00 00 e8 a0 98 83 ff 49 83 bc 24 a0 00
> > 00 00 00 0f 84 9f 00 00 00 48 8d 43 08 48 89 c7 48 89 44 24 08 e8 80
> > 98 83 ff <4c> 8b 63 08 41 f6 c4 01 0f 85 ee 00 00 00 0f 1f 44 00 00 48
> > 89 df
> > [ 5554.826047] RSP: 0018:ffff888017ef7c38 EFLAGS: 00010296
> > [ 5554.827192] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffffa3b173b6
> > [ 5554.828715] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffa6b16260
> > [ 5554.830237] RBP: ffff8880057ac380 R08: fffffbfff4d62c4d R09: fffffbfff4d62c4d
> > [ 5554.831757] R10: ffffffffa6b16267 R11: fffffbfff4d62c4c R12: ffffffffa545e6a0
> > [ 5554.833341] R13: ffff8880057ac3a0 R14: 0000000000001000 R15: 0000000000000000
> > [ 5554.834881] FS: 0000000000000000(0000) GS:ffff888057c80000(0000)
> > knlGS:0000000000000000
> > [ 5554.836590] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 5554.837819] CR2: 0000000000000008 CR3: 000000000677e004 CR4: 00000000001706e0
> > [ 5554.839374] Call Trace:
> > [ 5554.839919] <TASK>
> > [ 5554.840400] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
> > [ 5554.842066] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
> > [ 5554.844194] ? refcount_dec_not_one+0xa0/0x120
> > [ 5554.845239] ? refcount_warn_saturate+0x120/0x120
> > [ 5554.846275] ? __rcu_read_unlock+0x4e/0x250
> > [ 5554.847199] ? __mutex_lock_slowpath+0x10/0x10
> > [ 5554.848171] ? tcp_release_cb+0x46/0x200
> > [ 5554.849039] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
> > [ 5554.850667] ? svc_addsock+0x370/0x370 [sunrpc]
> > [ 5554.852285] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
> > [ 5554.854420] ? svc_recv+0xab0/0xfa0 [sunrpc]
> > [ 5554.856187] svc_send+0x9c/0x260 [sunrpc]
> > [ 5554.857773] nfsd+0x170/0x270 [nfsd]
> > [ 5554.859009] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
> > [ 5554.860602] kthread+0x160/0x190
> > [ 5554.861400] ? kthread_complete_and_exit+0x20/0x20
> > [ 5554.862452] ret_from_fork+0x1f/0x30
> > [ 5554.863265] </TASK>
> > [ 5554.863756] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
> > iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
> > xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
> > nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
> > vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common
> > snd_seq_midi snd_seq_midi_event crct10dif_pclmul crc32_pclmul
> > vmw_balloon ghash_clmulni_intel joydev pcspkr snd_ens1371
> > snd_ac97_codec ac97_bus snd_seq btusb btrtl btbcm btintel snd_pcm
> > snd_timer snd_rawmidi snd_seq_device bluetooth rfkill snd ecdh_generic
> > ecc soundcore vmw_vmci i2c_piix4 auth_rpcgss sunrpc ip_tables xfs
> > libcrc32c sr_mod cdrom sg crc32c_intel nvme serio_raw nvme_core t10_pi
> > crc64_rocksoft crc64 vmwgfx drm_ttm_helper ttm ahci drm_kms_helper
> > ata_generic syscopyarea sysfillrect sysimgblt fb_sys_fops vmxnet3
> > libahci ata_piix drm libata
> > [ 5554.880681] CR2: 0000000000000008
> > [ 5554.881539] ---[ end trace 0000000000000000 ]---
>
> --
> Jeff Layton <[email protected]>

2022-08-30 18:40:44

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On Tue, Aug 30, 2022 at 2:26 PM Chuck Lever III <[email protected]> wrote:
>
>
>
> > On Aug 30, 2022, at 2:22 PM, Olga Kornievskaia <[email protected]> wrote:
> >
> > On Tue, Aug 30, 2022 at 1:49 PM Jeff Layton <[email protected]> wrote:
> >>
> >> On Tue, 2022-08-30 at 13:14 -0400, Olga Kornievskaia wrote:
> >>> Hi folks,
> >>>
> >>> Is this a known nfsd kernel oops in 6.0-rc1. Was running xfstests on
> >>> pre-rhel-9.1 client against 6.0-rc1 server when it panic-ed.
> >>>
> >>> [ 5554.769159] BUG: KASAN: null-ptr-deref in kernel_sendpage+0x60/0x220
> >>> [ 5554.770526] Read of size 8 at addr 0000000000000008 by task nfsd/2590
> >>> [ 5554.771899]
> >>
> >> No, I haven't seen this one. I'm guessing the page pointer passed to
> >> kernel_sendpage was probably NULL, so this may be a case where something
> >> walked off the end of the rq_pages array?
> >>
> >> Beyond that I can't tell much from just this stack trace. It might be
> >> nice to see what line of code kernel_sendpage+0x60 refers to on your
> >> kernel.
> >
> > Ok I reproduced it again.
> >
> > This time I updated to Chuck's 'for-next' branch at commit
> > deb33fa8542eaf554e78a725cb8b922ac06978a4 (hopefully that means I'm
> > 'current').
> >
> > client: 5.14.0-148 running generic/138 test.
>
> If generic/138 does not require a scratch device, then I should have
> run it many times already. I haven't seen a crash... maybe there's
> something in addition that needs to happen to trigger it?

Looking at the test generic/138 is one of the tests that's going
against /scratch so I'm assuming it's using a scratch device.

> > On the network trace the last thing is a client sending a READ to the server.
> >
> > Embarrassingly I'm unable to get the line number of kernel_sendpage.
> > So I think this must be in vmlinux but when I gdb vmlinux it doesn't
> > have any debugging symbols even though my configuration turns on
> > debugging. I can get you the line for the svc_tcp_sendmsg+0x206 but I
> > know that doesn't help. What Kconfig I need to have debug symbols for
> > kernel_sendpage (I have CONFIG_DEBUG_KERNEL but I have
> > CONFIG_DEBUG_INFO_NONE=y so is that it should I choose something else
> > here)?
> >
> > Also another piece, when I tested with a server running 5.19-rc6
> > (which was based on Trond's tree for 6.0), the server didn't panic.
> > Not sure if that helps.
>
> Should be quick to bisect then.

Um.. I disagree but let's agree to disagree.

>
>
> >>> [ 5554.772249] CPU: 1 PID: 2590 Comm: nfsd Not tainted 6.0.0-rc1+ #84
> >>> [ 5554.773575] Hardware name: VMware, Inc. VMware Virtual
> >>> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> >>> [ 5554.775952] Call Trace:
> >>> [ 5554.776500] <TASK>
> >>> [ 5554.776977] dump_stack_lvl+0x33/0x46
> >>> [ 5554.777792] ? kernel_sendpage+0x60/0x220
> >>> [ 5554.778672] print_report.cold.12+0x499/0x6b7
> >>> [ 5554.779628] ? tcp_release_cb+0x46/0x200
> >>> [ 5554.780577] ? kernel_sendpage+0x60/0x220
> >>> [ 5554.781516] kasan_report+0xa3/0x120
> >>> [ 5554.782361] ? inet_sendmsg+0xa0/0xa0
> >>> [ 5554.783217] ? kernel_sendpage+0x60/0x220
> >>> [ 5554.784191] kernel_sendpage+0x60/0x220
> >>> [ 5554.785247] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
> >>> [ 5554.787188] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
> >>> [ 5554.789364] ? refcount_dec_not_one+0xa0/0x120
> >>> [ 5554.790402] ? refcount_warn_saturate+0x120/0x120
> >>> [ 5554.791495] ? __rcu_read_unlock+0x4e/0x250
> >>> [ 5554.792575] ? __mutex_lock_slowpath+0x10/0x10
> >>> [ 5554.793571] ? tcp_release_cb+0x46/0x200
> >>> [ 5554.794443] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
> >>> [ 5554.796182] ? svc_addsock+0x370/0x370 [sunrpc]
> >>> [ 5554.797924] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
> >>> [ 5554.799848] ? svc_recv+0xab0/0xfa0 [sunrpc]
> >>> [ 5554.801434] svc_send+0x9c/0x260 [sunrpc]
> >>> [ 5554.802963] nfsd+0x170/0x270 [nfsd]
> >>> [ 5554.804140] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
> >>> [ 5554.805631] kthread+0x160/0x190
> >>> [ 5554.806354] ? kthread_complete_and_exit+0x20/0x20
> >>> [ 5554.807401] ret_from_fork+0x1f/0x30
> >>> [ 5554.808206] </TASK>
> >>> [ 5554.808699] ==================================================================
> >>> [ 5554.810486] Disabling lock debugging due to kernel taint
> >>> [ 5554.811772] BUG: kernel NULL pointer dereference, address: 0000000000000008
> >>> [ 5554.813236] #PF: supervisor read access in kernel mode
> >>> [ 5554.814345] #PF: error_code(0x0000) - not-present page
> >>> [ 5554.815462] PGD 0 P4D 0
> >>> [ 5554.816032] Oops: 0000 [#1] PREEMPT SMP KASAN PTI
> >>> [ 5554.817057] CPU: 1 PID: 2590 Comm: nfsd Tainted: G B
> >>> 6.0.0-rc1+ #84
> >>> [ 5554.818677] Hardware name: VMware, Inc. VMware Virtual
> >>> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> >>> [ 5554.821028] RIP: 0010:kernel_sendpage+0x60/0x220
> >>> [ 5554.822138] Code: 24 a0 00 00 00 e8 a0 98 83 ff 49 83 bc 24 a0 00
> >>> 00 00 00 0f 84 9f 00 00 00 48 8d 43 08 48 89 c7 48 89 44 24 08 e8 80
> >>> 98 83 ff <4c> 8b 63 08 41 f6 c4 01 0f 85 ee 00 00 00 0f 1f 44 00 00 48
> >>> 89 df
> >>> [ 5554.826047] RSP: 0018:ffff888017ef7c38 EFLAGS: 00010296
> >>> [ 5554.827192] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffffa3b173b6
> >>> [ 5554.828715] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffa6b16260
> >>> [ 5554.830237] RBP: ffff8880057ac380 R08: fffffbfff4d62c4d R09: fffffbfff4d62c4d
> >>> [ 5554.831757] R10: ffffffffa6b16267 R11: fffffbfff4d62c4c R12: ffffffffa545e6a0
> >>> [ 5554.833341] R13: ffff8880057ac3a0 R14: 0000000000001000 R15: 0000000000000000
> >>> [ 5554.834881] FS: 0000000000000000(0000) GS:ffff888057c80000(0000)
> >>> knlGS:0000000000000000
> >>> [ 5554.836590] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [ 5554.837819] CR2: 0000000000000008 CR3: 000000000677e004 CR4: 00000000001706e0
> >>> [ 5554.839374] Call Trace:
> >>> [ 5554.839919] <TASK>
> >>> [ 5554.840400] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
> >>> [ 5554.842066] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
> >>> [ 5554.844194] ? refcount_dec_not_one+0xa0/0x120
> >>> [ 5554.845239] ? refcount_warn_saturate+0x120/0x120
> >>> [ 5554.846275] ? __rcu_read_unlock+0x4e/0x250
> >>> [ 5554.847199] ? __mutex_lock_slowpath+0x10/0x10
> >>> [ 5554.848171] ? tcp_release_cb+0x46/0x200
> >>> [ 5554.849039] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
> >>> [ 5554.850667] ? svc_addsock+0x370/0x370 [sunrpc]
> >>> [ 5554.852285] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
> >>> [ 5554.854420] ? svc_recv+0xab0/0xfa0 [sunrpc]
> >>> [ 5554.856187] svc_send+0x9c/0x260 [sunrpc]
> >>> [ 5554.857773] nfsd+0x170/0x270 [nfsd]
> >>> [ 5554.859009] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
> >>> [ 5554.860602] kthread+0x160/0x190
> >>> [ 5554.861400] ? kthread_complete_and_exit+0x20/0x20
> >>> [ 5554.862452] ret_from_fork+0x1f/0x30
> >>> [ 5554.863265] </TASK>
> >>> [ 5554.863756] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
> >>> iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
> >>> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
> >>> nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
> >>> vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common
> >>> snd_seq_midi snd_seq_midi_event crct10dif_pclmul crc32_pclmul
> >>> vmw_balloon ghash_clmulni_intel joydev pcspkr snd_ens1371
> >>> snd_ac97_codec ac97_bus snd_seq btusb btrtl btbcm btintel snd_pcm
> >>> snd_timer snd_rawmidi snd_seq_device bluetooth rfkill snd ecdh_generic
> >>> ecc soundcore vmw_vmci i2c_piix4 auth_rpcgss sunrpc ip_tables xfs
> >>> libcrc32c sr_mod cdrom sg crc32c_intel nvme serio_raw nvme_core t10_pi
> >>> crc64_rocksoft crc64 vmwgfx drm_ttm_helper ttm ahci drm_kms_helper
> >>> ata_generic syscopyarea sysfillrect sysimgblt fb_sys_fops vmxnet3
> >>> libahci ata_piix drm libata
> >>> [ 5554.880681] CR2: 0000000000000008
> >>> [ 5554.881539] ---[ end trace 0000000000000000 ]---
> >>
> >> --
> >> Jeff Layton <[email protected]>
>
> --
> Chuck Lever
>
>
>

2022-08-30 18:43:15

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

> On Aug 30, 2022, at 2:33 PM, Olga Kornievskaia <[email protected]> wrote:
>
> On Tue, Aug 30, 2022 at 2:26 PM Chuck Lever III <[email protected]> wrote:
>>
>>
>>> On Aug 30, 2022, at 2:22 PM, Olga Kornievskaia <[email protected]> wrote:
>>>
>>> On Tue, Aug 30, 2022 at 1:49 PM Jeff Layton <[email protected]> wrote:
>>>>
>>>> On Tue, 2022-08-30 at 13:14 -0400, Olga Kornievskaia wrote:
>>>>> Hi folks,
>>>>>
>>>>> Is this a known nfsd kernel oops in 6.0-rc1. Was running xfstests on
>>>>> pre-rhel-9.1 client against 6.0-rc1 server when it panic-ed.
>>>>>
>>>>> [ 5554.769159] BUG: KASAN: null-ptr-deref in kernel_sendpage+0x60/0x220
>>>>> [ 5554.770526] Read of size 8 at addr 0000000000000008 by task nfsd/2590
>>>>> [ 5554.771899]
>>>>
>>>> No, I haven't seen this one. I'm guessing the page pointer passed to
>>>> kernel_sendpage was probably NULL, so this may be a case where something
>>>> walked off the end of the rq_pages array?
>>>>
>>>> Beyond that I can't tell much from just this stack trace. It might be
>>>> nice to see what line of code kernel_sendpage+0x60 refers to on your
>>>> kernel.
>>>
>>> Ok I reproduced it again.
>>>
>>> This time I updated to Chuck's 'for-next' branch at commit
>>> deb33fa8542eaf554e78a725cb8b922ac06978a4 (hopefully that means I'm
>>> 'current').
>>>
>>> client: 5.14.0-148 running generic/138 test.
>>
>> If generic/138 does not require a scratch device, then I should have
>> run it many times already. I haven't seen a crash... maybe there's
>> something in addition that needs to happen to trigger it?
>
> Looking at the test generic/138 is one of the tests that's going
> against /scratch so I'm assuming it's using a scratch device.

Can you tell if it's the server under test that crashes, or is
it the server hosting the scratch device?

>>> On the network trace the last thing is a client sending a READ to the server.
>>>
>>> Embarrassingly I'm unable to get the line number of kernel_sendpage.
>>> So I think this must be in vmlinux but when I gdb vmlinux it doesn't
>>> have any debugging symbols even though my configuration turns on
>>> debugging. I can get you the line for the svc_tcp_sendmsg+0x206 but I
>>> know that doesn't help. What Kconfig I need to have debug symbols for
>>> kernel_sendpage (I have CONFIG_DEBUG_KERNEL but I have
>>> CONFIG_DEBUG_INFO_NONE=y so is that it should I choose something else
>>> here)?
>>>
>>> Also another piece, when I tested with a server running 5.19-rc6
>>> (which was based on Trond's tree for 6.0), the server didn't panic.
>>> Not sure if that helps.
>>
>> Should be quick to bisect then.
>
> Um.. I disagree but let's agree to disagree.

OK, let me rephrase... Would you please bisect this? I'm at a loss
to think of a commit in 6.0-rc that would change behavior with TCP
transports.

>>>>> [ 5554.772249] CPU: 1 PID: 2590 Comm: nfsd Not tainted 6.0.0-rc1+ #84
>>>>> [ 5554.773575] Hardware name: VMware, Inc. VMware Virtual
>>>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>>>>> [ 5554.775952] Call Trace:
>>>>> [ 5554.776500] <TASK>
>>>>> [ 5554.776977] dump_stack_lvl+0x33/0x46
>>>>> [ 5554.777792] ? kernel_sendpage+0x60/0x220
>>>>> [ 5554.778672] print_report.cold.12+0x499/0x6b7
>>>>> [ 5554.779628] ? tcp_release_cb+0x46/0x200
>>>>> [ 5554.780577] ? kernel_sendpage+0x60/0x220
>>>>> [ 5554.781516] kasan_report+0xa3/0x120
>>>>> [ 5554.782361] ? inet_sendmsg+0xa0/0xa0
>>>>> [ 5554.783217] ? kernel_sendpage+0x60/0x220
>>>>> [ 5554.784191] kernel_sendpage+0x60/0x220
>>>>> [ 5554.785247] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
>>>>> [ 5554.787188] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
>>>>> [ 5554.789364] ? refcount_dec_not_one+0xa0/0x120
>>>>> [ 5554.790402] ? refcount_warn_saturate+0x120/0x120
>>>>> [ 5554.791495] ? __rcu_read_unlock+0x4e/0x250
>>>>> [ 5554.792575] ? __mutex_lock_slowpath+0x10/0x10
>>>>> [ 5554.793571] ? tcp_release_cb+0x46/0x200
>>>>> [ 5554.794443] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
>>>>> [ 5554.796182] ? svc_addsock+0x370/0x370 [sunrpc]
>>>>> [ 5554.797924] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
>>>>> [ 5554.799848] ? svc_recv+0xab0/0xfa0 [sunrpc]
>>>>> [ 5554.801434] svc_send+0x9c/0x260 [sunrpc]
>>>>> [ 5554.802963] nfsd+0x170/0x270 [nfsd]
>>>>> [ 5554.804140] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
>>>>> [ 5554.805631] kthread+0x160/0x190
>>>>> [ 5554.806354] ? kthread_complete_and_exit+0x20/0x20
>>>>> [ 5554.807401] ret_from_fork+0x1f/0x30
>>>>> [ 5554.808206] </TASK>
>>>>> [ 5554.808699] ==================================================================
>>>>> [ 5554.810486] Disabling lock debugging due to kernel taint
>>>>> [ 5554.811772] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>>>> [ 5554.813236] #PF: supervisor read access in kernel mode
>>>>> [ 5554.814345] #PF: error_code(0x0000) - not-present page
>>>>> [ 5554.815462] PGD 0 P4D 0
>>>>> [ 5554.816032] Oops: 0000 [#1] PREEMPT SMP KASAN PTI
>>>>> [ 5554.817057] CPU: 1 PID: 2590 Comm: nfsd Tainted: G B
>>>>> 6.0.0-rc1+ #84
>>>>> [ 5554.818677] Hardware name: VMware, Inc. VMware Virtual
>>>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>>>>> [ 5554.821028] RIP: 0010:kernel_sendpage+0x60/0x220
>>>>> [ 5554.822138] Code: 24 a0 00 00 00 e8 a0 98 83 ff 49 83 bc 24 a0 00
>>>>> 00 00 00 0f 84 9f 00 00 00 48 8d 43 08 48 89 c7 48 89 44 24 08 e8 80
>>>>> 98 83 ff <4c> 8b 63 08 41 f6 c4 01 0f 85 ee 00 00 00 0f 1f 44 00 00 48
>>>>> 89 df
>>>>> [ 5554.826047] RSP: 0018:ffff888017ef7c38 EFLAGS: 00010296
>>>>> [ 5554.827192] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffffa3b173b6
>>>>> [ 5554.828715] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffa6b16260
>>>>> [ 5554.830237] RBP: ffff8880057ac380 R08: fffffbfff4d62c4d R09: fffffbfff4d62c4d
>>>>> [ 5554.831757] R10: ffffffffa6b16267 R11: fffffbfff4d62c4c R12: ffffffffa545e6a0
>>>>> [ 5554.833341] R13: ffff8880057ac3a0 R14: 0000000000001000 R15: 0000000000000000
>>>>> [ 5554.834881] FS: 0000000000000000(0000) GS:ffff888057c80000(0000)
>>>>> knlGS:0000000000000000
>>>>> [ 5554.836590] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 5554.837819] CR2: 0000000000000008 CR3: 000000000677e004 CR4: 00000000001706e0
>>>>> [ 5554.839374] Call Trace:
>>>>> [ 5554.839919] <TASK>
>>>>> [ 5554.840400] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
>>>>> [ 5554.842066] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
>>>>> [ 5554.844194] ? refcount_dec_not_one+0xa0/0x120
>>>>> [ 5554.845239] ? refcount_warn_saturate+0x120/0x120
>>>>> [ 5554.846275] ? __rcu_read_unlock+0x4e/0x250
>>>>> [ 5554.847199] ? __mutex_lock_slowpath+0x10/0x10
>>>>> [ 5554.848171] ? tcp_release_cb+0x46/0x200
>>>>> [ 5554.849039] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
>>>>> [ 5554.850667] ? svc_addsock+0x370/0x370 [sunrpc]
>>>>> [ 5554.852285] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
>>>>> [ 5554.854420] ? svc_recv+0xab0/0xfa0 [sunrpc]
>>>>> [ 5554.856187] svc_send+0x9c/0x260 [sunrpc]
>>>>> [ 5554.857773] nfsd+0x170/0x270 [nfsd]
>>>>> [ 5554.859009] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
>>>>> [ 5554.860602] kthread+0x160/0x190
>>>>> [ 5554.861400] ? kthread_complete_and_exit+0x20/0x20
>>>>> [ 5554.862452] ret_from_fork+0x1f/0x30
>>>>> [ 5554.863265] </TASK>
>>>>> [ 5554.863756] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
>>>>> iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
>>>>> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
>>>>> nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
>>>>> vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common
>>>>> snd_seq_midi snd_seq_midi_event crct10dif_pclmul crc32_pclmul
>>>>> vmw_balloon ghash_clmulni_intel joydev pcspkr snd_ens1371
>>>>> snd_ac97_codec ac97_bus snd_seq btusb btrtl btbcm btintel snd_pcm
>>>>> snd_timer snd_rawmidi snd_seq_device bluetooth rfkill snd ecdh_generic
>>>>> ecc soundcore vmw_vmci i2c_piix4 auth_rpcgss sunrpc ip_tables xfs
>>>>> libcrc32c sr_mod cdrom sg crc32c_intel nvme serio_raw nvme_core t10_pi
>>>>> crc64_rocksoft crc64 vmwgfx drm_ttm_helper ttm ahci drm_kms_helper
>>>>> ata_generic syscopyarea sysfillrect sysimgblt fb_sys_fops vmxnet3
>>>>> libahci ata_piix drm libata
>>>>> [ 5554.880681] CR2: 0000000000000008
>>>>> [ 5554.881539] ---[ end trace 0000000000000000 ]---
>>>>
>>>> --
>>>> Jeff Layton <[email protected]>
>>
>> --
>> Chuck Lever

--
Chuck Lever

2022-08-30 19:05:09

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On Tue, Aug 30, 2022 at 2:41 PM Chuck Lever III <[email protected]> wrote:
>
>
> > On Aug 30, 2022, at 2:33 PM, Olga Kornievskaia <[email protected]> wrote:
> >
> > On Tue, Aug 30, 2022 at 2:26 PM Chuck Lever III <[email protected]> wrote:
> >>
> >>
> >>> On Aug 30, 2022, at 2:22 PM, Olga Kornievskaia <[email protected]> wrote:
> >>>
> >>> On Tue, Aug 30, 2022 at 1:49 PM Jeff Layton <[email protected]> wrote:
> >>>>
> >>>> On Tue, 2022-08-30 at 13:14 -0400, Olga Kornievskaia wrote:
> >>>>> Hi folks,
> >>>>>
> >>>>> Is this a known nfsd kernel oops in 6.0-rc1. Was running xfstests on
> >>>>> pre-rhel-9.1 client against 6.0-rc1 server when it panic-ed.
> >>>>>
> >>>>> [ 5554.769159] BUG: KASAN: null-ptr-deref in kernel_sendpage+0x60/0x220
> >>>>> [ 5554.770526] Read of size 8 at addr 0000000000000008 by task nfsd/2590
> >>>>> [ 5554.771899]
> >>>>
> >>>> No, I haven't seen this one. I'm guessing the page pointer passed to
> >>>> kernel_sendpage was probably NULL, so this may be a case where something
> >>>> walked off the end of the rq_pages array?
> >>>>
> >>>> Beyond that I can't tell much from just this stack trace. It might be
> >>>> nice to see what line of code kernel_sendpage+0x60 refers to on your
> >>>> kernel.
> >>>
> >>> Ok I reproduced it again.
> >>>
> >>> This time I updated to Chuck's 'for-next' branch at commit
> >>> deb33fa8542eaf554e78a725cb8b922ac06978a4 (hopefully that means I'm
> >>> 'current').
> >>>
> >>> client: 5.14.0-148 running generic/138 test.
> >>
> >> If generic/138 does not require a scratch device, then I should have
> >> run it many times already. I haven't seen a crash... maybe there's
> >> something in addition that needs to happen to trigger it?
> >
> > Looking at the test generic/138 is one of the tests that's going
> > against /scratch so I'm assuming it's using a scratch device.
>
> Can you tell if it's the server under test that crashes, or is
> it the server hosting the scratch device?

Sorry this is confusing. I just have 1 NFS client and 1 NFS server. By
scratch device I meant on the client I have it mount a different
separate file system on the server. NFS server is crashing.

> >>> On the network trace the last thing is a client sending a READ to the server.
> >>>
> >>> Embarrassingly I'm unable to get the line number of kernel_sendpage.
> >>> So I think this must be in vmlinux but when I gdb vmlinux it doesn't
> >>> have any debugging symbols even though my configuration turns on
> >>> debugging. I can get you the line for the svc_tcp_sendmsg+0x206 but I
> >>> know that doesn't help. What Kconfig I need to have debug symbols for
> >>> kernel_sendpage (I have CONFIG_DEBUG_KERNEL but I have
> >>> CONFIG_DEBUG_INFO_NONE=y so is that it should I choose something else
> >>> here)?
> >>>
> >>> Also another piece, when I tested with a server running 5.19-rc6
> >>> (which was based on Trond's tree for 6.0), the server didn't panic.
> >>> Not sure if that helps.
> >>
> >> Should be quick to bisect then.
> >
> > Um.. I disagree but let's agree to disagree.
>
> OK, let me rephrase... Would you please bisect this? I'm at a loss
> to think of a commit in 6.0-rc that would change behavior with TCP
> transports.

I'll give it a go... just can't promise quick. Currently i'm trying to
answer Jeff's request as to what kernel_sendpage() is pointing to on
the kernel (so rebuilding with config_debug_info=y, hoping that would
help)

> >>>>> [ 5554.772249] CPU: 1 PID: 2590 Comm: nfsd Not tainted 6.0.0-rc1+ #84
> >>>>> [ 5554.773575] Hardware name: VMware, Inc. VMware Virtual
> >>>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> >>>>> [ 5554.775952] Call Trace:
> >>>>> [ 5554.776500] <TASK>
> >>>>> [ 5554.776977] dump_stack_lvl+0x33/0x46
> >>>>> [ 5554.777792] ? kernel_sendpage+0x60/0x220
> >>>>> [ 5554.778672] print_report.cold.12+0x499/0x6b7
> >>>>> [ 5554.779628] ? tcp_release_cb+0x46/0x200
> >>>>> [ 5554.780577] ? kernel_sendpage+0x60/0x220
> >>>>> [ 5554.781516] kasan_report+0xa3/0x120
> >>>>> [ 5554.782361] ? inet_sendmsg+0xa0/0xa0
> >>>>> [ 5554.783217] ? kernel_sendpage+0x60/0x220
> >>>>> [ 5554.784191] kernel_sendpage+0x60/0x220
> >>>>> [ 5554.785247] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
> >>>>> [ 5554.787188] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
> >>>>> [ 5554.789364] ? refcount_dec_not_one+0xa0/0x120
> >>>>> [ 5554.790402] ? refcount_warn_saturate+0x120/0x120
> >>>>> [ 5554.791495] ? __rcu_read_unlock+0x4e/0x250
> >>>>> [ 5554.792575] ? __mutex_lock_slowpath+0x10/0x10
> >>>>> [ 5554.793571] ? tcp_release_cb+0x46/0x200
> >>>>> [ 5554.794443] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
> >>>>> [ 5554.796182] ? svc_addsock+0x370/0x370 [sunrpc]
> >>>>> [ 5554.797924] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
> >>>>> [ 5554.799848] ? svc_recv+0xab0/0xfa0 [sunrpc]
> >>>>> [ 5554.801434] svc_send+0x9c/0x260 [sunrpc]
> >>>>> [ 5554.802963] nfsd+0x170/0x270 [nfsd]
> >>>>> [ 5554.804140] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
> >>>>> [ 5554.805631] kthread+0x160/0x190
> >>>>> [ 5554.806354] ? kthread_complete_and_exit+0x20/0x20
> >>>>> [ 5554.807401] ret_from_fork+0x1f/0x30
> >>>>> [ 5554.808206] </TASK>
> >>>>> [ 5554.808699] ==================================================================
> >>>>> [ 5554.810486] Disabling lock debugging due to kernel taint
> >>>>> [ 5554.811772] BUG: kernel NULL pointer dereference, address: 0000000000000008
> >>>>> [ 5554.813236] #PF: supervisor read access in kernel mode
> >>>>> [ 5554.814345] #PF: error_code(0x0000) - not-present page
> >>>>> [ 5554.815462] PGD 0 P4D 0
> >>>>> [ 5554.816032] Oops: 0000 [#1] PREEMPT SMP KASAN PTI
> >>>>> [ 5554.817057] CPU: 1 PID: 2590 Comm: nfsd Tainted: G B
> >>>>> 6.0.0-rc1+ #84
> >>>>> [ 5554.818677] Hardware name: VMware, Inc. VMware Virtual
> >>>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> >>>>> [ 5554.821028] RIP: 0010:kernel_sendpage+0x60/0x220
> >>>>> [ 5554.822138] Code: 24 a0 00 00 00 e8 a0 98 83 ff 49 83 bc 24 a0 00
> >>>>> 00 00 00 0f 84 9f 00 00 00 48 8d 43 08 48 89 c7 48 89 44 24 08 e8 80
> >>>>> 98 83 ff <4c> 8b 63 08 41 f6 c4 01 0f 85 ee 00 00 00 0f 1f 44 00 00 48
> >>>>> 89 df
> >>>>> [ 5554.826047] RSP: 0018:ffff888017ef7c38 EFLAGS: 00010296
> >>>>> [ 5554.827192] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffffa3b173b6
> >>>>> [ 5554.828715] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffa6b16260
> >>>>> [ 5554.830237] RBP: ffff8880057ac380 R08: fffffbfff4d62c4d R09: fffffbfff4d62c4d
> >>>>> [ 5554.831757] R10: ffffffffa6b16267 R11: fffffbfff4d62c4c R12: ffffffffa545e6a0
> >>>>> [ 5554.833341] R13: ffff8880057ac3a0 R14: 0000000000001000 R15: 0000000000000000
> >>>>> [ 5554.834881] FS: 0000000000000000(0000) GS:ffff888057c80000(0000)
> >>>>> knlGS:0000000000000000
> >>>>> [ 5554.836590] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> [ 5554.837819] CR2: 0000000000000008 CR3: 000000000677e004 CR4: 00000000001706e0
> >>>>> [ 5554.839374] Call Trace:
> >>>>> [ 5554.839919] <TASK>
> >>>>> [ 5554.840400] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
> >>>>> [ 5554.842066] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
> >>>>> [ 5554.844194] ? refcount_dec_not_one+0xa0/0x120
> >>>>> [ 5554.845239] ? refcount_warn_saturate+0x120/0x120
> >>>>> [ 5554.846275] ? __rcu_read_unlock+0x4e/0x250
> >>>>> [ 5554.847199] ? __mutex_lock_slowpath+0x10/0x10
> >>>>> [ 5554.848171] ? tcp_release_cb+0x46/0x200
> >>>>> [ 5554.849039] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
> >>>>> [ 5554.850667] ? svc_addsock+0x370/0x370 [sunrpc]
> >>>>> [ 5554.852285] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
> >>>>> [ 5554.854420] ? svc_recv+0xab0/0xfa0 [sunrpc]
> >>>>> [ 5554.856187] svc_send+0x9c/0x260 [sunrpc]
> >>>>> [ 5554.857773] nfsd+0x170/0x270 [nfsd]
> >>>>> [ 5554.859009] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
> >>>>> [ 5554.860602] kthread+0x160/0x190
> >>>>> [ 5554.861400] ? kthread_complete_and_exit+0x20/0x20
> >>>>> [ 5554.862452] ret_from_fork+0x1f/0x30
> >>>>> [ 5554.863265] </TASK>
> >>>>> [ 5554.863756] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
> >>>>> iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
> >>>>> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
> >>>>> nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
> >>>>> vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common
> >>>>> snd_seq_midi snd_seq_midi_event crct10dif_pclmul crc32_pclmul
> >>>>> vmw_balloon ghash_clmulni_intel joydev pcspkr snd_ens1371
> >>>>> snd_ac97_codec ac97_bus snd_seq btusb btrtl btbcm btintel snd_pcm
> >>>>> snd_timer snd_rawmidi snd_seq_device bluetooth rfkill snd ecdh_generic
> >>>>> ecc soundcore vmw_vmci i2c_piix4 auth_rpcgss sunrpc ip_tables xfs
> >>>>> libcrc32c sr_mod cdrom sg crc32c_intel nvme serio_raw nvme_core t10_pi
> >>>>> crc64_rocksoft crc64 vmwgfx drm_ttm_helper ttm ahci drm_kms_helper
> >>>>> ata_generic syscopyarea sysfillrect sysimgblt fb_sys_fops vmxnet3
> >>>>> libahci ata_piix drm libata
> >>>>> [ 5554.880681] CR2: 0000000000000008
> >>>>> [ 5554.881539] ---[ end trace 0000000000000000 ]---
> >>>>
> >>>> --
> >>>> Jeff Layton <[email protected]>
> >>
> >> --
> >> Chuck Lever
>
> --
> Chuck Lever
>
>
>

2022-09-01 14:03:39

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On Tue, Aug 30, 2022 at 1:49 PM Jeff Layton <[email protected]> wrote:
>
> On Tue, 2022-08-30 at 13:14 -0400, Olga Kornievskaia wrote:
> > Hi folks,
> >
> > Is this a known nfsd kernel oops in 6.0-rc1. Was running xfstests on
> > pre-rhel-9.1 client against 6.0-rc1 server when it panic-ed.
> >
> > [ 5554.769159] BUG: KASAN: null-ptr-deref in kernel_sendpage+0x60/0x220
> > [ 5554.770526] Read of size 8 at addr 0000000000000008 by task nfsd/2590
> > [ 5554.771899]
>
> No, I haven't seen this one. I'm guessing the page pointer passed to
> kernel_sendpage was probably NULL, so this may be a case where something
> walked off the end of the rq_pages array?
>
> Beyond that I can't tell much from just this stack trace. It might be
> nice to see what line of code kernel_sendpage+0x60 refers to on your
> kernel.

After getting debug symbols this is what gdb told me...

(gdb) l *(kernel_sendpage+0x60)
0xffffffff81cbd570 is in kernel_sendpage (./include/linux/page-flags.h:487).
482 TESTCLEARFLAG(LRU, lru, PF_HEAD)
483 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
484 TESTCLEARFLAG(Active, active, PF_HEAD)
485 PAGEFLAG(Workingset, workingset, PF_HEAD)
486 TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
487 __PAGEFLAG(Slab, slab, PF_NO_TAIL)
488 __PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
489 PAGEFLAG(Checked, checked, PF_NO_COMPOUND) /* Used by some filesystems */
490
491 /* Xen */

I'm unable to complete a git bisect. Could somebody suggest what to do
when a git bisect midpoint results in an unbootable kernel? I can't
"skip" this point can I? I'm not sure if marking it "bad" makes sense
since it's not relevant to the actual problem.

However, what I believe I can determine is that NFSD changes are not
to blame because I have a git bisect point which lead to a failure and
which didn't have nfsd changes yet (see git bisect log below)

[aglo@ipa84 linux-nfs]$ git bisect log
git bisect start
# bad: [632de0656c63f11f3700d1a03cacfdb4b85ae98f] NFSD enforce
filehandle check for source file in COPY
git bisect bad 632de0656c63f11f3700d1a03cacfdb4b85ae98f
# good: [3d7cb6b04c3f3115719235cc6866b10326de34cd] Linux 5.19
git bisect good 3d7cb6b04c3f3115719235cc6866b10326de34cd
# good: [b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1] Merge tag
'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm
git bisect good b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1
# good: [6614a3c3164a5df2b54abb0b3559f51041cf705b] Merge tag
'mm-stable-2022-08-03' of
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect good 6614a3c3164a5df2b54abb0b3559f51041cf705b
# good: [eb5699ba31558bdb2cee6ebde3d0a68091e47dce] Merge tag
'mm-nonmm-stable-2022-08-06-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect good eb5699ba31558bdb2cee6ebde3d0a68091e47dce
# bad: [5e2e7383b57fa03ec2b00c82bb7f49a4a707c1f7] Merge tag
'pinctrl-v6.0-1' of
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad 5e2e7383b57fa03ec2b00c82bb7f49a4a707c1f7
# good: [507f811f205c17fd6f64e8d34d4bf91cd01b07a2] Merge tag
'pm-5.20-rc1-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect good 507f811f205c17fd6f64e8d34d4bf91cd01b07a2
# bad: [e394ff83bbca1c72427b1feb5c6b9d4dad832f01] Merge tag 'nfsd-6.0'
of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
git bisect bad e394ff83bbca1c72427b1feb5c6b9d4dad832f01
# bad: [f30adc0d332fdfe5315cb98bd6a7ff0d5cf2aa38] Merge tag
'pull-work.iov_iter-rebased' of
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad f30adc0d332fdfe5315cb98bd6a7ff0d5cf2aa38
# bad: [8447d0e75099eb54eea9306c2d43ecfc956d09ed] remoteproc:
qcom_q6v5_pas: Do not fail if regulators are not found
git bisect bad 8447d0e75099eb54eea9306c2d43ecfc956d09ed

"Merge tag nfsd-6.0" was bad but so was "Merge tag
'pull-work.iov_iter-rebased'" which is before that. So something else
broke something in the kernel which is affecting NFSD.

Again if somebody could suggest how I could keep git bisecting where I
get non-bootable kernel I could continue...

>
> > [ 5554.772249] CPU: 1 PID: 2590 Comm: nfsd Not tainted 6.0.0-rc1+ #84
> > [ 5554.773575] Hardware name: VMware, Inc. VMware Virtual
> > Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> > [ 5554.775952] Call Trace:
> > [ 5554.776500] <TASK>
> > [ 5554.776977] dump_stack_lvl+0x33/0x46
> > [ 5554.777792] ? kernel_sendpage+0x60/0x220
> > [ 5554.778672] print_report.cold.12+0x499/0x6b7
> > [ 5554.779628] ? tcp_release_cb+0x46/0x200
> > [ 5554.780577] ? kernel_sendpage+0x60/0x220
> > [ 5554.781516] kasan_report+0xa3/0x120
> > [ 5554.782361] ? inet_sendmsg+0xa0/0xa0
> > [ 5554.783217] ? kernel_sendpage+0x60/0x220
> > [ 5554.784191] kernel_sendpage+0x60/0x220
> > [ 5554.785247] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
> > [ 5554.787188] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
> > [ 5554.789364] ? refcount_dec_not_one+0xa0/0x120
> > [ 5554.790402] ? refcount_warn_saturate+0x120/0x120
> > [ 5554.791495] ? __rcu_read_unlock+0x4e/0x250
> > [ 5554.792575] ? __mutex_lock_slowpath+0x10/0x10
> > [ 5554.793571] ? tcp_release_cb+0x46/0x200
> > [ 5554.794443] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
> > [ 5554.796182] ? svc_addsock+0x370/0x370 [sunrpc]
> > [ 5554.797924] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
> > [ 5554.799848] ? svc_recv+0xab0/0xfa0 [sunrpc]
> > [ 5554.801434] svc_send+0x9c/0x260 [sunrpc]
> > [ 5554.802963] nfsd+0x170/0x270 [nfsd]
> > [ 5554.804140] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
> > [ 5554.805631] kthread+0x160/0x190
> > [ 5554.806354] ? kthread_complete_and_exit+0x20/0x20
> > [ 5554.807401] ret_from_fork+0x1f/0x30
> > [ 5554.808206] </TASK>
> > [ 5554.808699] ==================================================================
> > [ 5554.810486] Disabling lock debugging due to kernel taint
> > [ 5554.811772] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > [ 5554.813236] #PF: supervisor read access in kernel mode
> > [ 5554.814345] #PF: error_code(0x0000) - not-present page
> > [ 5554.815462] PGD 0 P4D 0
> > [ 5554.816032] Oops: 0000 [#1] PREEMPT SMP KASAN PTI
> > [ 5554.817057] CPU: 1 PID: 2590 Comm: nfsd Tainted: G B
> > 6.0.0-rc1+ #84
> > [ 5554.818677] Hardware name: VMware, Inc. VMware Virtual
> > Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> > [ 5554.821028] RIP: 0010:kernel_sendpage+0x60/0x220
> > [ 5554.822138] Code: 24 a0 00 00 00 e8 a0 98 83 ff 49 83 bc 24 a0 00
> > 00 00 00 0f 84 9f 00 00 00 48 8d 43 08 48 89 c7 48 89 44 24 08 e8 80
> > 98 83 ff <4c> 8b 63 08 41 f6 c4 01 0f 85 ee 00 00 00 0f 1f 44 00 00 48
> > 89 df
> > [ 5554.826047] RSP: 0018:ffff888017ef7c38 EFLAGS: 00010296
> > [ 5554.827192] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffffa3b173b6
> > [ 5554.828715] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffa6b16260
> > [ 5554.830237] RBP: ffff8880057ac380 R08: fffffbfff4d62c4d R09: fffffbfff4d62c4d
> > [ 5554.831757] R10: ffffffffa6b16267 R11: fffffbfff4d62c4c R12: ffffffffa545e6a0
> > [ 5554.833341] R13: ffff8880057ac3a0 R14: 0000000000001000 R15: 0000000000000000
> > [ 5554.834881] FS: 0000000000000000(0000) GS:ffff888057c80000(0000)
> > knlGS:0000000000000000
> > [ 5554.836590] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 5554.837819] CR2: 0000000000000008 CR3: 000000000677e004 CR4: 00000000001706e0
> > [ 5554.839374] Call Trace:
> > [ 5554.839919] <TASK>
> > [ 5554.840400] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
> > [ 5554.842066] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
> > [ 5554.844194] ? refcount_dec_not_one+0xa0/0x120
> > [ 5554.845239] ? refcount_warn_saturate+0x120/0x120
> > [ 5554.846275] ? __rcu_read_unlock+0x4e/0x250
> > [ 5554.847199] ? __mutex_lock_slowpath+0x10/0x10
> > [ 5554.848171] ? tcp_release_cb+0x46/0x200
> > [ 5554.849039] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
> > [ 5554.850667] ? svc_addsock+0x370/0x370 [sunrpc]
> > [ 5554.852285] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
> > [ 5554.854420] ? svc_recv+0xab0/0xfa0 [sunrpc]
> > [ 5554.856187] svc_send+0x9c/0x260 [sunrpc]
> > [ 5554.857773] nfsd+0x170/0x270 [nfsd]
> > [ 5554.859009] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
> > [ 5554.860602] kthread+0x160/0x190
> > [ 5554.861400] ? kthread_complete_and_exit+0x20/0x20
> > [ 5554.862452] ret_from_fork+0x1f/0x30
> > [ 5554.863265] </TASK>
> > [ 5554.863756] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
> > iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
> > xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
> > nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
> > vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common
> > snd_seq_midi snd_seq_midi_event crct10dif_pclmul crc32_pclmul
> > vmw_balloon ghash_clmulni_intel joydev pcspkr snd_ens1371
> > snd_ac97_codec ac97_bus snd_seq btusb btrtl btbcm btintel snd_pcm
> > snd_timer snd_rawmidi snd_seq_device bluetooth rfkill snd ecdh_generic
> > ecc soundcore vmw_vmci i2c_piix4 auth_rpcgss sunrpc ip_tables xfs
> > libcrc32c sr_mod cdrom sg crc32c_intel nvme serio_raw nvme_core t10_pi
> > crc64_rocksoft crc64 vmwgfx drm_ttm_helper ttm ahci drm_kms_helper
> > ata_generic syscopyarea sysfillrect sysimgblt fb_sys_fops vmxnet3
> > libahci ata_piix drm libata
> > [ 5554.880681] CR2: 0000000000000008
> > [ 5554.881539] ---[ end trace 0000000000000000 ]---
>
> --
> Jeff Layton <[email protected]>

2022-09-01 14:34:31

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On 1 Sep 2022, at 9:51, Olga Kornievskaia wrote:

> On Tue, Aug 30, 2022 at 1:49 PM Jeff Layton <[email protected]>
> wrote:
>>
>> On Tue, 2022-08-30 at 13:14 -0400, Olga Kornievskaia wrote:
>>> Hi folks,
>>>
>>> Is this a known nfsd kernel oops in 6.0-rc1. Was running xfstests on
>>> pre-rhel-9.1 client against 6.0-rc1 server when it panic-ed.
>>>
>>> [ 5554.769159] BUG: KASAN: null-ptr-deref in
>>> kernel_sendpage+0x60/0x220
>>> [ 5554.770526] Read of size 8 at addr 0000000000000008 by task
>>> nfsd/2590
>>> [ 5554.771899]
>>
>> No, I haven't seen this one. I'm guessing the page pointer passed to
>> kernel_sendpage was probably NULL, so this may be a case where
>> something
>> walked off the end of the rq_pages array?
>>
>> Beyond that I can't tell much from just this stack trace. It might be
>> nice to see what line of code kernel_sendpage+0x60 refers to on your
>> kernel.
>
> After getting debug symbols this is what gdb told me...
>
> (gdb) l *(kernel_sendpage+0x60)
> 0xffffffff81cbd570 is in kernel_sendpage
> (./include/linux/page-flags.h:487).
> 482 TESTCLEARFLAG(LRU, lru, PF_HEAD)
> 483 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active,
> PF_HEAD)
> 484 TESTCLEARFLAG(Active, active, PF_HEAD)
> 485 PAGEFLAG(Workingset, workingset, PF_HEAD)
> 486 TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
> 487 __PAGEFLAG(Slab, slab, PF_NO_TAIL)
> 488 __PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
> 489 PAGEFLAG(Checked, checked, PF_NO_COMPOUND) /* Used by some
> filesystems */
> 490
> 491 /* Xen */

I just oopsed here too on 6.0-rc3, but I didn't get a vmcore. I'll get
the
next one and hopefully take it apart a bit further. My oops was on
kernel_sendpage+0x1d:

crash> dis -lrx kernel_sendpage+0x52
/usr/local/src/linux/net/socket.c: 3557
0xffffffff9caf0160 <kernel_sendpage>: nopl 0x0(%rax,%rax,1)
[FTRACE NOP]
/usr/local/src/linux/net/socket.c: 3558
0xffffffff9caf0165 <kernel_sendpage+0x5>: push %rbx
0xffffffff9caf0166 <kernel_sendpage+0x6>: mov %rdi,%rbx
0xffffffff9caf0169 <kernel_sendpage+0x9>: sub $0x18,%rsp
0xffffffff9caf016d <kernel_sendpage+0xd>: mov
0x20(%rdi),%rax
0xffffffff9caf0171 <kernel_sendpage+0x11>: mov
0xa0(%rax),%r9
0xffffffff9caf0178 <kernel_sendpage+0x18>: test %r9,%r9
0xffffffff9caf017b <kernel_sendpage+0x1b>: je
0xffffffff9caf01b2 <kernel_sendpage+0x52>
/usr/local/src/linux/./include/linux/page-flags.h: 253
>>> 0xffffffff9caf017d <kernel_sendpage+0x1d>: mov
>>> 0x8(%rsi),%rax
/usr/local/src/linux/./include/linux/page-flags.h: 255

Yes, RSI is 0.

251 static inline unsigned long _compound_head(const struct page
*page)
252 {
>>> 253 unsigned long head = READ_ONCE(page->compound_head);
254
255 if (unlikely(head & 1))
256 return head - 1;
257 return (unsigned long)page_fixed_fake_head(page);
258 }

Hmm, maybe that's inside

kernel_sendpage ->
sendpage_ok ->
page_count ->
folio_ref_count ->
page_folio

.. and page is NULL? That would only make sense if we used to survive
calling kernel_sendpage with bvec->bv_page = NULL, which seems unlikely.

I'll try to catch a vmcore this time, which will help me see more.

Ben

2022-09-01 20:37:38

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

> On Sep 1, 2022, at 9:51 AM, Olga Kornievskaia <[email protected]> wrote:
>
> On Tue, Aug 30, 2022 at 1:49 PM Jeff Layton <[email protected]> wrote:
>>
>> On Tue, 2022-08-30 at 13:14 -0400, Olga Kornievskaia wrote:
>>> Hi folks,
>>>
>>> Is this a known nfsd kernel oops in 6.0-rc1. Was running xfstests on
>>> pre-rhel-9.1 client against 6.0-rc1 server when it panic-ed.
>>>
>>> [ 5554.769159] BUG: KASAN: null-ptr-deref in kernel_sendpage+0x60/0x220
>>> [ 5554.770526] Read of size 8 at addr 0000000000000008 by task nfsd/2590
>>> [ 5554.771899]
>>
>> No, I haven't seen this one. I'm guessing the page pointer passed to
>> kernel_sendpage was probably NULL, so this may be a case where something
>> walked off the end of the rq_pages array?
>>
>> Beyond that I can't tell much from just this stack trace. It might be
>> nice to see what line of code kernel_sendpage+0x60 refers to on your
>> kernel.
>
> After getting debug symbols this is what gdb told me...
>
> (gdb) l *(kernel_sendpage+0x60)
> 0xffffffff81cbd570 is in kernel_sendpage (./include/linux/page-flags.h:487).
> 482 TESTCLEARFLAG(LRU, lru, PF_HEAD)
> 483 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
> 484 TESTCLEARFLAG(Active, active, PF_HEAD)
> 485 PAGEFLAG(Workingset, workingset, PF_HEAD)
> 486 TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
> 487 __PAGEFLAG(Slab, slab, PF_NO_TAIL)
> 488 __PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
> 489 PAGEFLAG(Checked, checked, PF_NO_COMPOUND) /* Used by some filesystems */
> 490
> 491 /* Xen */
>
> I'm unable to complete a git bisect. Could somebody suggest what to do
> when a git bisect midpoint results in an unbootable kernel? I can't
> "skip" this point can I? I'm not sure if marking it "bad" makes sense
> since it's not relevant to the actual problem.

Try skipping. I think "git bisect" is supposed to be smart enough
to figure things out after a skip.

If not, start over and pick endpoints that you think might avoid
the unbootable kernels. Even adjusting one endpoint a little
might be enough to make bisect pick new targets that avoid the
unbootable kernels.

--
Chuck Lever

2022-09-02 01:32:35

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On Thu, Sep 1, 2022 at 4:09 PM Chuck Lever III <[email protected]> wrote:
>
>
>
> > On Sep 1, 2022, at 9:51 AM, Olga Kornievskaia <[email protected]> wrote:
> >
> > On Tue, Aug 30, 2022 at 1:49 PM Jeff Layton <[email protected]> wrote:
> >>
> >> On Tue, 2022-08-30 at 13:14 -0400, Olga Kornievskaia wrote:
> >>> Hi folks,
> >>>
> >>> Is this a known nfsd kernel oops in 6.0-rc1. Was running xfstests on
> >>> pre-rhel-9.1 client against 6.0-rc1 server when it panic-ed.
> >>>
> >>> [ 5554.769159] BUG: KASAN: null-ptr-deref in kernel_sendpage+0x60/0x220
> >>> [ 5554.770526] Read of size 8 at addr 0000000000000008 by task nfsd/2590
> >>> [ 5554.771899]
> >>
> >> No, I haven't seen this one. I'm guessing the page pointer passed to
> >> kernel_sendpage was probably NULL, so this may be a case where something
> >> walked off the end of the rq_pages array?
> >>
> >> Beyond that I can't tell much from just this stack trace. It might be
> >> nice to see what line of code kernel_sendpage+0x60 refers to on your
> >> kernel.
> >
> > After getting debug symbols this is what gdb told me...
> >
> > (gdb) l *(kernel_sendpage+0x60)
> > 0xffffffff81cbd570 is in kernel_sendpage (./include/linux/page-flags.h:487).
> > 482 TESTCLEARFLAG(LRU, lru, PF_HEAD)
> > 483 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
> > 484 TESTCLEARFLAG(Active, active, PF_HEAD)
> > 485 PAGEFLAG(Workingset, workingset, PF_HEAD)
> > 486 TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
> > 487 __PAGEFLAG(Slab, slab, PF_NO_TAIL)
> > 488 __PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
> > 489 PAGEFLAG(Checked, checked, PF_NO_COMPOUND) /* Used by some filesystems */
> > 490
> > 491 /* Xen */
> >
> > I'm unable to complete a git bisect. Could somebody suggest what to do
> > when a git bisect midpoint results in an unbootable kernel? I can't
> > "skip" this point can I? I'm not sure if marking it "bad" makes sense
> > since it's not relevant to the actual problem.
>
> Try skipping. I think "git bisect" is supposed to be smart enough
> to figure things out after a skip.
>
> If not, start over and pick endpoints that you think might avoid
> the unbootable kernels. Even adjusting one endpoint a little
> might be enough to make bisect pick new targets that avoid the
> unbootable kernels.

Thanks Chuck. I first, based on a hunch, narrowed down that it's
coming from Al Viro's merge commit. Then I git bisected his 32patches
to the following commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979

[aglo@ipa84 linux-nfs]$ git bisect log
git bisect start
# bad: [f30adc0d332fdfe5315cb98bd6a7ff0d5cf2aa38] Merge tag
'pull-work.iov_iter-rebased' of
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad f30adc0d332fdfe5315cb98bd6a7ff0d5cf2aa38
# good: [5d5d353bed32dc3ea52e2619e0d1c60b17133b91] Merge tag
'rproc-v5.20' of
git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
git bisect good 5d5d353bed32dc3ea52e2619e0d1c60b17133b91
# good: [68fe506f3731ecf7881de9512cc5f4c14fd13f3a] unify
xarray_get_pages() and xarray_get_pages_alloc()
git bisect good 68fe506f3731ecf7881de9512cc5f4c14fd13f3a
# good: [dc5801f60b269a73fcce789856c99d1845f75827] af_alg_make_sg():
switch to advancing variant of iov_iter_get_pages()
git bisect good dc5801f60b269a73fcce789856c99d1845f75827
# good: [746de1f86fcd33464acac047f111eea877f2f7a0] pipe_get_pages():
switch to append_pipe()
git bisect good 746de1f86fcd33464acac047f111eea877f2f7a0
# bad: [f0f6b614f83dbae99d283b7b12ab5dd2e04df979] copy_page_to_iter():
don't split high-order page in case of ITER_PIPE
git bisect bad f0f6b614f83dbae99d283b7b12ab5dd2e04df979

> --
> Chuck Lever
>
>
>

2022-09-02 21:05:33

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

Olga, does this fix it up for you? I'm testing now, but I think it
might be
a little harder for me to hit.

Ben

8<------------------------------------------------
From 6bea39a887495b1748ff3b179d6e2f3d7e552b61 Mon Sep 17 00:00:00 2001
From: Benjamin Coddington <[email protected]>
Date: Fri, 2 Sep 2022 16:49:17 -0400
Subject: [PATCH] SUNRPC: Fix svc_tcp_sendmsg bvec offset calculation

The xdr_buf's bvec member points to an array of struct bio_vec, let's
fixup the calculation to the start of the bio_vec for non-zero
page_base.

Fixes: bad4c6eb5eaa ("SUNRPC: Fix NFS READs that start at
non-page-aligned offsets")
Signed-off-by: Benjamin Coddington <[email protected]>
---
net/sunrpc/svcsock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2fc98fea59b4..ecafc9c4bc5c 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1110,7 +1110,7 @@ static int svc_tcp_sendmsg(struct socket *sock,
struct xdr_buf *xdr,
unsigned int offset, len, remaining;
struct bio_vec *bvec;

- bvec = xdr->bvec + (xdr->page_base >> PAGE_SHIFT);
+ bvec = &xdr->bvec[xdr->page_base >> PAGE_SHIFT];
offset = offset_in_page(xdr->page_base);
remaining = xdr->page_len;
while (remaining > 0) {
--
2.37.2

2022-09-02 21:30:47

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

> On Sep 2, 2022, at 4:58 PM, Benjamin Coddington <[email protected]> wrote:
>
> Olga, does this fix it up for you? I'm testing now, but I think it might be
> a little harder for me to hit.
>
> Ben
>
> 8<------------------------------------------------
> From 6bea39a887495b1748ff3b179d6e2f3d7e552b61 Mon Sep 17 00:00:00 2001
> From: Benjamin Coddington <[email protected]>
> Date: Fri, 2 Sep 2022 16:49:17 -0400
> Subject: [PATCH] SUNRPC: Fix svc_tcp_sendmsg bvec offset calculation
>
> The xdr_buf's bvec member points to an array of struct bio_vec, let's
> fixup the calculation to the start of the bio_vec for non-zero
> page_base.
>
> Fixes: bad4c6eb5eaa ("SUNRPC: Fix NFS READs that start at non-page-aligned offsets")
> Signed-off-by: Benjamin Coddington <[email protected]>
> ---
> net/sunrpc/svcsock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2fc98fea59b4..ecafc9c4bc5c 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1110,7 +1110,7 @@ static int svc_tcp_sendmsg(struct socket *sock, struct xdr_buf *xdr,
> unsigned int offset, len, remaining;
> struct bio_vec *bvec;
>
> - bvec = xdr->bvec + (xdr->page_base >> PAGE_SHIFT);
> + bvec = &xdr->bvec[xdr->page_base >> PAGE_SHIFT];

Color me skeptical.

I'm not sure these two expressions are different. This variety
of pointer arithmetic is used throughout the XDR layer:

net/sunrpc/xdr.c: pgto = pages + (pgto_base >> PAGE_SHIFT);
net/sunrpc/xdr.c: pgfrom = pages + (pgfrom_base >> PAGE_SHIFT);
net/sunrpc/xdr.c: pgto = pages + (pgto_base >> PAGE_SHIFT);
net/sunrpc/xdr.c: pgfrom = pages + (pgfrom_base >> PAGE_SHIFT);
net/sunrpc/xdr.c: pgto = pages + (pgbase >> PAGE_SHIFT);
net/sunrpc/xdr.c: pgfrom = pages + (pgbase >> PAGE_SHIFT);
net/sunrpc/xdr.c: page = pages + (pgbase >> PAGE_SHIFT);
net/sunrpc/xdr.c: xdr->page_ptr = buf->pages + (new >> PAGE_SHIFT);
net/sunrpc/xdr.c: ppages = buf->pages + (base >> PAGE_SHIFT);
net/sunrpc/xprtrdma/rpc_rdma.c: ppages = buf->pages + (buf->page_base >> PAGE_SHIFT);
net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdrbuf->pages + (xdrbuf->page_base >> PAGE_SHIFT);
net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base >> PAGE_SHIFT);
net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base >> PAGE_SHIFT);

Commit bad4c6eb5eaa is from v5.11. Wouldn't this issue have
shown up in earlier kernels? At the very least, the patch
description needs to explain why this computation is not a
problem for kernels 5.11 through 5.19.

> offset = offset_in_page(xdr->page_base);
> remaining = xdr->page_len;
> while (remaining > 0) {
> --
> 2.37.2
>

--
Chuck Lever

2022-09-03 00:45:24

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On 2 Sep 2022, at 17:13, Chuck Lever III wrote:

>> On Sep 2, 2022, at 4:58 PM, Benjamin Coddington <[email protected]>
>> wrote:
>>
>> Olga, does this fix it up for you? I'm testing now, but I think it
>> might be
>> a little harder for me to hit.
>>
>> Ben
>>
>> 8<------------------------------------------------
>> From 6bea39a887495b1748ff3b179d6e2f3d7e552b61 Mon Sep 17 00:00:00
>> 2001
>> From: Benjamin Coddington <[email protected]>
>> Date: Fri, 2 Sep 2022 16:49:17 -0400
>> Subject: [PATCH] SUNRPC: Fix svc_tcp_sendmsg bvec offset calculation
>>
>> The xdr_buf's bvec member points to an array of struct bio_vec, let's
>> fixup the calculation to the start of the bio_vec for non-zero
>> page_base.
>>
>> Fixes: bad4c6eb5eaa ("SUNRPC: Fix NFS READs that start at
>> non-page-aligned offsets")
>> Signed-off-by: Benjamin Coddington <[email protected]>
>> ---
>> net/sunrpc/svcsock.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index 2fc98fea59b4..ecafc9c4bc5c 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -1110,7 +1110,7 @@ static int svc_tcp_sendmsg(struct socket *sock,
>> struct xdr_buf *xdr,
>> unsigned int offset, len, remaining;
>> struct bio_vec *bvec;
>>
>> - bvec = xdr->bvec + (xdr->page_base >> PAGE_SHIFT);
>> + bvec = &xdr->bvec[xdr->page_base >> PAGE_SHIFT];
>
> Color me skeptical.
>
> I'm not sure these two expressions are different. This variety
> of pointer arithmetic is used throughout the XDR layer:

Yeah, you know what - it did crash in the same place with this change.

My thinking was that if you have (for example) page_base = 8192, and
xdr->bvec of, say 0xffff4500, then what you want is to set the local
bvec var
to 0xfff4500 + sizeof(struct bio_vec)*2, but the code looks like it
would
set the local bvec to 0xffff4502, which is not the same thing..

There must be a hole in my head, I guess I need to dig out my K&R,
sorry
for the noise. I will figure it out.

> net/sunrpc/xdr.c: pgto = pages + (pgto_base >> PAGE_SHIFT);
> net/sunrpc/xdr.c: pgfrom = pages + (pgfrom_base >> PAGE_SHIFT);
> net/sunrpc/xdr.c: pgto = pages + (pgto_base >> PAGE_SHIFT);
> net/sunrpc/xdr.c: pgfrom = pages + (pgfrom_base >> PAGE_SHIFT);
> net/sunrpc/xdr.c: pgto = pages + (pgbase >> PAGE_SHIFT);
> net/sunrpc/xdr.c: pgfrom = pages + (pgbase >> PAGE_SHIFT);
> net/sunrpc/xdr.c: page = pages + (pgbase >> PAGE_SHIFT);
> net/sunrpc/xdr.c: xdr->page_ptr = buf->pages + (new >>
> PAGE_SHIFT);
> net/sunrpc/xdr.c: ppages = buf->pages + (base >>
> PAGE_SHIFT);
> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = buf->pages + (buf->page_base
> >> PAGE_SHIFT);
> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdrbuf->pages +
> (xdrbuf->page_base >> PAGE_SHIFT);
> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base
> >> PAGE_SHIFT);
> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base
> >> PAGE_SHIFT);

Hmm.. there's clearly something wrong with me.

> Commit bad4c6eb5eaa is from v5.11. Wouldn't this issue have
> shown up in earlier kernels? At the very least, the patch
> description needs to explain why this computation is not a
> problem for kernels 5.11 through 5.19.

I totally agree. I figured it was rare to have a non-zero page_base,
and
maybe a client change is now creating that.

Ben

2022-09-03 03:50:58

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

Hi folks,

Ok so I won't be trying Ben's idea but I'm sort of confused is the
thought that NFS is somehow at fault in incorrectly using the "new"
code introduced by the new patches. Isn't it possible that the new
patches are wrong? I haven't had time to try and revert the patch(es)
to see if that makes the oops go away. I won't get around to it until
about tuesday with the holidays.

On Fri, Sep 2, 2022 at 8:38 PM Benjamin Coddington <[email protected]> wrote:
>
> On 2 Sep 2022, at 17:13, Chuck Lever III wrote:
>
> >> On Sep 2, 2022, at 4:58 PM, Benjamin Coddington <[email protected]>
> >> wrote:
> >>
> >> Olga, does this fix it up for you? I'm testing now, but I think it
> >> might be
> >> a little harder for me to hit.
> >>
> >> Ben
> >>
> >> 8<------------------------------------------------
> >> From 6bea39a887495b1748ff3b179d6e2f3d7e552b61 Mon Sep 17 00:00:00
> >> 2001
> >> From: Benjamin Coddington <[email protected]>
> >> Date: Fri, 2 Sep 2022 16:49:17 -0400
> >> Subject: [PATCH] SUNRPC: Fix svc_tcp_sendmsg bvec offset calculation
> >>
> >> The xdr_buf's bvec member points to an array of struct bio_vec, let's
> >> fixup the calculation to the start of the bio_vec for non-zero
> >> page_base.
> >>
> >> Fixes: bad4c6eb5eaa ("SUNRPC: Fix NFS READs that start at
> >> non-page-aligned offsets")
> >> Signed-off-by: Benjamin Coddington <[email protected]>
> >> ---
> >> net/sunrpc/svcsock.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> >> index 2fc98fea59b4..ecafc9c4bc5c 100644
> >> --- a/net/sunrpc/svcsock.c
> >> +++ b/net/sunrpc/svcsock.c
> >> @@ -1110,7 +1110,7 @@ static int svc_tcp_sendmsg(struct socket *sock,
> >> struct xdr_buf *xdr,
> >> unsigned int offset, len, remaining;
> >> struct bio_vec *bvec;
> >>
> >> - bvec = xdr->bvec + (xdr->page_base >> PAGE_SHIFT);
> >> + bvec = &xdr->bvec[xdr->page_base >> PAGE_SHIFT];
> >
> > Color me skeptical.
> >
> > I'm not sure these two expressions are different. This variety
> > of pointer arithmetic is used throughout the XDR layer:
>
> Yeah, you know what - it did crash in the same place with this change.
>
> My thinking was that if you have (for example) page_base = 8192, and
> xdr->bvec of, say 0xffff4500, then what you want is to set the local
> bvec var
> to 0xfff4500 + sizeof(struct bio_vec)*2, but the code looks like it
> would
> set the local bvec to 0xffff4502, which is not the same thing..
>
> There must be a hole in my head, I guess I need to dig out my K&R,
> sorry
> for the noise. I will figure it out.
>
> > net/sunrpc/xdr.c: pgto = pages + (pgto_base >> PAGE_SHIFT);
> > net/sunrpc/xdr.c: pgfrom = pages + (pgfrom_base >> PAGE_SHIFT);
> > net/sunrpc/xdr.c: pgto = pages + (pgto_base >> PAGE_SHIFT);
> > net/sunrpc/xdr.c: pgfrom = pages + (pgfrom_base >> PAGE_SHIFT);
> > net/sunrpc/xdr.c: pgto = pages + (pgbase >> PAGE_SHIFT);
> > net/sunrpc/xdr.c: pgfrom = pages + (pgbase >> PAGE_SHIFT);
> > net/sunrpc/xdr.c: page = pages + (pgbase >> PAGE_SHIFT);
> > net/sunrpc/xdr.c: xdr->page_ptr = buf->pages + (new >>
> > PAGE_SHIFT);
> > net/sunrpc/xdr.c: ppages = buf->pages + (base >>
> > PAGE_SHIFT);
> > net/sunrpc/xprtrdma/rpc_rdma.c: ppages = buf->pages + (buf->page_base
> > >> PAGE_SHIFT);
> > net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdrbuf->pages +
> > (xdrbuf->page_base >> PAGE_SHIFT);
> > net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base
> > >> PAGE_SHIFT);
> > net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base
> > >> PAGE_SHIFT);
>
> Hmm.. there's clearly something wrong with me.
>
> > Commit bad4c6eb5eaa is from v5.11. Wouldn't this issue have
> > shown up in earlier kernels? At the very least, the patch
> > description needs to explain why this computation is not a
> > problem for kernels 5.11 through 5.19.
>
> I totally agree. I figured it was rare to have a non-zero page_base,
> and
> maybe a client change is now creating that.
>
> Ben
>

2022-09-03 04:30:53

by NeilBrown

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On Thu, 01 Sep 2022, Olga Kornievskaia wrote:
> On Tue, Aug 30, 2022 at 1:49 PM Jeff Layton <[email protected]> wrote:
> >
> > On Tue, 2022-08-30 at 13:14 -0400, Olga Kornievskaia wrote:
> > > Hi folks,
> > >
> > > Is this a known nfsd kernel oops in 6.0-rc1. Was running xfstests on
> > > pre-rhel-9.1 client against 6.0-rc1 server when it panic-ed.
> > >
> > > [ 5554.769159] BUG: KASAN: null-ptr-deref in kernel_sendpage+0x60/0x220
> > > [ 5554.770526] Read of size 8 at addr 0000000000000008 by task nfsd/2590
> > > [ 5554.771899]
> >
> > No, I haven't seen this one. I'm guessing the page pointer passed to
> > kernel_sendpage was probably NULL, so this may be a case where something
> > walked off the end of the rq_pages array?
> >
> > Beyond that I can't tell much from just this stack trace. It might be
> > nice to see what line of code kernel_sendpage+0x60 refers to on your
> > kernel.
>
> After getting debug symbols this is what gdb told me...

have you tried
./scripts/faddr2line vmlinux kernel_sendpage+0x60
??
It will show you where inlined code was called from, and is a bit easier
than gdb.
Add a "--list' option to get surrounding source code.

NeilBrown

>
> (gdb) l *(kernel_sendpage+0x60)
> 0xffffffff81cbd570 is in kernel_sendpage (./include/linux/page-flags.h:487).
> 482 TESTCLEARFLAG(LRU, lru, PF_HEAD)
> 483 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
> 484 TESTCLEARFLAG(Active, active, PF_HEAD)
> 485 PAGEFLAG(Workingset, workingset, PF_HEAD)
> 486 TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
> 487 __PAGEFLAG(Slab, slab, PF_NO_TAIL)
> 488 __PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
> 489 PAGEFLAG(Checked, checked, PF_NO_COMPOUND) /* Used by some filesystems */
> 490
> 491 /* Xen */
>
> I'm unable to complete a git bisect. Could somebody suggest what to do
> when a git bisect midpoint results in an unbootable kernel? I can't
> "skip" this point can I? I'm not sure if marking it "bad" makes sense
> since it's not relevant to the actual problem.
>
> However, what I believe I can determine is that NFSD changes are not
> to blame because I have a git bisect point which lead to a failure and
> which didn't have nfsd changes yet (see git bisect log below)
>
> [aglo@ipa84 linux-nfs]$ git bisect log
> git bisect start
> # bad: [632de0656c63f11f3700d1a03cacfdb4b85ae98f] NFSD enforce
> filehandle check for source file in COPY
> git bisect bad 632de0656c63f11f3700d1a03cacfdb4b85ae98f
> # good: [3d7cb6b04c3f3115719235cc6866b10326de34cd] Linux 5.19
> git bisect good 3d7cb6b04c3f3115719235cc6866b10326de34cd
> # good: [b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1] Merge tag
> 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm
> git bisect good b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1
> # good: [6614a3c3164a5df2b54abb0b3559f51041cf705b] Merge tag
> 'mm-stable-2022-08-03' of
> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect good 6614a3c3164a5df2b54abb0b3559f51041cf705b
> # good: [eb5699ba31558bdb2cee6ebde3d0a68091e47dce] Merge tag
> 'mm-nonmm-stable-2022-08-06-2' of
> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect good eb5699ba31558bdb2cee6ebde3d0a68091e47dce
> # bad: [5e2e7383b57fa03ec2b00c82bb7f49a4a707c1f7] Merge tag
> 'pinctrl-v6.0-1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
> git bisect bad 5e2e7383b57fa03ec2b00c82bb7f49a4a707c1f7
> # good: [507f811f205c17fd6f64e8d34d4bf91cd01b07a2] Merge tag
> 'pm-5.20-rc1-2' of
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
> git bisect good 507f811f205c17fd6f64e8d34d4bf91cd01b07a2
> # bad: [e394ff83bbca1c72427b1feb5c6b9d4dad832f01] Merge tag 'nfsd-6.0'
> of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
> git bisect bad e394ff83bbca1c72427b1feb5c6b9d4dad832f01
> # bad: [f30adc0d332fdfe5315cb98bd6a7ff0d5cf2aa38] Merge tag
> 'pull-work.iov_iter-rebased' of
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
> git bisect bad f30adc0d332fdfe5315cb98bd6a7ff0d5cf2aa38
> # bad: [8447d0e75099eb54eea9306c2d43ecfc956d09ed] remoteproc:
> qcom_q6v5_pas: Do not fail if regulators are not found
> git bisect bad 8447d0e75099eb54eea9306c2d43ecfc956d09ed
>
> "Merge tag nfsd-6.0" was bad but so was "Merge tag
> 'pull-work.iov_iter-rebased'" which is before that. So something else
> broke something in the kernel which is affecting NFSD.
>
> Again if somebody could suggest how I could keep git bisecting where I
> get non-bootable kernel I could continue...
>
>
> >
> > > [ 5554.772249] CPU: 1 PID: 2590 Comm: nfsd Not tainted 6.0.0-rc1+ #84
> > > [ 5554.773575] Hardware name: VMware, Inc. VMware Virtual
> > > Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> > > [ 5554.775952] Call Trace:
> > > [ 5554.776500] <TASK>
> > > [ 5554.776977] dump_stack_lvl+0x33/0x46
> > > [ 5554.777792] ? kernel_sendpage+0x60/0x220
> > > [ 5554.778672] print_report.cold.12+0x499/0x6b7
> > > [ 5554.779628] ? tcp_release_cb+0x46/0x200
> > > [ 5554.780577] ? kernel_sendpage+0x60/0x220
> > > [ 5554.781516] kasan_report+0xa3/0x120
> > > [ 5554.782361] ? inet_sendmsg+0xa0/0xa0
> > > [ 5554.783217] ? kernel_sendpage+0x60/0x220
> > > [ 5554.784191] kernel_sendpage+0x60/0x220
> > > [ 5554.785247] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
> > > [ 5554.787188] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
> > > [ 5554.789364] ? refcount_dec_not_one+0xa0/0x120
> > > [ 5554.790402] ? refcount_warn_saturate+0x120/0x120
> > > [ 5554.791495] ? __rcu_read_unlock+0x4e/0x250
> > > [ 5554.792575] ? __mutex_lock_slowpath+0x10/0x10
> > > [ 5554.793571] ? tcp_release_cb+0x46/0x200
> > > [ 5554.794443] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
> > > [ 5554.796182] ? svc_addsock+0x370/0x370 [sunrpc]
> > > [ 5554.797924] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
> > > [ 5554.799848] ? svc_recv+0xab0/0xfa0 [sunrpc]
> > > [ 5554.801434] svc_send+0x9c/0x260 [sunrpc]
> > > [ 5554.802963] nfsd+0x170/0x270 [nfsd]
> > > [ 5554.804140] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
> > > [ 5554.805631] kthread+0x160/0x190
> > > [ 5554.806354] ? kthread_complete_and_exit+0x20/0x20
> > > [ 5554.807401] ret_from_fork+0x1f/0x30
> > > [ 5554.808206] </TASK>
> > > [ 5554.808699] ==================================================================
> > > [ 5554.810486] Disabling lock debugging due to kernel taint
> > > [ 5554.811772] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > [ 5554.813236] #PF: supervisor read access in kernel mode
> > > [ 5554.814345] #PF: error_code(0x0000) - not-present page
> > > [ 5554.815462] PGD 0 P4D 0
> > > [ 5554.816032] Oops: 0000 [#1] PREEMPT SMP KASAN PTI
> > > [ 5554.817057] CPU: 1 PID: 2590 Comm: nfsd Tainted: G B
> > > 6.0.0-rc1+ #84
> > > [ 5554.818677] Hardware name: VMware, Inc. VMware Virtual
> > > Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> > > [ 5554.821028] RIP: 0010:kernel_sendpage+0x60/0x220
> > > [ 5554.822138] Code: 24 a0 00 00 00 e8 a0 98 83 ff 49 83 bc 24 a0 00
> > > 00 00 00 0f 84 9f 00 00 00 48 8d 43 08 48 89 c7 48 89 44 24 08 e8 80
> > > 98 83 ff <4c> 8b 63 08 41 f6 c4 01 0f 85 ee 00 00 00 0f 1f 44 00 00 48
> > > 89 df
> > > [ 5554.826047] RSP: 0018:ffff888017ef7c38 EFLAGS: 00010296
> > > [ 5554.827192] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffffa3b173b6
> > > [ 5554.828715] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffa6b16260
> > > [ 5554.830237] RBP: ffff8880057ac380 R08: fffffbfff4d62c4d R09: fffffbfff4d62c4d
> > > [ 5554.831757] R10: ffffffffa6b16267 R11: fffffbfff4d62c4c R12: ffffffffa545e6a0
> > > [ 5554.833341] R13: ffff8880057ac3a0 R14: 0000000000001000 R15: 0000000000000000
> > > [ 5554.834881] FS: 0000000000000000(0000) GS:ffff888057c80000(0000)
> > > knlGS:0000000000000000
> > > [ 5554.836590] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 5554.837819] CR2: 0000000000000008 CR3: 000000000677e004 CR4: 00000000001706e0
> > > [ 5554.839374] Call Trace:
> > > [ 5554.839919] <TASK>
> > > [ 5554.840400] svc_tcp_sendmsg+0x206/0x2e0 [sunrpc]
> > > [ 5554.842066] ? svc_tcp_send_kvec.isra.20.constprop.29+0xa0/0xa0 [sunrpc]
> > > [ 5554.844194] ? refcount_dec_not_one+0xa0/0x120
> > > [ 5554.845239] ? refcount_warn_saturate+0x120/0x120
> > > [ 5554.846275] ? __rcu_read_unlock+0x4e/0x250
> > > [ 5554.847199] ? __mutex_lock_slowpath+0x10/0x10
> > > [ 5554.848171] ? tcp_release_cb+0x46/0x200
> > > [ 5554.849039] svc_tcp_sendto+0x14f/0x2e0 [sunrpc]
> > > [ 5554.850667] ? svc_addsock+0x370/0x370 [sunrpc]
> > > [ 5554.852285] ? svc_sock_secure_port+0x27/0x50 [sunrpc]
> > > [ 5554.854420] ? svc_recv+0xab0/0xfa0 [sunrpc]
> > > [ 5554.856187] svc_send+0x9c/0x260 [sunrpc]
> > > [ 5554.857773] nfsd+0x170/0x270 [nfsd]
> > > [ 5554.859009] ? nfsd_shutdown_threads+0xe0/0xe0 [nfsd]
> > > [ 5554.860602] kthread+0x160/0x190
> > > [ 5554.861400] ? kthread_complete_and_exit+0x20/0x20
> > > [ 5554.862452] ret_from_fork+0x1f/0x30
> > > [ 5554.863265] </TASK>
> > > [ 5554.863756] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
> > > iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
> > > xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
> > > nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
> > > vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common
> > > snd_seq_midi snd_seq_midi_event crct10dif_pclmul crc32_pclmul
> > > vmw_balloon ghash_clmulni_intel joydev pcspkr snd_ens1371
> > > snd_ac97_codec ac97_bus snd_seq btusb btrtl btbcm btintel snd_pcm
> > > snd_timer snd_rawmidi snd_seq_device bluetooth rfkill snd ecdh_generic
> > > ecc soundcore vmw_vmci i2c_piix4 auth_rpcgss sunrpc ip_tables xfs
> > > libcrc32c sr_mod cdrom sg crc32c_intel nvme serio_raw nvme_core t10_pi
> > > crc64_rocksoft crc64 vmwgfx drm_ttm_helper ttm ahci drm_kms_helper
> > > ata_generic syscopyarea sysfillrect sysimgblt fb_sys_fops vmxnet3
> > > libahci ata_piix drm libata
> > > [ 5554.880681] CR2: 0000000000000008
> > > [ 5554.881539] ---[ end trace 0000000000000000 ]---
> >
> > --
> > Jeff Layton <[email protected]>
>

2022-09-03 16:43:34

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

> On Sep 2, 2022, at 11:34 PM, Olga Kornievskaia <[email protected]> wrote:
>
> Hi folks,
>
> Ok so I won't be trying Ben's idea but I'm sort of confused is the
> thought that NFS is somehow at fault in incorrectly using the "new"
> code introduced by the new patches. Isn't it possible that the new
> patches are wrong?

Since the bisect result indicates one of Al's commits, that commit
should be at the top of the suspect list. I believe the intent is
that, generally speaking, both the folio and the iov_iter work are
not supposed to require changes to API consumers like NFSD.

> I haven't had time to try and revert the patch(es)
> to see if that makes the oops go away.

A test revert would be a good step to confirm the bisect result
before asking for Al's opinion. Always dot your p's and cross your
q's!

> I won't get around to it until about tuesday with the holidays.

Try it out when you can. Or if someone else has a chance before
then, they can report back here.

> On Fri, Sep 2, 2022 at 8:38 PM Benjamin Coddington <[email protected]> wrote:
>>
>> On 2 Sep 2022, at 17:13, Chuck Lever III wrote:
>>
>>>> On Sep 2, 2022, at 4:58 PM, Benjamin Coddington <[email protected]>
>>>> wrote:
>>>>
>>>> Olga, does this fix it up for you? I'm testing now, but I think it
>>>> might be
>>>> a little harder for me to hit.
>>>>
>>>> Ben
>>>>
>>>> 8<------------------------------------------------
>>>> From 6bea39a887495b1748ff3b179d6e2f3d7e552b61 Mon Sep 17 00:00:00
>>>> 2001
>>>> From: Benjamin Coddington <[email protected]>
>>>> Date: Fri, 2 Sep 2022 16:49:17 -0400
>>>> Subject: [PATCH] SUNRPC: Fix svc_tcp_sendmsg bvec offset calculation
>>>>
>>>> The xdr_buf's bvec member points to an array of struct bio_vec, let's
>>>> fixup the calculation to the start of the bio_vec for non-zero
>>>> page_base.
>>>>
>>>> Fixes: bad4c6eb5eaa ("SUNRPC: Fix NFS READs that start at
>>>> non-page-aligned offsets")
>>>> Signed-off-by: Benjamin Coddington <[email protected]>
>>>> ---
>>>> net/sunrpc/svcsock.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>>> index 2fc98fea59b4..ecafc9c4bc5c 100644
>>>> --- a/net/sunrpc/svcsock.c
>>>> +++ b/net/sunrpc/svcsock.c
>>>> @@ -1110,7 +1110,7 @@ static int svc_tcp_sendmsg(struct socket *sock,
>>>> struct xdr_buf *xdr,
>>>> unsigned int offset, len, remaining;
>>>> struct bio_vec *bvec;
>>>>
>>>> - bvec = xdr->bvec + (xdr->page_base >> PAGE_SHIFT);
>>>> + bvec = &xdr->bvec[xdr->page_base >> PAGE_SHIFT];
>>>
>>> Color me skeptical.
>>>
>>> I'm not sure these two expressions are different. This variety
>>> of pointer arithmetic is used throughout the XDR layer:
>>
>> Yeah, you know what - it did crash in the same place with this change.
>>
>> My thinking was that if you have (for example) page_base = 8192, and
>> xdr->bvec of, say 0xffff4500, then what you want is to set the local
>> bvec var
>> to 0xfff4500 + sizeof(struct bio_vec)*2, but the code looks like it
>> would
>> set the local bvec to 0xffff4502, which is not the same thing..
>>
>> There must be a hole in my head, I guess I need to dig out my K&R,
>> sorry
>> for the noise. I will figure it out.
>>
>>> net/sunrpc/xdr.c: pgto = pages + (pgto_base >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c: pgfrom = pages + (pgfrom_base >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c: pgto = pages + (pgto_base >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c: pgfrom = pages + (pgfrom_base >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c: pgto = pages + (pgbase >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c: pgfrom = pages + (pgbase >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c: page = pages + (pgbase >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c: xdr->page_ptr = buf->pages + (new >>
>>> PAGE_SHIFT);
>>> net/sunrpc/xdr.c: ppages = buf->pages + (base >>
>>> PAGE_SHIFT);
>>> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = buf->pages + (buf->page_base
>>>>> PAGE_SHIFT);
>>> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdrbuf->pages +
>>> (xdrbuf->page_base >> PAGE_SHIFT);
>>> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base
>>>>> PAGE_SHIFT);
>>> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base
>>>>> PAGE_SHIFT);
>>
>> Hmm.. there's clearly something wrong with me.
>>
>>> Commit bad4c6eb5eaa is from v5.11. Wouldn't this issue have
>>> shown up in earlier kernels? At the very least, the patch
>>> description needs to explain why this computation is not a
>>> problem for kernels 5.11 through 5.19.
>>
>> I totally agree. I figured it was rare to have a non-zero page_base,
>> and
>> maybe a client change is now creating that.
>>
>> Ben
>>

--
Chuck Lever

2022-09-06 18:30:12

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On 1 Sep 2022, at 21:27, Olga Kornievskaia wrote:

> Thanks Chuck. I first, based on a hunch, narrowed down that it's
> coming from Al Viro's merge commit. Then I git bisected his 32patches
> to the following commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979

No crash for me after reverting f0f6b614f83dbae99d283b7b12ab5dd2e04df979.

Ben

2022-09-06 19:27:22

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On Tue, Sep 6, 2022 at 2:28 PM Benjamin Coddington <[email protected]> wrote:
>
> On 1 Sep 2022, at 21:27, Olga Kornievskaia wrote:
>
> > Thanks Chuck. I first, based on a hunch, narrowed down that it's
> > coming from Al Viro's merge commit. Then I git bisected his 32patches
> > to the following commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979
>
> No crash for me after reverting f0f6b614f83dbae99d283b7b12ab5dd2e04df979.

I second that. No crash after a revert here.

>
> Ben
>

2022-09-07 05:01:01

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

> On Sep 6, 2022, at 3:12 PM, Olga Kornievskaia <[email protected]> wrote:
>
> On Tue, Sep 6, 2022 at 2:28 PM Benjamin Coddington <[email protected]> wrote:
>>
>> On 1 Sep 2022, at 21:27, Olga Kornievskaia wrote:
>>
>>> Thanks Chuck. I first, based on a hunch, narrowed down that it's
>>> coming from Al Viro's merge commit. Then I git bisected his 32patches
>>> to the following commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979
>>
>> No crash for me after reverting f0f6b614f83dbae99d283b7b12ab5dd2e04df979.
>
> I second that. No crash after a revert here.

I bisected the new xfstests failures to the same commit:

f0f6b614f83dbae99d283b7b12ab5dd2e04df979 is the first bad commit
commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979
Author: Al Viro <[email protected]>
Date: Thu Jun 23 17:21:37 2022 -0400

copy_page_to_iter(): don't split high-order page in case of ITER_PIPE

... just shove it into one pipe_buffer.

Signed-off-by: Al Viro <[email protected]>

lib/iov_iter.c | 21 ++++++---------------
1 file changed, 6 insertions(+), 15 deletions(-)

--
Chuck Lever

2022-09-07 12:59:41

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On 7 Sep 2022, at 0:58, Chuck Lever III wrote:

>> On Sep 6, 2022, at 3:12 PM, Olga Kornievskaia <[email protected]> wrote:
>>
>> On Tue, Sep 6, 2022 at 2:28 PM Benjamin Coddington
>> <[email protected]> wrote:
>>>
>>> On 1 Sep 2022, at 21:27, Olga Kornievskaia wrote:
>>>
>>>> Thanks Chuck. I first, based on a hunch, narrowed down that it's
>>>> coming from Al Viro's merge commit. Then I git bisected his
>>>> 32patches
>>>> to the following commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979
>>>
>>> No crash for me after reverting
>>> f0f6b614f83dbae99d283b7b12ab5dd2e04df979.
>>
>> I second that. No crash after a revert here.
>
> I bisected the new xfstests failures to the same commit:
>
> f0f6b614f83dbae99d283b7b12ab5dd2e04df979 is the first bad commit
> commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979
> Author: Al Viro <[email protected]>
> Date: Thu Jun 23 17:21:37 2022 -0400
>
> copy_page_to_iter(): don't split high-order page in case of
> ITER_PIPE
>
> ... just shove it into one pipe_buffer.
>
> Signed-off-by: Al Viro <[email protected]>
>
> lib/iov_iter.c | 21 ++++++---------------
> 1 file changed, 6 insertions(+), 15 deletions(-)
>

I've been reliably reproducing this on generic/551 on xfs. In the case
where it crashes, rqstp->rq_res.page_base is positive multiple of
PAGE_SIZE
after getting set in nfsd_splice_actor(), and that with page_len
overruns
the 256 pages we have.

With f0f6b614f83d reverted, rqstp->rq_res.page_base is always zero.

After 47b7fcae419dc and f0f6b614f83d, buf->offset in nfsd_splice_actor
can
be a positive multiple of PAGE_SIZE, whereas before it was always just
the
offset into the page.

Something like this might fix it up:

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 9f486b788ed0..d62963f36f03 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -849,7 +849,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe,
struct pipe_buffer *buf,

svc_rqst_replace_page(rqstp, buf->page);
if (rqstp->rq_res.page_len == 0)
- rqstp->rq_res.page_base = buf->offset;
+ rqstp->rq_res.page_base = buf->offset % PAGE_SIZE;
rqstp->rq_res.page_len += sd->len;
return sd->len;
}

.. but we should check with Al about whether this needs to be fixed over
in
copy_page_to_iter_pipe(), since I don't think pipe_buffer offset should
be
greater than PAGE_SIZE.

Al, any thoughts?

Ben

2022-09-10 21:17:08

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On Wed, Sep 07, 2022 at 08:52:46AM -0400, Benjamin Coddington wrote:
> On 7 Sep 2022, at 0:58, Chuck Lever III wrote:
>
> > > On Sep 6, 2022, at 3:12 PM, Olga Kornievskaia <[email protected]> wrote:
> > >
> > > On Tue, Sep 6, 2022 at 2:28 PM Benjamin Coddington
> > > <[email protected]> wrote:
> > > >
> > > > On 1 Sep 2022, at 21:27, Olga Kornievskaia wrote:
> > > >
> > > > > Thanks Chuck. I first, based on a hunch, narrowed down that it's
> > > > > coming from Al Viro's merge commit. Then I git bisected his
> > > > > 32patches
> > > > > to the following commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979
> > > >
> > > > No crash for me after reverting
> > > > f0f6b614f83dbae99d283b7b12ab5dd2e04df979.
> > >
> > > I second that. No crash after a revert here.
> >
> > I bisected the new xfstests failures to the same commit:
> >
> > f0f6b614f83dbae99d283b7b12ab5dd2e04df979 is the first bad commit
> > commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979
> > Author: Al Viro <[email protected]>
> > Date: Thu Jun 23 17:21:37 2022 -0400
> >
> > copy_page_to_iter(): don't split high-order page in case of
> > ITER_PIPE
> >
> > ... just shove it into one pipe_buffer.
> >
> > Signed-off-by: Al Viro <[email protected]>
> >
> > lib/iov_iter.c | 21 ++++++---------------
> > 1 file changed, 6 insertions(+), 15 deletions(-)
> >
>
> I've been reliably reproducing this on generic/551 on xfs. In the case
> where it crashes, rqstp->rq_res.page_base is positive multiple of PAGE_SIZE
> after getting set in nfsd_splice_actor(), and that with page_len overruns
> the 256 pages we have.
>
> With f0f6b614f83d reverted, rqstp->rq_res.page_base is always zero.
>
> After 47b7fcae419dc and f0f6b614f83d, buf->offset in nfsd_splice_actor can
> be a positive multiple of PAGE_SIZE, whereas before it was always just the
> offset into the page.
>
> Something like this might fix it up:
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 9f486b788ed0..d62963f36f03 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -849,7 +849,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct
> pipe_buffer *buf,
>
> svc_rqst_replace_page(rqstp, buf->page);
> if (rqstp->rq_res.page_len == 0)
> - rqstp->rq_res.page_base = buf->offset;
> + rqstp->rq_res.page_base = buf->offset % PAGE_SIZE;
> rqstp->rq_res.page_len += sd->len;
> return sd->len;
> }
>
> .. but we should check with Al about whether this needs to be fixed over in
> copy_page_to_iter_pipe(), since I don't think pipe_buffer offset should be
> greater than PAGE_SIZE.
>
> Al, any thoughts?

pipe_buffer offsets in general can be greater than PAGE_SIZE. What's more,
buffer *size* can be greater than PAGE_SIZE - it really can contain more
than PAGE_SIZE worth of data. In that case the page is a compound one, of
course.

Regression is the combination of "splice from regular file to pipe hadn't
produced that earlier, now it does" and "nfsd never needed to handle that".

I don't believe that fix is correct. *IF* you can't deal with compound
page in sunrpc, you need a loop going by subpages in nfsd_splice_actor(),
similar to one that used to be in copy_page_to_iter(). Could you try
the following:

nfsd_splice_actor(): handle compound pages

pipe_buffer might refer to a compound page (and contain more than a PAGE_SIZE
worth of data). Theoretically it had been possible since way back, but
nfsd_splice_actor() hadn't run into that until copy_page_to_iter() change.
Fortunately, the only thing that changes for compound pages is that we
need to stuff each relevant subpage in and convert the offset into offset
in the first subpage.

Hopefully-fixes: f0f6b614f83d "copy_page_to_iter(): don't split high-order page in case of ITER_PIPE"
Signed-off-by: Al Viro <[email protected]>
---

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 9f486b788ed0..b16aed158ba6 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -846,10 +846,14 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
struct splice_desc *sd)
{
struct svc_rqst *rqstp = sd->u.data;
-
- svc_rqst_replace_page(rqstp, buf->page);
- if (rqstp->rq_res.page_len == 0)
- rqstp->rq_res.page_base = buf->offset;
+ struct page *page = buf->page; // may be a compound one
+ unsigned offset = buf->offset;
+
+ page += offset / PAGE_SIZE;
+ for (int i = sd->len; i > 0; i -= PAGE_SIZE)
+ svc_rqst_replace_page(rqstp, page++);
+ if (rqstp->rq_res.page_len == 0) // first call
+ rqstp->rq_res.page_base = offset % PAGE_SIZE;
rqstp->rq_res.page_len += sd->len;
return sd->len;
}

2022-09-10 21:30:56

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

> On Sep 10, 2022, at 5:14 PM, Al Viro <[email protected]> wrote:
>
> On Wed, Sep 07, 2022 at 08:52:46AM -0400, Benjamin Coddington wrote:
>> On 7 Sep 2022, at 0:58, Chuck Lever III wrote:
>>
>>>> On Sep 6, 2022, at 3:12 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>
>>>> On Tue, Sep 6, 2022 at 2:28 PM Benjamin Coddington
>>>> <[email protected]> wrote:
>>>>>
>>>>> On 1 Sep 2022, at 21:27, Olga Kornievskaia wrote:
>>>>>
>>>>>> Thanks Chuck. I first, based on a hunch, narrowed down that it's
>>>>>> coming from Al Viro's merge commit. Then I git bisected his
>>>>>> 32patches
>>>>>> to the following commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979
>>>>>
>>>>> No crash for me after reverting
>>>>> f0f6b614f83dbae99d283b7b12ab5dd2e04df979.
>>>>
>>>> I second that. No crash after a revert here.
>>>
>>> I bisected the new xfstests failures to the same commit:
>>>
>>> f0f6b614f83dbae99d283b7b12ab5dd2e04df979 is the first bad commit
>>> commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979
>>> Author: Al Viro <[email protected]>
>>> Date: Thu Jun 23 17:21:37 2022 -0400
>>>
>>> copy_page_to_iter(): don't split high-order page in case of
>>> ITER_PIPE
>>>
>>> ... just shove it into one pipe_buffer.
>>>
>>> Signed-off-by: Al Viro <[email protected]>
>>>
>>> lib/iov_iter.c | 21 ++++++---------------
>>> 1 file changed, 6 insertions(+), 15 deletions(-)
>>>
>>
>> I've been reliably reproducing this on generic/551 on xfs. In the case
>> where it crashes, rqstp->rq_res.page_base is positive multiple of PAGE_SIZE
>> after getting set in nfsd_splice_actor(), and that with page_len overruns
>> the 256 pages we have.
>>
>> With f0f6b614f83d reverted, rqstp->rq_res.page_base is always zero.
>>
>> After 47b7fcae419dc and f0f6b614f83d, buf->offset in nfsd_splice_actor can
>> be a positive multiple of PAGE_SIZE, whereas before it was always just the
>> offset into the page.
>>
>> Something like this might fix it up:
>>
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 9f486b788ed0..d62963f36f03 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -849,7 +849,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct
>> pipe_buffer *buf,
>>
>> svc_rqst_replace_page(rqstp, buf->page);
>> if (rqstp->rq_res.page_len == 0)
>> - rqstp->rq_res.page_base = buf->offset;
>> + rqstp->rq_res.page_base = buf->offset % PAGE_SIZE;
>> rqstp->rq_res.page_len += sd->len;
>> return sd->len;
>> }
>>
>> .. but we should check with Al about whether this needs to be fixed over in
>> copy_page_to_iter_pipe(), since I don't think pipe_buffer offset should be
>> greater than PAGE_SIZE.
>>
>> Al, any thoughts?
>
> pipe_buffer offsets in general can be greater than PAGE_SIZE. What's more,
> buffer *size* can be greater than PAGE_SIZE - it really can contain more
> than PAGE_SIZE worth of data. In that case the page is a compound one, of
> course.
>
> Regression is the combination of "splice from regular file to pipe hadn't
> produced that earlier, now it does" and "nfsd never needed to handle that".
>
> I don't believe that fix is correct. *IF* you can't deal with compound
> page in sunrpc, you need a loop going by subpages in nfsd_splice_actor(),
> similar to one that used to be in copy_page_to_iter(). Could you try
> the following:
>
> nfsd_splice_actor(): handle compound pages
>
> pipe_buffer might refer to a compound page (and contain more than a PAGE_SIZE
> worth of data). Theoretically it had been possible since way back, but
> nfsd_splice_actor() hadn't run into that until copy_page_to_iter() change.

It's also possible that recent simplifications I've done to the splice
read actor accidentally removed the ability to deal with compound pages.
You might want to review the commit history of nfsd_splice_actor() to
see if there is an historic version that would behave correctly with
the new copy_page_to_iter().

Is the need to deal with CompoundPage documented somewhere? If not,
perhaps nfsd_splice_actor() could mention it so that overzealous
maintainers don't break it again.

> Fortunately, the only thing that changes for compound pages is that we
> need to stuff each relevant subpage in and convert the offset into offset
> in the first subpage.
>
> Hopefully-fixes: f0f6b614f83d "copy_page_to_iter(): don't split high-order page in case of ITER_PIPE"
> Signed-off-by: Al Viro <[email protected]>
> ---
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 9f486b788ed0..b16aed158ba6 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -846,10 +846,14 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> struct splice_desc *sd)
> {
> struct svc_rqst *rqstp = sd->u.data;
> -
> - svc_rqst_replace_page(rqstp, buf->page);
> - if (rqstp->rq_res.page_len == 0)
> - rqstp->rq_res.page_base = buf->offset;
> + struct page *page = buf->page; // may be a compound one
> + unsigned offset = buf->offset;
> +
> + page += offset / PAGE_SIZE;
> + for (int i = sd->len; i > 0; i -= PAGE_SIZE)
> + svc_rqst_replace_page(rqstp, page++);
> + if (rqstp->rq_res.page_len == 0) // first call
> + rqstp->rq_res.page_base = offset % PAGE_SIZE;
> rqstp->rq_res.page_len += sd->len;
> return sd->len;
> }

--
Chuck Lever

2022-09-10 22:30:31

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On Sat, Sep 10, 2022 at 09:21:11PM +0000, Chuck Lever III wrote:

> It's also possible that recent simplifications I've done to the splice
> read actor accidentally removed the ability to deal with compound pages.
> You might want to review the commit history of nfsd_splice_actor() to
> see if there is an historic version that would behave correctly with
> the new copy_page_to_iter().

Nah, that's unrelated...

> Is the need to deal with CompoundPage documented somewhere? If not,
> perhaps nfsd_splice_actor() could mention it so that overzealous
> maintainers don't break it again.

> > + struct page *page = buf->page; // may be a compound one

Does that qualify? ;-)

FWIW, there's a separate problem in that thing - it assumes that
pipe_buffer boundaries will end up PAGE_SIZE-aligned. Usually
that's going to be true, but foofs_splice_read() is not required to
maintain that. E.g. it might be working in terms of chunks
used by weird protocol used by foofs, with e.g. 1024-byte payloads
+ 300-odd bytes of framing/checskums/whatnot. In that case it
might do 1024 bytes per pipe_buffer, with non-zero offset in page
in each of them; normal read()/splice()/etc. would work just fine
with that, but for nfsd_splice_actor() results would not be
nice.

AFAICS, sunrpc assumes that we have several pages, offset in the
first one and total size; no provisions exist for e.g. 5Kb of
data scattered in 5 chunks over 5 pages. Correct?

2022-09-10 23:00:57

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

> On Sep 10, 2022, at 6:13 PM, Al Viro <[email protected]> wrote:
>
> On Sat, Sep 10, 2022 at 09:21:11PM +0000, Chuck Lever III wrote:
>
>> It's also possible that recent simplifications I've done to the splice
>> read actor accidentally removed the ability to deal with compound pages.
>> You might want to review the commit history of nfsd_splice_actor() to
>> see if there is an historic version that would behave correctly with
>> the new copy_page_to_iter().
>
> Nah, that's unrelated...
>
>> Is the need to deal with CompoundPage documented somewhere? If not,
>> perhaps nfsd_splice_actor() could mention it so that overzealous
>> maintainers don't break it again.
>
>>> + struct page *page = buf->page; // may be a compound one
>
> Does that qualify? ;-)

Well, no, since you just added it :-) I meant pre-existing
documentation of the API. I take your remark as polite
encouragement to go and look for it, because this is an
area where I need deeper understanding.

> FWIW, there's a separate problem in that thing - it assumes that
> pipe_buffer boundaries will end up PAGE_SIZE-aligned. Usually
> that's going to be true, but foofs_splice_read() is not required to
> maintain that. E.g. it might be working in terms of chunks
> used by weird protocol used by foofs, with e.g. 1024-byte payloads
> + 300-odd bytes of framing/checskums/whatnot. In that case it
> might do 1024 bytes per pipe_buffer, with non-zero offset in page
> in each of them; normal read()/splice()/etc. would work just fine
> with that, but for nfsd_splice_actor() results would not be
> nice.
>
> AFAICS, sunrpc assumes that we have several pages, offset in the
> first one and total size; no provisions exist for e.g. 5Kb of
> data scattered in 5 chunks over 5 pages. Correct?

struct xdr_buf assumes the first page might contain payload that
starts somewhere in the middle of that page; the remaining segments
of the payload are contiguous page-sized chunks, with the last
segment starting on a page boundary but possibly being short of
a full page in length.

It's not a true scatter-gather arrangement. A 5K payload can
only be contained in two pages.

Someday we hope to switch the whole stack to use a bvec or
something more general than an array of pages, where each
segment can represent something that is not page-sized. There
is a struct bvec inside of struct xdr_buf already, but not
everyone uses that yet. RDMA, for example, still uses SGLs
or page vectors... so I've been waiting for some forward
movement in our transport APIs before doing more heavy
lifting.

--
Chuck Lever

2022-09-11 04:41:50

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On Sat, Sep 10, 2022 at 10:35:52PM +0000, Chuck Lever III wrote:
>
>
> > On Sep 10, 2022, at 6:13 PM, Al Viro <[email protected]> wrote:
> >
> > On Sat, Sep 10, 2022 at 09:21:11PM +0000, Chuck Lever III wrote:
> >
> >> It's also possible that recent simplifications I've done to the splice
> >> read actor accidentally removed the ability to deal with compound pages.
> >> You might want to review the commit history of nfsd_splice_actor() to
> >> see if there is an historic version that would behave correctly with
> >> the new copy_page_to_iter().
> >
> > Nah, that's unrelated...
> >
> >> Is the need to deal with CompoundPage documented somewhere? If not,
> >> perhaps nfsd_splice_actor() could mention it so that overzealous
> >> maintainers don't break it again.
> >
> >>> + struct page *page = buf->page; // may be a compound one
> >
> > Does that qualify? ;-)
>
> Well, no, since you just added it :-) I meant pre-existing
> documentation of the API. I take your remark as polite
> encouragement to go and look for it, because this is an
> area where I need deeper understanding.

Not really - quality of documentation aside, it's a combination of splice
from sockets being capable of stuffing skb fragments into destination pipe
and skb allocations using compound pages. E.g. AF_UNIX sendmsg() on
a large datagram will result in that. socketpair(), then such sendmsg()
on one end, splice() from another and there you go - references to compound
pages ending up in pipe buffers...

nfsd_splice_read() does file-to-pipe splice (into internal pipe) + feeding
the contents of that pipe into nfsd_splice_actor(); the new part here is
that file-to-pipe splice from regular file can end up with the thing that
had always been possible for file-to-pipe splice from sockets...

2022-09-11 11:11:23

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On 10 Sep 2022, at 17:14, Al Viro wrote:

> similar to one that used to be in copy_page_to_iter(). Could you try
> the following:

Yes, this fixes up generic/551. No crash or corruption. I'll send it
through a full run of xfstests as well.

Ben

>
> nfsd_splice_actor(): handle compound pages
>
> pipe_buffer might refer to a compound page (and contain more than a
> PAGE_SIZE
> worth of data). Theoretically it had been possible since way back,
> but
> nfsd_splice_actor() hadn't run into that until copy_page_to_iter()
> change.
> Fortunately, the only thing that changes for compound pages is that we
> need to stuff each relevant subpage in and convert the offset into
> offset
> in the first subpage.
>
> Hopefully-fixes: f0f6b614f83d "copy_page_to_iter(): don't split
> high-order page in case of ITER_PIPE"
> Signed-off-by: Al Viro <[email protected]>
> ---
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 9f486b788ed0..b16aed158ba6 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -846,10 +846,14 @@ nfsd_splice_actor(struct pipe_inode_info *pipe,
> struct pipe_buffer *buf,
> struct splice_desc *sd)
> {
> struct svc_rqst *rqstp = sd->u.data;
> -
> - svc_rqst_replace_page(rqstp, buf->page);
> - if (rqstp->rq_res.page_len == 0)
> - rqstp->rq_res.page_base = buf->offset;
> + struct page *page = buf->page; // may be a compound one
> + unsigned offset = buf->offset;
> +
> + page += offset / PAGE_SIZE;
> + for (int i = sd->len; i > 0; i -= PAGE_SIZE)
> + svc_rqst_replace_page(rqstp, page++);
> + if (rqstp->rq_res.page_len == 0) // first call
> + rqstp->rq_res.page_base = offset % PAGE_SIZE;
> rqstp->rq_res.page_len += sd->len;
> return sd->len;
> }

2022-09-11 18:41:10

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

> On Sep 10, 2022, at 5:14 PM, Al Viro <[email protected]> wrote:
>
> On Wed, Sep 07, 2022 at 08:52:46AM -0400, Benjamin Coddington wrote:
>> On 7 Sep 2022, at 0:58, Chuck Lever III wrote:
>>
>>>> On Sep 6, 2022, at 3:12 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>
>>>> On Tue, Sep 6, 2022 at 2:28 PM Benjamin Coddington
>>>> <[email protected]> wrote:
>>>>>
>>>>> On 1 Sep 2022, at 21:27, Olga Kornievskaia wrote:
>>>>>
>>>>>> Thanks Chuck. I first, based on a hunch, narrowed down that it's
>>>>>> coming from Al Viro's merge commit. Then I git bisected his
>>>>>> 32patches
>>>>>> to the following commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979
>>>>>
>>>>> No crash for me after reverting
>>>>> f0f6b614f83dbae99d283b7b12ab5dd2e04df979.
>>>>
>>>> I second that. No crash after a revert here.
>>>
>>> I bisected the new xfstests failures to the same commit:
>>>
>>> f0f6b614f83dbae99d283b7b12ab5dd2e04df979 is the first bad commit
>>> commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979
>>> Author: Al Viro <[email protected]>
>>> Date: Thu Jun 23 17:21:37 2022 -0400
>>>
>>> copy_page_to_iter(): don't split high-order page in case of
>>> ITER_PIPE
>>>
>>> ... just shove it into one pipe_buffer.
>>>
>>> Signed-off-by: Al Viro <[email protected]>
>>>
>>> lib/iov_iter.c | 21 ++++++---------------
>>> 1 file changed, 6 insertions(+), 15 deletions(-)
>>>
>>
>> I've been reliably reproducing this on generic/551 on xfs. In the case
>> where it crashes, rqstp->rq_res.page_base is positive multiple of PAGE_SIZE
>> after getting set in nfsd_splice_actor(), and that with page_len overruns
>> the 256 pages we have.
>>
>> With f0f6b614f83d reverted, rqstp->rq_res.page_base is always zero.
>>
>> After 47b7fcae419dc and f0f6b614f83d, buf->offset in nfsd_splice_actor can
>> be a positive multiple of PAGE_SIZE, whereas before it was always just the
>> offset into the page.
>>
>> Something like this might fix it up:
>>
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 9f486b788ed0..d62963f36f03 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -849,7 +849,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct
>> pipe_buffer *buf,
>>
>> svc_rqst_replace_page(rqstp, buf->page);
>> if (rqstp->rq_res.page_len == 0)
>> - rqstp->rq_res.page_base = buf->offset;
>> + rqstp->rq_res.page_base = buf->offset % PAGE_SIZE;
>> rqstp->rq_res.page_len += sd->len;
>> return sd->len;
>> }
>>
>> .. but we should check with Al about whether this needs to be fixed over in
>> copy_page_to_iter_pipe(), since I don't think pipe_buffer offset should be
>> greater than PAGE_SIZE.
>>
>> Al, any thoughts?
>
> pipe_buffer offsets in general can be greater than PAGE_SIZE. What's more,
> buffer *size* can be greater than PAGE_SIZE - it really can contain more
> than PAGE_SIZE worth of data. In that case the page is a compound one, of
> course.
>
> Regression is the combination of "splice from regular file to pipe hadn't
> produced that earlier, now it does" and "nfsd never needed to handle that".
>
> I don't believe that fix is correct. *IF* you can't deal with compound
> page in sunrpc, you need a loop going by subpages in nfsd_splice_actor(),
> similar to one that used to be in copy_page_to_iter(). Could you try
> the following:
>
> nfsd_splice_actor(): handle compound pages
>
> pipe_buffer might refer to a compound page (and contain more than a PAGE_SIZE
> worth of data). Theoretically it had been possible since way back, but
> nfsd_splice_actor() hadn't run into that until copy_page_to_iter() change.
> Fortunately, the only thing that changes for compound pages is that we
> need to stuff each relevant subpage in and convert the offset into offset
> in the first subpage.
>
> Hopefully-fixes: f0f6b614f83d "copy_page_to_iter(): don't split high-order page in case of ITER_PIPE"
> Signed-off-by: Al Viro <[email protected]>
> ---

I'm getting my head around this, just a couple of comments so far.

> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 9f486b788ed0..b16aed158ba6 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -846,10 +846,14 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> struct splice_desc *sd)
> {
> struct svc_rqst *rqstp = sd->u.data;
> -
> - svc_rqst_replace_page(rqstp, buf->page);
> - if (rqstp->rq_res.page_len == 0)
> - rqstp->rq_res.page_base = buf->offset;
> + struct page *page = buf->page; // may be a compound one
> + unsigned offset = buf->offset;
> +
> + page += offset / PAGE_SIZE;

Nit: I see "offset / PAGE_SIZE" is used in the iter code base,
but in the NFS stack, we prefer "offset >> PAGE_SIZE" and
"offset & ~PAGE_MASK" (below).

> + for (int i = sd->len; i > 0; i -= PAGE_SIZE)
> + svc_rqst_replace_page(rqstp, page++);
> + if (rqstp->rq_res.page_len == 0) // first call
> + rqstp->rq_res.page_base = offset % PAGE_SIZE;
> rqstp->rq_res.page_len += sd->len;
> return sd->len;
> }

I could take this through the nfsd for-rc tree, but that's based
on 5.19-rc7 so it doesn't have f0f6b614f83d. I don't think will
break functionality, but I'm wondering if it would be better for
you to take this through your tree to improve bisect-ability.

If you agree and Ben reports a Tested-by:, then here's my

Acked-by: Chuck Lever <[email protected]>

It might be nice one day to have a single mechanism for NFSD
to handle READs. I don't think a pipe is going to work for our
upcoming hole-detection scheme, for example.

--
Chuck Lever

2022-09-11 20:04:04

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On Sun, Sep 11, 2022 at 06:36:22PM +0000, Chuck Lever III wrote:

> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 9f486b788ed0..b16aed158ba6 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -846,10 +846,14 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> > struct splice_desc *sd)
> > {
> > struct svc_rqst *rqstp = sd->u.data;
> > -
> > - svc_rqst_replace_page(rqstp, buf->page);
> > - if (rqstp->rq_res.page_len == 0)
> > - rqstp->rq_res.page_base = buf->offset;
> > + struct page *page = buf->page; // may be a compound one
> > + unsigned offset = buf->offset;
> > +
> > + page += offset / PAGE_SIZE;
>
> Nit: I see "offset / PAGE_SIZE" is used in the iter code base,
> but in the NFS stack, we prefer "offset >> PAGE_SIZE" and
> "offset & ~PAGE_MASK" (below).

*shrug*

If a C compiler is too dumb to recognize division by a power of two
constant...

Anyway, your codebase, your rules.

>
> > + for (int i = sd->len; i > 0; i -= PAGE_SIZE)
> > + svc_rqst_replace_page(rqstp, page++);
> > + if (rqstp->rq_res.page_len == 0) // first call
> > + rqstp->rq_res.page_base = offset % PAGE_SIZE;
> > rqstp->rq_res.page_len += sd->len;
> > return sd->len;
> > }
>
> I could take this through the nfsd for-rc tree, but that's based
> on 5.19-rc7 so it doesn't have f0f6b614f83d. I don't think will
> break functionality, but I'm wondering if it would be better for
> you to take this through your tree to improve bisect-ability.
>
> If you agree and Ben reports a Tested-by:, then here's my
>
> Acked-by: Chuck Lever <[email protected]>

OK, I'll wait for Tested-by and send it to Linus. Should be safe
for backports - with non-compound pages we are going to have
offset < PAGE_SIZE and sd->len <= PAGE_SIZE, so this is equivalent
to the mainline variant of the function for those...

2022-09-12 19:07:55

[permalink] [raw]

Subject: Re: Is this nfsd kernel oops known?

On 11 Sep 2022, at 15:39, Al Viro wrote:
>>
>>> + for (int i = sd->len; i > 0; i -= PAGE_SIZE)
>>> + svc_rqst_replace_page(rqstp, page++);
>>> + if (rqstp->rq_res.page_len == 0) // first call
>>> + rqstp->rq_res.page_base = offset % PAGE_SIZE;
>>> rqstp->rq_res.page_len += sd->len;
>>> return sd->len;
>>> }
>>
>> I could take this through the nfsd for-rc tree, but that's based
>> on 5.19-rc7 so it doesn't have f0f6b614f83d. I don't think will
>> break functionality, but I'm wondering if it would be better for
>> you to take this through your tree to improve bisect-ability.
>>
>> If you agree and Ben reports a Tested-by:, then here's my
>>
>> Acked-by: Chuck Lever <[email protected]>
>
> OK, I'll wait for Tested-by and send it to Linus.

No new crashes or corruption after several passes through xfstests. I've
also tested the RDMA transport on soft iwarp. Thanks for the fix here.

Tested-by: Benjamin Coddington <[email protected]>

Ben

2022-09-13 03:01:34