Dear Linux folks,
Moving from Linux 5.10.113 to 5.15.69, starting Mozilla Thunderbird or
Mozilla Firefox with the home on NFS, both programs get killed, and
Linux 5.15.69 logs:
```
[ 3827.604396] BUG: unable to handle page fault for address: 000000001d473c07
[ 3827.611297] #PF: supervisor read access in kernel mode
[ 3827.616452] #PF: error_code(0x0000) - not-present page
[ 3827.621604] PGD 0 P4D 0
[ 3827.624152] Oops: 0000 [#1] SMP PTI
[ 3827.627657] CPU: 0 PID: 2378 Comm: firefox Not tainted 5.15.69.mx64.435 #1
[ 3827.634551] Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.20.0 12/09/2021
[ 3827.642659] RIP: 0010:nfs_scan_commit_list+0x1e/0x100 [nfs]
[ 3827.648256] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 10 4c 8b 2f 48 89 3c 24 89 4c 24 0c <49> 8b 5d 00 4c 39 ef 0f 84 c3 00 00 00 48 89 f5 49 89 d6 4d 89 ef
[ 3827.667057] RSP: 0018:ffffc90002097ce0 EFLAGS: 00010282
[ 3827.672294] RAX: 000000006329dcd6 RBX: ffffc90002097d60 RCX: 000000007fffffff
[ 3827.679440] RDX: ffffc90002097d60 RSI: ffffc90002097d50 RDI: ffff8881d7618b38
[ 3827.686587] RBP: ffffc90002097d50 R08: 0000000000000001 R09: 0000000000000000
[ 3827.693734] R10: 0000000000000000 R11: 61c8864680b583eb R12: 0000000000000000
[ 3827.700880] R13: 000000001d473c07 R14: 0000000000000001 R15: 0000000000000000
[ 3827.708027] FS: 00007fa6141f2780(0000) GS:ffff88881dc00000(0000) knlGS:0000000000000000
[ 3827.716131] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3827.721886] CR2: 000000001d473c07 CR3: 000000012dae0006 CR4: 00000000003706f0
[ 3827.729034] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3827.736180] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3827.743328] Call Trace:
[ 3827.745779] <TASK>
[ 3827.747883] nfs_scan_commit+0x76/0xb0 [nfs]
[ 3827.752167] __nfs_commit_inode+0x108/0x180 [nfs]
[ 3827.756886] nfs_wb_all+0x59/0x110 [nfs]
[ 3827.760822] nfs4_inode_return_delegation+0x58/0x90 [nfsv4]
[ 3827.766413] nfs4_proc_remove+0x101/0x110 [nfsv4]
[ 3827.771130] nfs_unlink+0xf5/0x2d0 [nfs]
[ 3827.775065] vfs_unlink+0x10b/0x280
[ 3827.778563] do_unlinkat+0x19e/0x2c0
[ 3827.782158] __x64_sys_unlink+0x3e/0x60
[ 3827.786002] ? __x64_sys_readlink+0x1b/0x30
[ 3827.790192] do_syscall_64+0x40/0x90
[ 3827.793779] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 3827.798847] RIP: 0033:0x7fa6142e2aa7
[ 3827.802435] Code: f0 ff ff 73 01 c3 48 8b 0d be 03 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 03 0d 00 f7 d8 64 89 01 48
[ 3827.821264] RSP: 002b:00007fff37879a08 EFLAGS: 00000202 ORIG_RAX: 0000000000000057
[ 3827.828848] RAX: ffffffffffffffda RBX: 0000000080004005 RCX: 00007fa6142e2aa7
[ 3827.835997] RDX: 0000000077120e8d RSI: 00007fa614383520 RDI: 00007fa605425b88
[ 3827.843145] RBP: 00007fa605425b88 R08: 00007fff37879add R09: 0000000000000000
[ 3827.850291] R10: 00007fa614362ae0 R11: 0000000000000202 R12: 0000000077120e8d
[ 3827.857439] R13: 00007fff37879add R14: 00007fa6141f26c8 R15: 0000000000000065
[ 3827.864586] </TASK>
[ 3827.866776] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs 8021q garp
stp mrp llc amdgpu snd_hda_codec_realtek snd_hda_codec_generic
ledtrig_audio i915 iommu_v2 gpu_sched drm_ttm_helper iosf_mbi ttm
drm_kms_helper x86_pkg_temp_thermal kvm_intel drm kvm snd_hda_codec_hdmi
intel_gtt i2c_algo_bit fb_sys_fops syscopyarea sysfillrect snd_hda_intel
input_leds led_class snd_intel_dspcfg sysimgblt e1000e snd_hda_codec
hid_logitech_hidpp snd_hda_core hid_logitech_dj snd_usb_audio
snd_usbmidi_lib snd_hwdep snd_rawmidi snd_pcm snd_timer uvcvideo
videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common snd
wmi_bmof soundcore wmi iTCO_wdt video irqbypass crc32c_intel
iTCO_vendor_support nfsd auth_rpcgss oid_registry nfs_acl lockd grace
sunrpc ip_tables x_tables unix ipv6 autofs4
[ 3827.935422] CR2: 000000001d473c07
[ 3827.938745] ---[ end trace d7dc2bc122fe8836 ]---
```
Please find the Linux messages attached.
Kind regards,
Paul
PS:
```
$ scripts/decodecode < /scratch/tmp/linux-5.15.69.mx64.435--messages.txt
[ 3827.948959] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00
41 57 41 56 41 55 41 54 55 53 48 83 ec 10 4c 8b 2f 48 89 3c 24 89 4c 24
0c <49> 8b 5d 00 4c 39 ef 0f 84 c3 00 00 00 48 89 f5 49 89 d6 4d 89 ef
All code
========
0: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
7: 00 00 00 00
b: 90 nop
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 41 57 push %r15
13: 41 56 push %r14
15: 41 55 push %r13
17: 41 54 push %r12
19: 55 push %rbp
1a: 53 push %rbx
1b: 48 83 ec 10 sub $0x10,%rsp
1f: 4c 8b 2f mov (%rdi),%r13
22: 48 89 3c 24 mov %rdi,(%rsp)
26: 89 4c 24 0c mov %ecx,0xc(%rsp)
2a:* 49 8b 5d 00 mov 0x0(%r13),%rbx <-- trapping instruction
2e: 4c 39 ef cmp %r13,%rdi
31: 0f 84 c3 00 00 00 je 0xfa
37: 48 89 f5 mov %rsi,%rbp
3a: 49 89 d6 mov %rdx,%r14
3d: 4d 89 ef mov %r13,%r15
Code starting with the faulting instruction
===========================================
0: 49 8b 5d 00 mov 0x0(%r13),%rbx
4: 4c 39 ef cmp %r13,%rdi
7: 0f 84 c3 00 00 00 je 0xd0
d: 48 89 f5 mov %rsi,%rbp
10: 49 89 d6 mov %rdx,%r14
13: 4d 89 ef mov %r13,%r15
```
```
$ scripts/decode_stacktrace.sh arch/x86/boot/compressed/vmlinux < /scratch/tmp/linux-5.15.69.mx64.435--messages.txt
[…]
[ 3827.648256] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00
41 57 41 56 41 55 41 54 55 53 48 83 ec 10 4c 8b 2f 48 89 3c 24 89 4c 24
0c <49> 8b 5d 00 4c 39 ef 0f 84 c3 00 00 00 48 89 f5 49 89 d6 4d 89 ef
All code
========
0: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
7: 00 00 00 00
b: 90 nop
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 41 57 push %r15
13: 41 56 push %r14
15: 41 55 push %r13
17: 41 54 push %r12
19: 55 push %rbp
1a: 53 push %rbx
1b: 48 83 ec 10 sub $0x10,%rsp
1f: 4c 8b 2f mov (%rdi),%r13
22: 48 89 3c 24 mov %rdi,(%rsp)
26: 89 4c 24 0c mov %ecx,0xc(%rsp)
2a:* 49 8b 5d 00 mov 0x0(%r13),%rbx <-- trapping instruction
2e: 4c 39 ef cmp %r13,%rdi
31: 0f 84 c3 00 00 00 je 0xfa
37: 48 89 f5 mov %rsi,%rbp
3a: 49 89 d6 mov %rdx,%r14
3d: 4d 89 ef mov %r13,%r15
Code starting with the faulting instruction
===========================================
0: 49 8b 5d 00 mov 0x0(%r13),%rbx
4: 4c 39 ef cmp %r13,%rdi
7: 0f 84 c3 00 00 00 je 0xd0
d: 48 89 f5 mov %rsi,%rbp
10: 49 89 d6 mov %rdx,%r14
13: 4d 89 ef mov %r13,%r15
[…]
[ 3827.802435] Code: f0 ff ff 73 01 c3 48 8b 0d be 03 0d 00 f7 d8 64 89
01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 57 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 03 0d 00 f7 d8 64 89 01 48
All code
========
0: f0 ff lock (bad)
2: ff 73 01 push 0x1(%rbx)
5: c3 ret
6: 48 8b 0d be 03 0d 00 mov 0xd03be(%rip),%rcx # 0xd03cb
d: f7 d8 neg %eax
f: 64 89 01 mov %eax,%fs:(%rcx)
12: 48 83 c8 ff or $0xffffffffffffffff,%rax
16: c3 ret
17: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
1e: 00 00 00
21: 66 90 xchg %ax,%ax
23: b8 57 00 00 00 mov $0x57,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 ret
33: 48 8b 0d 91 03 0d 00 mov 0xd0391(%rip),%rcx # 0xd03cb
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 ret
9: 48 8b 0d 91 03 0d 00 mov 0xd0391(%rip),%rcx # 0xd03a1
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
[…]
```
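In the decoded kernel code above, `mov (%rdi),%r13` loads a pointer through the first argument, and the trapping `mov 0x0(%r13),%rbx` then dereferences that pointer. R13 equals the faulting address in CR2 (000000001d473c07). This reads like the start of a "safe" linked-list walk, where the successor of the first entry is fetched before the loop body runs. The user-space C sketch below only illustrates that reading: `struct list_head` is re-declared as a stand-in, and `list_init()`, `list_add_tail()` and `count_entries()` are hypothetical helpers written for this example, not the kernel's nfs_scan_commit_list().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Append entry just before head, i.e. at the tail of the list. */
void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Count entries using the "safe" iteration pattern: the successor of the
 * current entry is loaded before the entry itself is processed.
 */
size_t count_entries(const struct list_head *head)
{
	const struct list_head *pos, *n;
	size_t count = 0;

	assert(head != NULL);

	pos = head->next;	/* first load, comparable to: mov (%rdi),%r13 */
	n = pos->next;		/* second load, comparable to: mov 0x0(%r13),%rbx */

	while (pos != head) {
		count++;
		pos = n;
		n = pos->next;
	}
	return count;
}
```

If `head->next` held an arbitrary value instead of a valid pointer, the second load would be the first instruction to touch unmapped memory, which would match the position of the trapping instruction in the oops.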
Hi Paul,
On Wed, 2022-09-21 at 13:42 +0200, Paul Menzel wrote:
> Dear Linux folks,
>
>
> Moving from Linux 5.10.113 to 5.15.69, starting Mozilla Thunderbird
> or
> Mozilla Firefox with the home on NFS, both programs get killed, and
> Linux 5.15.69 logs:
>
> ```
> [ 3827.604396] BUG: unable to handle page fault for address:
> 000000001d473c07
> [ 3827.611297] #PF: supervisor read access in kernel mode
> [ 3827.616452] #PF: error_code(0x0000) - not-present page
> [ 3827.621604] PGD 0 P4D 0
> [ 3827.624152] Oops: 0000 [#1] SMP PTI
> [ 3827.627657] CPU: 0 PID: 2378 Comm: firefox Not tainted
> 5.15.69.mx64.435 #1
> [ 3827.634551] Hardware name: Dell Inc. Precision Tower 3620/0MWYPT,
> BIOS 2.20.0 12/09/2021
> [ 3827.642659] RIP: 0010:nfs_scan_commit_list+0x1e/0x100 [nfs]
> [ 3827.648256] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00
> 00
> 41 57 41 56 41 55 41 54 55 53 48 83 ec 10 4c 8b 2f 48 89 3c 24 89 4c
> 24
> 0c <49> 8b 5d 00 4c 39 ef 0f 84 c3 00 00 00 48 89 f5 49 89 d6 4d 89
> ef
> [ 3827.667057] RSP: 0018:ffffc90002097ce0 EFLAGS: 00010282
> [ 3827.672294] RAX: 000000006329dcd6 RBX: ffffc90002097d60 RCX:
> 000000007fffffff
> [ 3827.679440] RDX: ffffc90002097d60 RSI: ffffc90002097d50 RDI:
> ffff8881d7618b38
> [ 3827.686587] RBP: ffffc90002097d50 R08: 0000000000000001 R09:
> 0000000000000000
> [ 3827.693734] R10: 0000000000000000 R11: 61c8864680b583eb R12:
> 0000000000000000
> [ 3827.700880] R13: 000000001d473c07 R14: 0000000000000001 R15:
> 0000000000000000
> [ 3827.708027] FS: 00007fa6141f2780(0000) GS:ffff88881dc00000(0000)
> knlGS:0000000000000000
> [ 3827.716131] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3827.721886] CR2: 000000001d473c07 CR3: 000000012dae0006 CR4:
> 00000000003706f0
> [ 3827.729034] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 3827.736180] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 3827.743328] Call Trace:
> [ 3827.745779] <TASK>
> [ 3827.747883] nfs_scan_commit+0x76/0xb0 [nfs]
> [ 3827.752167] __nfs_commit_inode+0x108/0x180 [nfs]
> [ 3827.756886] nfs_wb_all+0x59/0x110 [nfs]
> [ 3827.760822] nfs4_inode_return_delegation+0x58/0x90 [nfsv4]
> [ 3827.766413] nfs4_proc_remove+0x101/0x110 [nfsv4]
> [ 3827.771130] nfs_unlink+0xf5/0x2d0 [nfs]
> [ 3827.775065] vfs_unlink+0x10b/0x280
> [ 3827.778563] do_unlinkat+0x19e/0x2c0
> [ 3827.782158] __x64_sys_unlink+0x3e/0x60
> [ 3827.786002] ? __x64_sys_readlink+0x1b/0x30
> [ 3827.790192] do_syscall_64+0x40/0x90
> [ 3827.793779] entry_SYSCALL_64_after_hwframe+0x61/0xcb
> [ 3827.798847] RIP: 0033:0x7fa6142e2aa7
> [ 3827.802435] Code: f0 ff ff 73 01 c3 48 8b 0d be 03 0d 00 f7 d8 64
> 89
> 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 57 00 00 00
> 0f
> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 03 0d 00 f7 d8 64 89 01
> 48
> [ 3827.821264] RSP: 002b:00007fff37879a08 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000057
> [ 3827.828848] RAX: ffffffffffffffda RBX: 0000000080004005 RCX:
> 00007fa6142e2aa7
> [ 3827.835997] RDX: 0000000077120e8d RSI: 00007fa614383520 RDI:
> 00007fa605425b88
> [ 3827.843145] RBP: 00007fa605425b88 R08: 00007fff37879add R09:
> 0000000000000000
> [ 3827.850291] R10: 00007fa614362ae0 R11: 0000000000000202 R12:
> 0000000077120e8d
> [ 3827.857439] R13: 00007fff37879add R14: 00007fa6141f26c8 R15:
> 0000000000000065
> [ 3827.864586] </TASK>
> [ 3827.866776] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs 8021q
> garp
> stp mrp llc amdgpu snd_hda_codec_realtek snd_hda_codec_generic
> ledtrig_audio i915 iommu_v2 gpu_sched drm_ttm_helper iosf_mbi ttm
> drm_kms_helper x86_pkg_temp_thermal kvm_intel drm kvm
> snd_hda_codec_hdmi
> intel_gtt i2c_algo_bit fb_sys_fops syscopyarea sysfillrect
> snd_hda_intel
> input_leds led_class snd_intel_dspcfg sysimgblt e1000e snd_hda_codec
> hid_logitech_hidpp snd_hda_core hid_logitech_dj snd_usb_audio
> snd_usbmidi_lib snd_hwdep snd_rawmidi snd_pcm snd_timer uvcvideo
> videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common
> snd
> wmi_bmof soundcore wmi iTCO_wdt video irqbypass crc32c_intel
> iTCO_vendor_support nfsd auth_rpcgss oid_registry nfs_acl lockd grace
> sunrpc ip_tables x_tables unix ipv6 autofs4
> [ 3827.935422] CR2: 000000001d473c07
> [ 3827.938745] ---[ end trace d7dc2bc122fe8836 ]---
> ```
>
Does cherry-picking commit 6e176d47160c ("NFSv4: Fixes for
nfs4_inode_return_delegation()") into 5.15.69 from the upstream kernel
tree fix the problem?
8<---------------------------------------------------
From 6e176d47160cec8bcaa28d9aa06926d72d54237c Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Sun, 10 Oct 2021 10:58:12 +0200
Subject: [PATCH] NFSv4: Fixes for nfs4_inode_return_delegation()
We mustn't call nfs_wb_all() on anything other than a regular file.
Furthermore, we can exit early when we don't hold a delegation.
Reported-by: David Wysochanski <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/delegation.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 11118398f495..7c9eb679dbdb 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -755,11 +755,13 @@ int nfs4_inode_return_delegation(struct inode *inode)
struct nfs_delegation *delegation;
delegation = nfs_start_delegation_return(nfsi);
- /* Synchronous recall of any application leases */
- break_lease(inode, O_WRONLY | O_RDWR);
- nfs_wb_all(inode);
- if (delegation != NULL)
+ if (delegation != NULL) {
+ /* Synchronous recall of any application leases */
+ break_lease(inode, O_WRONLY | O_RDWR);
+ if (S_ISREG(inode->i_mode))
+ nfs_wb_all(inode);
return nfs_end_delegation_return(inode, delegation, 1);
+ }
return 0;
}
--
2.37.3
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
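The control-flow change in the patch above can be modelled in user space. The C sketch below is not kernel code: `enum demo_kind`, the `STEP_*` flags and both functions are hypothetical names introduced for illustration, and the object kind stands in for the `S_ISREG(inode->i_mode)` test. It only records which steps each version of nfs4_inode_return_delegation() would take according to the diff.

```c
#include <assert.h>

/* Hypothetical object kinds; DEMO_REGULAR models S_ISREG() being true. */
enum demo_kind { DEMO_REGULAR, DEMO_SYMLINK, DEMO_DIRECTORY };

/* Bit flags naming the steps visible in the diff. */
enum {
	STEP_BREAK_LEASE = 1 << 0,	/* break_lease() */
	STEP_WRITEBACK   = 1 << 1,	/* nfs_wb_all() */
	STEP_END_RETURN  = 1 << 2	/* nfs_end_delegation_return() */
};

/* Before the patch: lease break and writeback ran unconditionally. */
unsigned int steps_before_patch(enum demo_kind kind, int has_delegation)
{
	unsigned int steps = STEP_BREAK_LEASE | STEP_WRITEBACK;

	(void)kind;	/* the object type was not checked */
	if (has_delegation)
		steps |= STEP_END_RETURN;
	return steps;
}

/* After the patch: nothing happens without a delegation, and writeback
 * is limited to regular files. */
unsigned int steps_after_patch(enum demo_kind kind, int has_delegation)
{
	unsigned int steps = 0;

	if (has_delegation) {
		steps |= STEP_BREAK_LEASE;
		if (kind == DEMO_REGULAR)
			steps |= STEP_WRITEBACK;
		steps |= STEP_END_RETURN;
	}
	return steps;
}
```

In this model, `steps_before_patch(DEMO_SYMLINK, 0)` still includes `STEP_WRITEBACK`, while `steps_after_patch(DEMO_SYMLINK, 0)` returns 0. That corresponds to the two points in the commit message: no nfs_wb_all() on anything other than a regular file, and an early exit when no delegation is held.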
Dear Trond,
Thank you for the quick reply.
Am 21.09.22 um 14:44 schrieb Trond Myklebust:
> On Wed, 2022-09-21 at 13:42 +0200, Paul Menzel wrote:
>> Moving from Linux 5.10.113 to 5.15.69, starting Mozilla Thunderbird or
>> Mozilla Firefox with the home on NFS, both programs get killed, and
>> Linux 5.15.69 logs:
>>
>> ```
>> [ 3827.604396] BUG: unable to handle page fault for address: 000000001d473c07
>> [ 3827.611297] #PF: supervisor read access in kernel mode
>> [ 3827.616452] #PF: error_code(0x0000) - not-present page
>> [ 3827.621604] PGD 0 P4D 0
>> [ 3827.624152] Oops: 0000 [#1] SMP PTI
>> [ 3827.627657] CPU: 0 PID: 2378 Comm: firefox Not tainted 5.15.69.mx64.435 #1
>> [ 3827.634551] Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.20.0 12/09/2021
[…]
>> [ 3827.743328] Call Trace:
>> [ 3827.745779] <TASK>
>> [ 3827.747883] nfs_scan_commit+0x76/0xb0 [nfs]
>> [ 3827.752167] __nfs_commit_inode+0x108/0x180 [nfs]
>> [ 3827.756886] nfs_wb_all+0x59/0x110 [nfs]
>> [ 3827.760822] nfs4_inode_return_delegation+0x58/0x90 [nfsv4]
>> [ 3827.766413] nfs4_proc_remove+0x101/0x110 [nfsv4]
>> [ 3827.771130] nfs_unlink+0xf5/0x2d0 [nfs]
>> [ 3827.775065] vfs_unlink+0x10b/0x280
>> [ 3827.778563] do_unlinkat+0x19e/0x2c0
>> [ 3827.782158] __x64_sys_unlink+0x3e/0x60
>> [ 3827.786002] ? __x64_sys_readlink+0x1b/0x30
>> [ 3827.790192] do_syscall_64+0x40/0x90
>> [ 3827.793779] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[…]
>> ```
>>
>
> Does cherry-picking commit 6e176d47160c ("NFSv4: Fixes for
> nfs4_inode_return_delegation()") into 5.15.69 from the upstream kernel
> tree fix the problem?
>
> 8<---------------------------------------------------
> From 6e176d47160cec8bcaa28d9aa06926d72d54237c Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <[email protected]>
> Date: Sun, 10 Oct 2021 10:58:12 +0200
> Subject: [PATCH] NFSv4: Fixes for nfs4_inode_return_delegation()
[…]
Indeed with that commit, present since v5.16-rc1, we are unable to
reproduce the issue, so it seems to be the fix. It looks like there are
not a lot of 5.15 NFS users out there. ;-)
Kind regards,
Paul
On Thu, 2022-09-22 at 13:42 +0200, Paul Menzel wrote:
> Dear Trond,
>
>
> Thank you for the quick reply.
>
> Am 21.09.22 um 14:44 schrieb Trond Myklebust:
>
> > On Wed, 2022-09-21 at 13:42 +0200, Paul Menzel wrote:
>
> > > Moving from Linux 5.10.113 to 5.15.69, starting Mozilla
> > > Thunderbird or
> > > Mozilla Firefox with the home on NFS, both programs get killed,
> > > and
> > > Linux 5.15.69 logs:
> > >
> > > ```
> > > [ 3827.604396] BUG: unable to handle page fault for address:
> > > 000000001d473c07
> > > [ 3827.611297] #PF: supervisor read access in kernel mode
> > > [ 3827.616452] #PF: error_code(0x0000) - not-present page
> > > [ 3827.621604] PGD 0 P4D 0
> > > [ 3827.624152] Oops: 0000 [#1] SMP PTI
> > > [ 3827.627657] CPU: 0 PID: 2378 Comm: firefox Not tainted
> > > 5.15.69.mx64.435 #1
> > > [ 3827.634551] Hardware name: Dell Inc. Precision Tower
> > > 3620/0MWYPT, BIOS 2.20.0 12/09/2021
>
> […]
>
> > > [ 3827.743328] Call Trace:
> > > [ 3827.745779] <TASK>
> > > [ 3827.747883] nfs_scan_commit+0x76/0xb0 [nfs]
> > > [ 3827.752167] __nfs_commit_inode+0x108/0x180 [nfs]
> > > [ 3827.756886] nfs_wb_all+0x59/0x110 [nfs]
> > > [ 3827.760822] nfs4_inode_return_delegation+0x58/0x90 [nfsv4]
> > > [ 3827.766413] nfs4_proc_remove+0x101/0x110 [nfsv4]
> > > [ 3827.771130] nfs_unlink+0xf5/0x2d0 [nfs]
> > > [ 3827.775065] vfs_unlink+0x10b/0x280
> > > [ 3827.778563] do_unlinkat+0x19e/0x2c0
> > > [ 3827.782158] __x64_sys_unlink+0x3e/0x60
> > > [ 3827.786002] ? __x64_sys_readlink+0x1b/0x30
> > > [ 3827.790192] do_syscall_64+0x40/0x90
> > > [ 3827.793779] entry_SYSCALL_64_after_hwframe+0x61/0xcb
>
> […]
>
> > > ```
> > >
> >
> > Does cherry-picking commit 6e176d47160c ("NFSv4: Fixes for
> > nfs4_inode_return_delegation()") into 5.15.69 from the upstream
> > kernel
> > tree fix the problem?
> >
> > 8<---------------------------------------------------
> > From 6e176d47160cec8bcaa28d9aa06926d72d54237c Mon Sep 17 00:00:00
> > 2001
> > From: Trond Myklebust <[email protected]>
> > Date: Sun, 10 Oct 2021 10:58:12 +0200
> > Subject: [PATCH] NFSv4: Fixes for nfs4_inode_return_delegation()
>
> […]
>
> Indeed with that commit, present since v5.16-rc1, we are unable to
> reproduce the issue, so it seems to be the fix. It looks like there
> are
> not a lot of 5.15 NFS users out there. ;-)
>
I believe this is a dependency that was introduced by the back port of
commit e591b298d7ec ("NFS: Save some space in the inode") into 5.15.68.
So the reason it wasn't seen is because the change is very recent.
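As an illustration of how a space-saving change can create such a dependency, consider the following user-space C sketch. It assumes, and this thread does not confirm it, that the space saving comes from letting fields used only for regular files share storage with fields used only for other object types. `struct demo_inode` and all `demo_*` names are hypothetical and do not reproduce the real struct nfs_inode layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's list node. */
struct list_head {
	struct list_head *next, *prev;
};

enum demo_kind { DEMO_REGULAR, DEMO_OTHER };

/* Hypothetical inode: type-specific fields overlap in a union. */
struct demo_inode {
	enum demo_kind kind;
	union {
		struct list_head commit_list;	/* valid only for DEMO_REGULAR */
		uintptr_t other_data[2];	/* valid only for DEMO_OTHER */
	} u;
};

/* Initialise only the union member that matches the object kind. */
void demo_init(struct demo_inode *ino, enum demo_kind kind, uintptr_t payload)
{
	ino->kind = kind;
	if (kind == DEMO_REGULAR) {
		ino->u.commit_list.next = &ino->u.commit_list;
		ino->u.commit_list.prev = &ino->u.commit_list;
	} else {
		ino->u.other_data[0] = payload;
		ino->u.other_data[1] = 0;
	}
}

/* Raw bits that a list walker would load as commit_list.next. */
uintptr_t demo_next_bits(const struct demo_inode *ino)
{
	uintptr_t bits;

	assert(sizeof(bits) == sizeof(ino->u.commit_list.next));
	memcpy(&bits, &ino->u.commit_list.next, sizeof(bits));
	return bits;
}

/* The guard a caller needs before touching commit_list. */
int demo_commit_list_usable(const struct demo_inode *ino)
{
	return ino->kind == DEMO_REGULAR;
}
```

For a `DEMO_OTHER` object, `demo_next_bits()` returns whatever payload was stored in the overlapping field, so code that walks `commit_list` without checking the object kind would follow a meaningless pointer. Under the stated assumption, that is why a type check such as the one added by 6e176d47160c becomes mandatory once the fields share storage.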
FYI Greg and Sasha: please also consider pulling 6e176d47160c ("NFSv4:
Fixes for nfs4_inode_return_delegation()") into that stable series.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
[adding Greg and Sasha to the recipients, to ensure they see this; CCing
Kurt as well, to keep him in the loop]
On 22.09.22 15:44, Trond Myklebust wrote:
> On Thu, 2022-09-22 at 13:42 +0200, Paul Menzel wrote:
>> Am 21.09.22 um 14:44 schrieb Trond Myklebust:
>>> On Wed, 2022-09-21 at 13:42 +0200, Paul Menzel wrote:
>>
>>>> Moving from Linux 5.10.113 to 5.15.69, starting Mozilla
>>>> Thunderbird or
>>>> Mozilla Firefox with the home on NFS, both programs get killed,
>>>> and
>>>> Linux 5.15.69 logs:
>>>>
>>>> ```
>>>> [ 3827.604396] BUG: unable to handle page fault for address:
>>>> 000000001d473c07
>>>> [ 3827.611297] #PF: supervisor read access in kernel mode
>>>> [ 3827.616452] #PF: error_code(0x0000) - not-present page
>>>> [ 3827.621604] PGD 0 P4D 0
>>>> [ 3827.624152] Oops: 0000 [#1] SMP PTI
>>>> [ 3827.627657] CPU: 0 PID: 2378 Comm: firefox Not tainted
>>>> 5.15.69.mx64.435 #1
>>>> [ 3827.634551] Hardware name: Dell Inc. Precision Tower
>>>> 3620/0MWYPT, BIOS 2.20.0 12/09/2021
>>
>> […]
>>
>>>> [ 3827.743328] Call Trace:
>>>> [ 3827.745779] <TASK>
>>>> [ 3827.747883] nfs_scan_commit+0x76/0xb0 [nfs]
>>>> [ 3827.752167] __nfs_commit_inode+0x108/0x180 [nfs]
>>>> [ 3827.756886] nfs_wb_all+0x59/0x110 [nfs]
>>>> [ 3827.760822] nfs4_inode_return_delegation+0x58/0x90 [nfsv4]
>>>> [ 3827.766413] nfs4_proc_remove+0x101/0x110 [nfsv4]
>>>> [ 3827.771130] nfs_unlink+0xf5/0x2d0 [nfs]
>>>> [ 3827.775065] vfs_unlink+0x10b/0x280
>>>> [ 3827.778563] do_unlinkat+0x19e/0x2c0
>>>> [ 3827.782158] __x64_sys_unlink+0x3e/0x60
>>>> [ 3827.786002] ? __x64_sys_readlink+0x1b/0x30
>>>> [ 3827.790192] do_syscall_64+0x40/0x90
>>>> [ 3827.793779] entry_SYSCALL_64_after_hwframe+0x61/0xcb
>>
>> […]
>>
>>>> ```
>>>>
>>>
>>> Does cherry-picking commit 6e176d47160c ("NFSv4: Fixes for
>>> nfs4_inode_return_delegation()") into 5.15.69 from the upstream
>>> kernel
>>> tree fix the problem?
>>>
>>> 8<---------------------------------------------------
>>> From 6e176d47160cec8bcaa28d9aa06926d72d54237c Mon Sep 17 00:00:00
>>> 2001
>>> From: Trond Myklebust <[email protected]>
>>> Date: Sun, 10 Oct 2021 10:58:12 +0200
>>> Subject: [PATCH] NFSv4: Fixes for nfs4_inode_return_delegation()
>>
>> […]
>>
>> Indeed with that commit, present since v5.16-rc1, we are unable to
>> reproduce the issue, so it seems to be the fix. It looks like there
>> are
>> not a lot of 5.15 NFS users out there. ;-)
>>
>
> I believe this is a dependency that was introduced by the back port of
> commit e591b298d7ec ("NFS: Save some space in the inode") into 5.15.68.
> So the reason it wasn't seen is because the change is very recent.
Side note: I wonder if that is causing this problem from Kurt as well:
https://lore.kernel.org/all/[email protected]/
> FYI Greg and Sasha: please also consider pulling 6e176d47160c ("NFSv4:
> Fixes for nfs4_inode_return_delegation()") into that stable series.
Greg, I noticed you in the past few days added quite a few patches into
the queue for the next 5.15.y release, but this one was not among them
afaics. So just to be sure: is that still on your todo list or is more
needed to get 6e176d47160c added in time for the next stable -rc?
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
On Mon, Sep 26, 2022 at 08:00:46AM +0200, Thorsten Leemhuis wrote:
> [adding Greg and Sasha to the recipients, to ensure they see this; CCing
> Kurt as well, to keep him in the loop]
>
> On 22.09.22 15:44, Trond Myklebust wrote:
> > On Thu, 2022-09-22 at 13:42 +0200, Paul Menzel wrote:
> >> Am 21.09.22 um 14:44 schrieb Trond Myklebust:
> >>> On Wed, 2022-09-21 at 13:42 +0200, Paul Menzel wrote:
> >>
> >>>> Moving from Linux 5.10.113 to 5.15.69, starting Mozilla
> >>>> Thunderbird or
> >>>> Mozilla Firefox with the home on NFS, both programs get killed,
> >>>> and
> >>>> Linux 5.15.69 logs:
> >>>>
> >>>> ```
> >>>> [ 3827.604396] BUG: unable to handle page fault for address:
> >>>> 000000001d473c07
> >>>> [ 3827.611297] #PF: supervisor read access in kernel mode
> >>>> [ 3827.616452] #PF: error_code(0x0000) - not-present page
> >>>> [ 3827.621604] PGD 0 P4D 0
> >>>> [ 3827.624152] Oops: 0000 [#1] SMP PTI
> >>>> [ 3827.627657] CPU: 0 PID: 2378 Comm: firefox Not tainted
> >>>> 5.15.69.mx64.435 #1
> >>>> [ 3827.634551] Hardware name: Dell Inc. Precision Tower
> >>>> 3620/0MWYPT, BIOS 2.20.0 12/09/2021
> >>
> >> […]
> >>
> >>>> [ 3827.743328] Call Trace:
> >>>> [ 3827.745779] <TASK>
> >>>> [ 3827.747883] nfs_scan_commit+0x76/0xb0 [nfs]
> >>>> [ 3827.752167] __nfs_commit_inode+0x108/0x180 [nfs]
> >>>> [ 3827.756886] nfs_wb_all+0x59/0x110 [nfs]
> >>>> [ 3827.760822] nfs4_inode_return_delegation+0x58/0x90 [nfsv4]
> >>>> [ 3827.766413] nfs4_proc_remove+0x101/0x110 [nfsv4]
> >>>> [ 3827.771130] nfs_unlink+0xf5/0x2d0 [nfs]
> >>>> [ 3827.775065] vfs_unlink+0x10b/0x280
> >>>> [ 3827.778563] do_unlinkat+0x19e/0x2c0
> >>>> [ 3827.782158] __x64_sys_unlink+0x3e/0x60
> >>>> [ 3827.786002] ? __x64_sys_readlink+0x1b/0x30
> >>>> [ 3827.790192] do_syscall_64+0x40/0x90
> >>>> [ 3827.793779] entry_SYSCALL_64_after_hwframe+0x61/0xcb
> >>
> >> […]
> >>
> >>>> ```
> >>>>
> >>>
> >>> Does cherry-picking commit 6e176d47160c ("NFSv4: Fixes for
> >>> nfs4_inode_return_delegation()") into 5.15.69 from the upstream
> >>> kernel
> >>> tree fix the problem?
> >>>
> >>> 8<---------------------------------------------------
> >>> From 6e176d47160cec8bcaa28d9aa06926d72d54237c Mon Sep 17 00:00:00
> >>> 2001
> >>> From: Trond Myklebust <[email protected]>
> >>> Date: Sun, 10 Oct 2021 10:58:12 +0200
> >>> Subject: [PATCH] NFSv4: Fixes for nfs4_inode_return_delegation()
> >>
> >> […]
> >>
> >> Indeed with that commit, present since v5.16-rc1, we are unable to
> >> reproduce the issue, so it seems to be the fix. It looks like there
> >> are
> >> not a lot of 5.15 NFS users out there. ;-)
> >>
> >
> > I believe this is a dependency that was introduced by the back port of
> > commit e591b298d7ec ("NFS: Save some space in the inode") into 5.15.68.
> > So the reason it wasn't seen is because the change is very recent.
>
> Side note: I wonder if that is causing this problem from Kurt as well:
> https://lore.kernel.org/all/[email protected]/
>
> > FYI Greg and Sasha: please also consider pulling 6e176d47160c ("NFSv4:
> > Fixes for nfs4_inode_return_delegation()") into that stable series.
>
> Greg, I noticed you in the past few days added quite a few patches into
> the queue for the next 5.15.y release, but this one was not among them
> afaics. So just to be sure: is that still on your todo list or is more
> needed to get 6e176d47160c added in time for the next stable -rc?
I don't see any request by anyone in the [email protected] history
asking for that commit to be added, so no, it was not in my queue.
I'll go add it now, thanks.
greg k-h
Hi Thorsten,
thanks for collecting this issue and providing relevant context!
On 26/09/2022 08:00, Thorsten Leemhuis wrote:
> [adding Greg and Sasha to the recipients, to ensure they see this; CCing
> Kurt as well, to keep him in the loop]
>
> On 22.09.22 15:44, Trond Myklebust wrote:
>> On Thu, 2022-09-22 at 13:42 +0200, Paul Menzel wrote:
>>> Am 21.09.22 um 14:44 schrieb Trond Myklebust:
>>>> On Wed, 2022-09-21 at 13:42 +0200, Paul Menzel wrote:
>>>>> Moving from Linux 5.10.113 to 5.15.69, starting Mozilla
>>>>> Thunderbird or
>>>>> Mozilla Firefox with the home on NFS, both programs get killed,
>>>>> and
>>>>> Linux 5.15.69 logs:
>>>>>
>>>>> ```
>>>>> [ 3827.604396] BUG: unable to handle page fault for address:
>>>>> 000000001d473c07
>>>>> [ 3827.611297] #PF: supervisor read access in kernel mode
>>>>> [ 3827.616452] #PF: error_code(0x0000) - not-present page
>>>>> [ 3827.621604] PGD 0 P4D 0
>>>>> [ 3827.624152] Oops: 0000 [#1] SMP PTI
>>>>> [ 3827.627657] CPU: 0 PID: 2378 Comm: firefox Not tainted
>>>>> 5.15.69.mx64.435 #1
>>>>> [ 3827.634551] Hardware name: Dell Inc. Precision Tower
>>>>> 3620/0MWYPT, BIOS 2.20.0 12/09/2021
>>> […]
>>>
>>>>> [ 3827.743328] Call Trace:
>>>>> [ 3827.745779] <TASK>
>>>>> [ 3827.747883] nfs_scan_commit+0x76/0xb0 [nfs]
>>>>> [ 3827.752167] __nfs_commit_inode+0x108/0x180 [nfs]
>>>>> [ 3827.756886] nfs_wb_all+0x59/0x110 [nfs]
>>>>> [ 3827.760822] nfs4_inode_return_delegation+0x58/0x90 [nfsv4]
>>>>> [ 3827.766413] nfs4_proc_remove+0x101/0x110 [nfsv4]
>>>>> [ 3827.771130] nfs_unlink+0xf5/0x2d0 [nfs]
>>>>> [ 3827.775065] vfs_unlink+0x10b/0x280
>>>>> [ 3827.778563] do_unlinkat+0x19e/0x2c0
>>>>> [ 3827.782158] __x64_sys_unlink+0x3e/0x60
>>>>> [ 3827.786002] ? __x64_sys_readlink+0x1b/0x30
>>>>> [ 3827.790192] do_syscall_64+0x40/0x90
>>>>> [ 3827.793779] entry_SYSCALL_64_after_hwframe+0x61/0xcb
>>> […]
>>>
>>>>> ```
>>>>>
>>>> Does cherry-picking commit 6e176d47160c ("NFSv4: Fixes for
>>>> nfs4_inode_return_delegation()") into 5.15.69 from the upstream
>>>> kernel
>>>> tree fix the problem?
>>>>
>>>> 8<---------------------------------------------------
>>>> From 6e176d47160cec8bcaa28d9aa06926d72d54237c Mon Sep 17 00:00:00
>>>> 2001
>>>> From: Trond Myklebust <[email protected]>
>>>> Date: Sun, 10 Oct 2021 10:58:12 +0200
>>>> Subject: [PATCH] NFSv4: Fixes for nfs4_inode_return_delegation()
>>> […]
>>>
>>> Indeed with that commit, present since v5.16-rc1, we are unable to
>>> reproduce the issue, so it seems to be the fix. It looks like there
>>> are
>>> not a lot of 5.15 NFS users out there. ;-)
>>>
>> I believe this is a dependency that was introduced by the back port of
>> commit e591b298d7ec ("NFS: Save some space in the inode") into 5.15.68.
>> So the reason it wasn't seen is because the change is very recent.
> Side note: I wonder if that is causing this problem from Kurt as well:
> https://lore.kernel.org/all/[email protected]/
Looks like it:
After confirming that the 5.15.69 kernel worked fine again with those last
three NFS commits backed out, I reapplied them and cherry-picked commit
6e176d47160c as suggested. The kernel has worked flawlessly thus far, so this
seems to indeed be a requirement for e591b298d7ec not to cause harm.
>> FYI Greg and Sasha: please also consider pulling 6e176d47160c ("NFSv4:
>> Fixes for nfs4_inode_return_delegation()") into that stable series.
> Greg, I noticed you in the past few days added quite a few patches into
> the queue for the next 5.15.y release, but this one was not among them
> afaics. So just to be sure: is that still on your todo list or is more
> needed to get 6e176d47160c added in time for the next stable -rc?
So by all means, Greg, please put this in the stable queue unless the
NFS wizards out there consider it safer to revert e591b298d7ec instead.
Thanks,
--
Kurt Garloff <[email protected]>
Cologne, Germany
On Tue, Sep 27, 2022 at 08:59:31PM +0200, Kurt Garloff wrote:
> Hi Thorsten,
>
> thanks for collecting this issue and providing relevant context!
>
> On 26/09/2022 08:00, Thorsten Leemhuis wrote:
>
> > [adding Greg and Sasha to the recipients, to ensure they see this; CCing
> > Kurt as well, to keep him in the loop]
> >
> > On 22.09.22 15:44, Trond Myklebust wrote:
> > > On Thu, 2022-09-22 at 13:42 +0200, Paul Menzel wrote:
> > > > Am 21.09.22 um 14:44 schrieb Trond Myklebust:
> > > > > On Wed, 2022-09-21 at 13:42 +0200, Paul Menzel wrote:
> > > > > > Moving from Linux 5.10.113 to 5.15.69, starting Mozilla
> > > > > > Thunderbird or
> > > > > > Mozilla Firefox with the home on NFS, both programs get killed,
> > > > > > and
> > > > > > Linux 5.15.69 logs:
> > > > > >
> > > > > > ```
> > > > > > [ 3827.604396] BUG: unable to handle page fault for address:
> > > > > > 000000001d473c07
> > > > > > [ 3827.611297] #PF: supervisor read access in kernel mode
> > > > > > [ 3827.616452] #PF: error_code(0x0000) - not-present page
> > > > > > [ 3827.621604] PGD 0 P4D 0
> > > > > > [ 3827.624152] Oops: 0000 [#1] SMP PTI
> > > > > > [ 3827.627657] CPU: 0 PID: 2378 Comm: firefox Not tainted
> > > > > > 5.15.69.mx64.435 #1
> > > > > > [ 3827.634551] Hardware name: Dell Inc. Precision Tower
> > > > > > 3620/0MWYPT, BIOS 2.20.0 12/09/2021
> > > > […]
> > > >
> > > > > > [ 3827.743328] Call Trace:
> > > > > > [ 3827.745779] <TASK>
> > > > > > [ 3827.747883] nfs_scan_commit+0x76/0xb0 [nfs]
> > > > > > [ 3827.752167] __nfs_commit_inode+0x108/0x180 [nfs]
> > > > > > [ 3827.756886] nfs_wb_all+0x59/0x110 [nfs]
> > > > > > [ 3827.760822] nfs4_inode_return_delegation+0x58/0x90 [nfsv4]
> > > > > > [ 3827.766413] nfs4_proc_remove+0x101/0x110 [nfsv4]
> > > > > > [ 3827.771130] nfs_unlink+0xf5/0x2d0 [nfs]
> > > > > > [ 3827.775065] vfs_unlink+0x10b/0x280
> > > > > > [ 3827.778563] do_unlinkat+0x19e/0x2c0
> > > > > > [ 3827.782158] __x64_sys_unlink+0x3e/0x60
> > > > > > [ 3827.786002] ? __x64_sys_readlink+0x1b/0x30
> > > > > > [ 3827.790192] do_syscall_64+0x40/0x90
> > > > > > [ 3827.793779] entry_SYSCALL_64_after_hwframe+0x61/0xcb
> > > > […]
> > > >
> > > > > > ```
> > > > > >
> > > > > Does cherry-picking commit 6e176d47160c ("NFSv4: Fixes for
> > > > > nfs4_inode_return_delegation()") into 5.15.69 from the upstream
> > > > > kernel
> > > > > tree fix the problem?
> > > > >
> > > > > 8<---------------------------------------------------
> > > > > From 6e176d47160cec8bcaa28d9aa06926d72d54237c Mon Sep 17 00:00:00
> > > > > 2001
> > > > > From: Trond Myklebust <[email protected]>
> > > > > Date: Sun, 10 Oct 2021 10:58:12 +0200
> > > > > Subject: [PATCH] NFSv4: Fixes for nfs4_inode_return_delegation()
> > > > […]
> > > >
> > > > Indeed with that commit, present since v5.16-rc1, we are unable to
> > > > reproduce the issue, so it seems to be the fix. It looks like there
> > > > are
> > > > not a lot of 5.15 NFS users out there. ;-)
> > > >
> > > I believe this is a dependency that was introduced by the back port of
> > > commit e591b298d7ec ("NFS: Save some space in the inode") into 5.15.68.
> > > So the reason it wasn't seen is because the change is very recent.
> > Side note: I wonder if that is causing this problem from Kurt as well:
> > https://lore.kernel.org/all/[email protected]/
>
> Looks like it:
> After confirming that the 5.15.69 kernel worked fine again with those last
> three NFS commits backed out, I reapplied them and cherry-picked commit
> 6e176d47160c as suggested. The kernel has worked flawlessly thus far, so this
> seems to indeed be a requirement for e591b298d7ec not to cause harm.
>
> > > FYI Greg and Sasha: please also consider pulling 6e176d47160c ("NFSv4:
> > > Fixes for nfs4_inode_return_delegation()") into that stable series.
> > Greg, I noticed you in the past few days added quite a few patches into
> > the queue for the next 5.15.y release, but this one was not among them
> > afaics. So just to be sure: is that still on your todo list or is more
> > needed to get 6e176d47160c added in time for the next stable -rc?
>
> So by all means, Greg, please put this in the stable queue unless the
> NFS wizards out there consider it safer to revert e591b298d7ec instead.
Already queued up for the next 5.15.y release that will happen in a few
hours, thanks for testing.
greg k-h
Hi Greg,
On 28/09/2022 08:51, Greg KH wrote:
> On Tue, Sep 27, 2022 at 08:59:31PM +0200, Kurt Garloff wrote:
>> On 26/09/2022 08:00, Thorsten Leemhuis wrote:
>>
>>>>>> Does cherry-picking commit 6e176d47160c ("NFSv4: Fixes for
>>>>>> nfs4_inode_return_delegation()") into 5.15.69 from the upstream
>>>>>> kernel
>>>>>> tree fix the problem?
>>>>>>
>>>>>> 8<---------------------------------------------------
>>>>>> From 6e176d47160cec8bcaa28d9aa06926d72d54237c Mon Sep 17 00:00:00
>>>>>> 2001
>>>>>> From: Trond Myklebust <[email protected]>
>>>>>> Date: Sun, 10 Oct 2021 10:58:12 +0200
>>>>>> Subject: [PATCH] NFSv4: Fixes for nfs4_inode_return_delegation()
>>>>> […]
>>>>>
>>>>> Indeed with that commit, present since v5.16-rc1, we are unable to
>>>>> reproduce the issue, so it seems to be the fix. It looks like there
>>>>> are
>>>>> not a lot of 5.15 NFS users out there. ;-)
>>>>>
>>>> I believe this is a dependency that was introduced by the back port of
>>>> commit e591b298d7ec ("NFS: Save some space in the inode") into 5.15.68.
>>>> So the reason it wasn't seen is because the change is very recent.
>>> Side note: I wonder if that is causing this problem from Kurt as well:
>>> https://lore.kernel.org/all/[email protected]/
>> Looks like it:
>> After confirming that the 5.15.69 kernel worked fine again with those last
>> three NFS commits backed out, I reapplied them and cherry-picked commit
>> 6e176d47160c as suggested. The kernel has worked flawlessly thus far, so this
>> seems to indeed be a requirement for e591b298d7ec not to cause harm.
>> [...]
>> So by all means, Greg, please put this in the stable queue unless the
>> NFS wizards out there consider it safer to revert e591b298d7ec instead.
> Already queued up for the next 5.15.y release that will happen in a few
> hours, thanks for testing.
And -- unsurprisingly -- I can confirm that NFS in 5.15.71 does work again,
indeed.
Thanks!
--
Kurt Garloff <[email protected]>
Cologne, Germany