2021-10-09 13:36:12

by David Wysochanski

Subject: Oops in nfs_scan_commit running xfstest generic/005 with NFSv4.2 and hammerspace flexfiles server

Trond,

I wonder if you are aware of this or not.

This week I ran a lot of xfstests with Hammerspace and other servers
without any issues, but I am now seeing this oops (after rebuilding on
top of your testing branch at 0abb8895b065). I then rebuilt with just
your testing branch and got the same oops. The same test passes on
5.14.0-rc4 (vanilla), as well as on the previous kernels I used at
BakeAthon with only the fscache and readahead patches. It panics
reliably for me, so let me know if you want more info or a
reproduction with tracepoints, etc. FYI, I don't think the server
matters: I can also reproduce with a RHEL8 server
(kernel-4.18.0-305.19.1.el8_4), and I can run 'generic/005' on its
own; the earlier tests don't matter.

[ 15.767423] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[ 28.614447] nfs4flexfilelayout_init: NFSv4 Flexfile Layout Driver Registering...
[ 30.024616] run fstests generic/001 at 2021-10-09 09:01:14
[ 37.188167] run fstests generic/002 at 2021-10-09 09:01:21
[ 38.372767] run fstests generic/003 at 2021-10-09 09:01:23
[ 38.713218] run fstests generic/004 at 2021-10-09 09:01:23
[ 39.065705] run fstests generic/005 at 2021-10-09 09:01:23
[ 39.799076] general protection fault, probably for non-canonical address 0xffe826e8e8e7897c: 0000 [#1] SMP PTI
[ 39.805058] CPU: 0 PID: 6213 Comm: rm Kdump: loaded Not tainted 5.15.0-rc4-trond-testing+ #76
[ 39.808300] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
[ 39.810819] RIP: 0010:__mutex_lock.constprop.0+0x97/0x3e0
[ 39.812438] Code: 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 65 48 8b 04 25 c0 7b 01 00 48 8b 00 a8 08 75 1d 49 8b 06 48 83 e0 f8 0f 84 79 02 00 00 <8b> 50 34 85 d2 0f 85 5c 02 00 00 e8 59 8f 55 ff 65 48 8b 04 25 c0
[ 39.817418] RSP: 0018:ffffbb8e4558bcd0 EFLAGS: 00010286
[ 39.818546] RAX: ffe826e8e8e78948 RBX: ffffbb8e4558bd70 RCX: ffff932e89300000
[ 39.820087] RDX: ffff932e89300000 RSI: ffe826e8e8e78948 RDI: ffff932eae01edf0
[ 39.821583] RBP: ffffbb8e4558bd28 R08: 0000000000000001 R09: ffffbb8e4558bca0
[ 39.823091] R10: 000000000000001d R11: ffffffffffffcfcf R12: 0000000000000000
[ 39.824796] R13: 0000000000000000 R14: ffff932eae01edf0 R15: 0000000000000000
[ 39.826318] FS: 00007fac7df09740(0000) GS:ffff932ff7c00000(0000) knlGS:0000000000000000
[ 39.828037] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 39.829275] CR2: 0000555f9b7e2018 CR3: 000000011dc96005 CR4: 0000000000770ef0
[ 39.830787] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 39.832275] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 39.833756] PKRU: 55555554
[ 39.834377] Call Trace:
[ 39.836219] ? security_inode_permission+0x30/0x50
[ 39.837365] nfs_scan_commit+0x36/0xa0 [nfs]
[ 39.838367] __nfs_commit_inode+0xf8/0x160 [nfs]
[ 39.839417] nfs_wb_all+0xa6/0xf0 [nfs]
[ 39.840309] nfs4_inode_return_delegation+0x58/0x80 [nfsv4]
[ 39.841554] nfs4_proc_remove+0xd1/0xe0 [nfsv4]
[ 39.842589] nfs_unlink+0xec/0x2d0 [nfs]
[ 39.843461] vfs_unlink+0x113/0x230
[ 39.844245] do_unlinkat+0x170/0x280
[ 39.845040] __x64_sys_unlinkat+0x33/0x60
[ 39.845922] do_syscall_64+0x3b/0x90
[ 39.846726] entry_SYSCALL_64_after_hwframe+0x44/0xae


# ./scripts/faddr2line fs/nfs/nfs.ko nfs_scan_commit+0x36
nfs_scan_commit+0x36/0xa0:
nfs_scan_commit at /mnt/build/kernel/fs/nfs/write.c:1078
(inlined by) nfs_scan_commit at /mnt/build/kernel/fs/nfs/write.c:1070


1060 /*
1061  * nfs_scan_commit - Scan an inode for commit requests
1062  * @inode: NFS inode to scan
1063  * @dst: mds destination list
1064  * @cinfo: mds and ds lists of reqs ready to commit
1065  *
1066  * Moves requests from the inode's 'commit' request list.
1067  * The requests are *not* checked to ensure that they form a contiguous set.
1068  */
1069 int
1070 nfs_scan_commit(struct inode *inode, struct list_head *dst,
1071 		struct nfs_commit_info *cinfo)
1072 {
1073 	int ret = 0;
1074 
1075 	if (!atomic_long_read(&cinfo->mds->ncommit))
1076 		return 0;
1077 	mutex_lock(&NFS_I(cinfo->inode)->commit_mutex);
1078 	if (atomic_long_read(&cinfo->mds->ncommit) > 0) {
1079 		const int max = INT_MAX;
1080 
1081 		ret = nfs_scan_commit_list(&cinfo->mds->list, dst,
1082 					   cinfo, max);
1083 		ret += pnfs_scan_commit_lists(inode, cinfo, max - ret);
1084 	}
1085 	mutex_unlock(&NFS_I(cinfo->inode)->commit_mutex);
1086 	return ret;
1087 }


2021-10-10 09:05:23

by Trond Myklebust

Subject: Re: Oops in nfs_scan_commit running xfstest generic/005 with NFSv4.2 and hammerspace flexfiles server

On Sat, 2021-10-09 at 09:34 -0400, David Wysochanski wrote:
> Trond,
>
> I wonder if you are aware of this or not.
>
> This week I ran a lot of xfstests with hammerspace and other servers
> without any issues and just now seeing this oops (after rebuilding
> from a base of your testing branch at 0abb8895b065).  I then re-built
> with just your testing branch and got the same oops.  Same test passes
> on 5.14.0-rc4 (vanilla), as well as previous kernels I used at
> BakeAthon with the fscache and readahead patches only.  It reliably
> panics for me so let me know if you want any more info or a
> reproduction with tracepoints, etc.  FYI, I don't think the server
> matters because I can also reproduce with a rhel8 server
> (kernel-4.18.0-305.19.1.el8_4) and I can also just run 'generic/005'
> directly - previous tests don't matter.
>

Whoops... I believe the following patch ought to fix it. Thanks for
reporting this!

8<---------------------------------------------------
From 64a082064b7b375263960c4f011bc9875ef50f6a Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Sun, 10 Oct 2021 10:58:12 +0200
Subject: [PATCH] NFSv4: Fixes for nfs4_inode_return_delegation()

We mustn't call nfs_wb_all() on anything other than a regular file.
Furthermore, we can exit early when we don't hold a delegation.

Reported-by: David Wysochanski <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/delegation.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 11118398f495..7c9eb679dbdb 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -755,11 +755,13 @@ int nfs4_inode_return_delegation(struct inode *inode)
 	struct nfs_delegation *delegation;
 
 	delegation = nfs_start_delegation_return(nfsi);
-	/* Synchronous recall of any application leases */
-	break_lease(inode, O_WRONLY | O_RDWR);
-	nfs_wb_all(inode);
-	if (delegation != NULL)
+	if (delegation != NULL) {
+		/* Synchronous recall of any application leases */
+		break_lease(inode, O_WRONLY | O_RDWR);
+		if (S_ISREG(inode->i_mode))
+			nfs_wb_all(inode);
 		return nfs_end_delegation_return(inode, delegation, 1);
+	}
 	return 0;
 }

--
2.31.1


--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-10-10 14:37:08

by David Wysochanski

Subject: Re: Oops in nfs_scan_commit running xfstest generic/005 with NFSv4.2 and hammerspace flexfiles server

On Sun, Oct 10, 2021 at 5:04 AM Trond Myklebust <[email protected]> wrote:
>
> Whoops... I believe the following patch ought to fix it. Thanks for
> reporting this!
>

Great, that seems to have fixed it!

Now, though, I get the WARN_ON below, indicating that
nfs_have_writebacks() returns true inside nfs_clear_inode(); I think I
have seen this once before.
I think nfs_have_writebacks() needs a simple fixup because of the
union-ization in your patch:
commit b712e11b99eadba5b4003dd815adb368835fb5d5
Author: Trond Myklebust <[email protected]>
Date: Tue Sep 28 17:41:41 2021 -0400

    NFS: Save some space in the inode

    Save some space in the nfs_inode by setting up an anonymous union with
    the fields that are peculiar to a specific type of filesystem object.

    Signed-off-by: Trond Myklebust <[email protected]>


You may want to fold something like this into the above commit; it
fixes the WARN for me:
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index a5aef2cbe4ee..5a110ecf2d85 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -579,7 +579,9 @@ extern int nfs_access_get_cached(struct inode *inode, const struct cred *cred, s
 static inline int
 nfs_have_writebacks(struct inode *inode)
 {
-	return atomic_long_read(&NFS_I(inode)->nrequests) != 0;
+	if (S_ISREG(inode->i_mode))
+		return atomic_long_read(&NFS_I(inode)->nrequests) != 0;
+	return 0;
 }
 
 /*


[ 77.421040] run fstests generic/001 at 2021-10-10 10:01:29
[ 84.454129] run fstests generic/002 at 2021-10-10 10:01:37
[ 85.578188] run fstests generic/003 at 2021-10-10 10:01:38
[ 85.894748] run fstests generic/004 at 2021-10-10 10:01:38
[ 86.242081] run fstests generic/005 at 2021-10-10 10:01:38
[ 87.054151] ------------[ cut here ]------------
[ 87.056407] WARNING: CPU: 7 PID: 6236 at fs/nfs/inode.c:123 nfs_clear_inode+0x3b/0x50 [nfs]
[ 87.060392] Modules linked in: nfs_layout_flexfiles rpcsec_gss_krb5 nfs_layout_nfsv41_files nfsv4 dns_resolver nfsv3 nfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink cachefiles fscache netfs intel_rapl_msr intel_rapl_common isst_if_common kvm_intel kvm iTCO_wdt intel_pmc_bxt iTCO_vendor_support joydev virtio_balloon irqbypass i2c_i801 i2c_smbus lpc_ich nfsd nfs_acl lockd auth_rpcgss grace drm sunrpc fuse zram ip_tables xfs crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_console virtio_blk virtio_net net_failover failover qemu_fw_cfg
[ 87.075262] CPU: 7 PID: 6236 Comm: rm Kdump: loaded Not tainted 5.15.0-rc4-fscache-remove-old-io-nfs-fixes-trond2-readahead2+ #78
[ 87.077057] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
[ 87.078376] RIP: 0010:nfs_clear_inode+0x3b/0x50 [nfs]
[ 87.079212] Code: 85 c0 75 26 48 8b 55 88 48 8d 45 88 48 39 c2 75 28 48 89 ef e8 76 ff ff ff 48 89 ef e8 9e b1 ff ff 48 89 ef 5d e9 f5 93 01 00 <0f> 0b 48 8b 55 88 48 8d 45 88 48 39 c2 74 d8 0f 0b eb d4 66 90 0f
[ 87.082029] RSP: 0018:ffffb7a9c5597e80 EFLAGS: 00010286
[ 87.082865] RAX: ffff8b03b8a75078 RBX: ffffffffc0e11fe0 RCX: 0000000000000000
[ 87.083968] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9c30f8a75138
[ 87.085097] RBP: ffff9c30f8a75138 R08: fffffffffffffffe R09: 0000000000000000
[ 87.086224] R10: ffff9c30c0b43300 R11: 0000000000000003 R12: ffff9c30f8a75250
[ 87.087332] R13: ffff9c30c1657000 R14: ffff9c30f8a75138 R15: ffff9c30f054b3c0
[ 87.088499] FS: 00007f9dd7864740(0000) GS:ffff9c3237dc0000(0000) knlGS:0000000000000000
[ 87.089756] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 87.090868] CR2: 000055b8f711f018 CR3: 0000000105a6a005 CR4: 0000000000770ee0
[ 87.091974] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 87.093089] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 87.094207] PKRU: 55555554
[ 87.094661] Call Trace:
[ 87.095080] nfs4_evict_inode+0x57/0x70 [nfsv4]
[ 87.095853] evict+0xd1/0x180
[ 87.096357] do_unlinkat+0x198/0x280
[ 87.096964] __x64_sys_unlinkat+0x33/0x60
[ 87.097615] do_syscall_64+0x3b/0x90
[ 87.098218] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 87.099008] RIP: 0033:0x7f9dd7959e8b
[ 87.099593] Code: 73 01 c3 48 8b 0d ed ff 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 07 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd ff 0c 00 f7 d8 64 89 01 48
[ 87.102601] RSP: 002b:00007fff2dfde878 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[ 87.103771] RAX: ffffffffffffffda RBX: 000055b8f711b7e0 RCX: 00007f9dd7959e8b
[ 87.104871] RDX: 0000000000000000 RSI: 000055b8f711a380 RDI: 00000000ffffff9c
[ 87.106158] RBP: 000055b8f711a2f0 R08: 0000000000000003 R09: 0000000000000000
[ 87.107266] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000000
[ 87.108379] R13: 00007fff2dfde9a0 R14: 0000000000000000 R15: 0000000000000002
[ 87.109477] ---[ end trace a1355ed0b42315fa ]---
[ 87.429954] run fstests generic/006 at 2021-10-10 10:01:39

118 void nfs_clear_inode(struct inode *inode)
119 {
120 	/*
121 	 * The following should never happen...
122 	 */
123 	WARN_ON_ONCE(nfs_have_writebacks(inode));
124 	WARN_ON_ONCE(!list_empty(&NFS_I(inode)->open_files));
125 	nfs_zap_acl_cache(inode);
126 	nfs_access_zap_cache(inode);
127 	nfs_fscache_clear_inode(inode);
128 }

2021-10-10 21:33:40

by Trond Myklebust

Subject: Re: Oops in nfs_scan_commit running xfstest generic/005 with NFSv4.2 and hammerspace flexfiles server

On Sun, 2021-10-10 at 10:35 -0400, David Wysochanski wrote:
> Now I get the below WARN_ON pop though indicating
> nfs_have_writebacks() is true when inside nfs_clear_inode() and I
> think I saw this once before.
> I think we need some simple fixup to nfs_have_writebacks() due to the
> union-ization in your patch:
> commit b712e11b99eadba5b4003dd815adb368835fb5d5
> Author: Trond Myklebust <[email protected]>
> Date:   Tue Sep 28 17:41:41 2021 -0400
>
>     NFS: Save some space in the inode
>
>     Save some space in the nfs_inode by setting up an anonymous union with
>     the fields that are peculiar to a specific type of filesystem object.
>
>     Signed-off-by: Trond Myklebust <[email protected]>
>
>
> You may want to fold something like this into the above which fixes
> the WARN for me:
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index a5aef2cbe4ee..5a110ecf2d85 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -579,7 +579,9 @@ extern int nfs_access_get_cached(struct inode *inode, const struct cred *cred, s
>  static inline int
>  nfs_have_writebacks(struct inode *inode)
>  {
> -       return atomic_long_read(&NFS_I(inode)->nrequests) != 0;
> +       if (S_ISREG(inode->i_mode))
> +               return atomic_long_read(&NFS_I(inode)->nrequests) != 0;
> +       return 0;
>  }
>


Thanks again for testing and for the bug reports, Dave! Can you please
resend the above patch with a Signed-off-by line? I'll be happy to
apply it.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]