2017-06-28 14:48:03

by Trond Myklebust

Subject: [PATCH] Stable request to fix a reference leak and list corruption

Hi Greg,

Could we please queue up the following patch as a stable fix for
commit a974deee47? It needs to be applied to v4.10 and older.

Thanks
Trond

Kinglong Mee (1):
NFSv4: fix a reference leak caused WARNING messages

fs/nfs/nfs4proc.c | 2 --
1 file changed, 2 deletions(-)

--
2.9.4



2017-06-28 14:48:05

by Trond Myklebust

Subject: [PATCH] NFSv4: fix a reference leak caused WARNING messages

From: Kinglong Mee <[email protected]>

commit 366a1569bff3fe14abfdf9285e31e05e091745f5 upstream.

Because nfs4_opendata_access() has already closed the state when access is
denied, the state is not leaked.
Rather than revert commit a974deee47, I'd like to clean up the strange state close.

[ 1615.094218] ------------[ cut here ]------------
[ 1615.094607] WARNING: CPU: 0 PID: 23702 at lib/list_debug.c:31 __list_add_valid+0x8e/0xa0
[ 1615.094913] list_add double add: new=ffff9d7901d9f608, prev=ffff9d7901d9f608, next=ffff9d7901ee8dd0.
[ 1615.095458] Modules linked in: nfsv4(E) nfs(E) nfsd(E) tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock f2fs snd_seq_midi snd_seq_midi_event fscrypto coretemp ppdev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf vmw_balloon snd_ens1371 joydev gameport snd_ac97_codec ac97_bus snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore nfit parport_pc parport acpi_cpufreq tpm_tis tpm_tis_core tpm i2c_piix4 vmw_vmci shpchp auth_rpcgss nfs_acl lockd(E) grace sunrpc(E) xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel mptspi e1000 serio_raw scsi_transport_spi mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: nfs]
[ 1615.097663] CPU: 0 PID: 23702 Comm: fstest Tainted: G W E 4.11.0-rc1+ #517
[ 1615.098015] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1615.098807] Call Trace:
[ 1615.099183] dump_stack+0x63/0x86
[ 1615.099578] __warn+0xcb/0xf0
[ 1615.099967] warn_slowpath_fmt+0x5f/0x80
[ 1615.100370] __list_add_valid+0x8e/0xa0
[ 1615.100760] nfs4_put_state_owner+0x75/0xc0 [nfsv4]
[ 1615.101136] __nfs4_close+0x109/0x140 [nfsv4]
[ 1615.101524] nfs4_close_state+0x15/0x20 [nfsv4]
[ 1615.101949] nfs4_close_context+0x21/0x30 [nfsv4]
[ 1615.102691] __put_nfs_open_context+0xb8/0x110 [nfs]
[ 1615.103155] put_nfs_open_context+0x10/0x20 [nfs]
[ 1615.103586] nfs4_file_open+0x13b/0x260 [nfsv4]
[ 1615.103978] do_dentry_open+0x20a/0x2f0
[ 1615.104369] ? nfs4_copy_file_range+0x30/0x30 [nfsv4]
[ 1615.104739] vfs_open+0x4c/0x70
[ 1615.105106] ? may_open+0x5a/0x100
[ 1615.105469] path_openat+0x623/0x1420
[ 1615.105823] do_filp_open+0x91/0x100
[ 1615.106174] ? __alloc_fd+0x3f/0x170
[ 1615.106568] do_sys_open+0x130/0x220
[ 1615.106920] ? __put_cred+0x3d/0x50
[ 1615.107256] SyS_open+0x1e/0x20
[ 1615.107588] entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 1615.107922] RIP: 0033:0x7fab599069b0
[ 1615.108247] RSP: 002b:00007ffcf0600d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[ 1615.108575] RAX: ffffffffffffffda RBX: 00007fab59bcfae0 RCX: 00007fab599069b0
[ 1615.108896] RDX: 0000000000000200 RSI: 0000000000000200 RDI: 00007ffcf060255e
[ 1615.109211] RBP: 0000000000040010 R08: 0000000000000000 R09: 0000000000000016
[ 1615.109515] R10: 00000000000006a1 R11: 0000000000000246 R12: 0000000000041000
[ 1615.109806] R13: 0000000000040010 R14: 0000000000001000 R15: 0000000000002710
[ 1615.110152] ---[ end trace 96ed63b1306bf2f3 ]---

Fixes: a974deee47 ("NFSv4: Fix memory and state leak in...")
Signed-off-by: Kinglong Mee <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfs/nfs4proc.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 1b183686c6d4..c1f5369cd339 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2258,8 +2258,6 @@ static int nfs4_opendata_access(struct rpc_cred *cred,
 	if ((mask & ~cache.mask & (MAY_READ | MAY_EXEC)) == 0)
 		return 0;

-	/* even though OPEN succeeded, access is denied. Close the file */
-	nfs4_close_state(state, fmode);
 	return -EACCES;
 }

--
2.9.4


2017-07-02 08:53:55

by Greg KH

Subject: Re: [PATCH] Stable request to fix a reference leak and list corruption

On Wed, Jun 28, 2017 at 10:47:57AM -0400, Trond Myklebust wrote:
> Hi Greg,
>
> Could we please queue up the following patch as a stable fix for
> commit a974deee47? It needs to be applied to v4.10 and older.

Now applied, thanks.

greg k-h

2017-07-05 20:29:24

by Robert Kudyba

Subject: Re: [PATCH] Stable request to fix a reference leak and list corruption

>> Could we please queue up the following patch as a stable fix for
>> commit a974deee47? It needs to be applied to v4.10 and older.
>
> Now applied, thanks.

Until kernel 4.11.9 makes it into Fedora's updates, I downgraded our server
acting as the NIS master to 4.10.16-200. But all NIS users time out
with "ypserv: #011-> Error #-3". Could this be a different issue with
rpcbind or nfs-utils? Here are some debug-enabled RPC logs showing a
different error, "xs_error_report client ffff8d8223caa000, error=113":

Jul 5 16:13:05 dsm kernel: RPC: looking up machine cred for service *
Jul 5 16:13:05 dsm kernel: RPC: set up xprt to 150.108.64.64 (port 2049) via tcp
Jul 5 16:13:05 dsm kernel: RPC: created transport ffff8d8223cab800 with 65536 slots
Jul 5 16:13:05 dsm kernel: RPC: creating nfs client for erdos (xprt ffff8d8223cab800)
Jul 5 16:13:05 dsm kernel: RPC: creating GSS authenticator for client ffff8d8102d37c00
Jul 5 16:13:05 dsm kernel: RPC: Couldn't create auth handle (flavor 390004)
Jul 5 16:13:05 dsm kernel: RPC: destroying transport ffff8d8223cab800
Jul 5 16:13:05 dsm kernel: RPC: xs_destroy xprt ffff8d8223cab800
Jul 5 16:13:05 dsm kernel: RPC: xs_close xprt ffff8d8223cab800
Jul 5 16:13:05 dsm kernel: RPC: disconnected transport ffff8d8223cab800
Jul 5 16:13:05 dsm kernel: RPC: set up xprt to ourip (port 2049) via tcp
Jul 5 16:13:05 dsm kernel: RPC: created transport ffff8d8223caa000 with 65536 slots
Jul 5 16:13:05 dsm kernel: RPC: creating nfs client for erdos (xprt ffff8d8223caa000)
Jul 5 16:13:05 dsm kernel: RPC: creating UNIX authenticator for client ffff8d8102d37c00
Jul 5 16:13:05 dsm kernel: RPC: new task initialized, procpid 5281
Jul 5 16:13:05 dsm kernel: RPC: allocated task ffff8d822385a900
Jul 5 16:13:05 dsm kernel: RPC: 65012 __rpc_execute flags=0x680
Jul 5 16:13:05 dsm kernel: RPC: 65012 call_start nfs4 proc NULL (sync)
Jul 5 16:13:05 dsm kernel: RPC: 65012 call_reserve (status 0)
Jul 5 16:13:05 dsm kernel: RPC: 65012 reserved req ffff8d8102d36000 xid fea54e04
Jul 5 16:13:05 dsm kernel: RPC: wake_up_first(ffff8d8223caa170 "xprt_sending")
Jul 5 16:13:05 dsm kernel: RPC: 65012 call_reserveresult (status 0)
Jul 5 16:13:05 dsm kernel: RPC: 65012 call_refresh (status 0)
Jul 5 16:13:05 dsm kernel: RPC: 65012 holding NULL cred ffffffffc036e440
Jul 5 16:13:05 dsm kernel: RPC: 65012 refreshing NULL cred ffffffffc036e440
Jul 5 16:13:05 dsm kernel: RPC: 65012 call_refreshresult (status 0)
Jul 5 16:13:05 dsm kernel: RPC: 65012 call_allocate (status 0)
Jul 5 16:13:05 dsm kernel: RPC: 65012 allocated buffer of size 96 at ffff8d8223caa800
Jul 5 16:13:05 dsm kernel: RPC: 65012 call_bind (status 0)
Jul 5 16:13:05 dsm kernel: RPC: 65012 call_connect xprt ffff8d8223caa000 is not connected
Jul 5 16:13:05 dsm kernel: RPC: 65012 xprt_connect xprt ffff8d8223caa000 is not connected
Jul 5 16:13:05 dsm kernel: RPC: 65012 sleep_on(queue "xprt_pending" time 4557317640)
Jul 5 16:13:05 dsm kernel: RPC: 65012 added to queue ffff8d8223caa218 "xprt_pending"
Jul 5 16:13:05 dsm kernel: RPC: 65012 setting alarm for 60000 ms
Jul 5 16:13:05 dsm kernel: RPC: xs_connect scheduled xprt ffff8d8223caa000
Jul 5 16:13:05 dsm kernel: RPC: 65012 sync task going to sleep
Jul 5 16:13:05 dsm kernel: RPC: xs_bind 0.0.0.0:699: ok (0)
Jul 5 16:13:05 dsm kernel: RPC: worker connecting xprt ffff8d8223caa000 via tcp to our ip (port 2049)
Jul 5 16:13:05 dsm kernel: RPC: ffff8d8223caa000 connect status 115 connected 0 sock state 2
Jul 5 16:13:05 dsm kernel: RPC: wake_up_first(ffff8d8223caa170 "xprt_sending")
Jul 5 16:13:05 dsm kernel: RPC: xs_error_report client ffff8d8223caa000, error=113...
Jul 5 16:13:05 dsm kernel: RPC: 65012 __rpc_wake_up_task (now 4557317640)
Jul 5 16:13:05 dsm kernel: RPC: 65012 disabling timer
Jul 5 16:13:05 dsm kernel: RPC: 65012 removed from queue ffff8d8223caa218 "xprt_pending"
Jul 5 16:13:05 dsm kernel: RPC: __rpc_wake_up_task done
Jul 5 16:13:05 dsm kernel: RPC: 65012 sync task resuming
Jul 5 16:13:05 dsm kernel: RPC: xs_tcp_state_change client ffff8d8223caa000...
Jul 5 16:13:05 dsm kernel: RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
Jul 5 16:13:05 dsm kernel: RPC: 65012 xprt_connect_status: retrying
Jul 5 16:13:05 dsm kernel: RPC: 65012 call_connect_status (status -113)
Jul 5 16:13:05 dsm kernel: RPC: disconnected transport ffff8d8223caa000
Jul 5 16:13:05 dsm kernel: RPC: 65012 return 0, status -113

2017-07-05 20:59:34

by Trond Myklebust

Subject: Re: [PATCH] Stable request to fix a reference leak and list corruption

On Wed, 2017-07-05 at 16:29 -0400, Robert Kudyba wrote:
> > > Could we please queue up the following patch as a stable fix for
> > > commit a974deee47? It needs to be applied to v4.10 and older.
> >
> > Now applied, thanks.
>
> Until kernel 4.11.9 is into Fedora's updates, I downgraded our server
> acting as the NIS master to 4.10.16-200. But all NIS users time out
> with "ypserv: #011-> Error #-3. Could this be a different issue with
> rpcbind or nfs-utils? Here are some debug enabled logs for RPC with a
> different error "xs_error_report client ffff8d8223caa000, error=113"
>

Error 113 is EHOSTUNREACH. It means that either your server is down, or
there is some other networking issue that is preventing the client from
connecting to it (e.g. a firewall setting, route configuration, arp
cache pollution?). Either way, it is unrelated to this particular
patch.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
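
A side note on the errno lookup discussed in this thread: the numeric codes
in the RPC debug output ("error=113", "status -113") are Linux errno values,
and the kernel often prints them negated. The sketch below shows one way to
translate such a number into its symbolic name and message. It assumes a
Linux host with Python 3; describe_errno is a hypothetical helper name for
illustration, not part of nfs-utils or any kernel tooling, and the numeric
values differ on other operating systems.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value such as 113 or -113."""
    # Kernel logs frequently print the negated value, e.g. "status -113".
    code = abs(code)
    # errno.errorcode maps numbers to names for the platform Python runs on.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_errno(-113)
    print(name, "-", message)
```

On the Linux systems in this thread, 113 should map to EHOSTUNREACH, matching
the explanation given in the reply above.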


2017-07-06 14:25:11

by Robert Kudyba

Subject: Re: [PATCH] Stable request to fix a reference leak and list corruption

>> > > Could we please queue up the following patch as a stable fix for
>> > > commit a974deee47? It needs to be applied to v4.10 and older.
>> >
>> > Now applied, thanks.
>>
>> Until kernel 4.11.9 is into Fedora's updates, I downgraded our server
>> acting as the NIS master to 4.10.16-200. But all NIS users time out
>> with "ypserv: #011-> Error #-3. Could this be a different issue with
>> rpcbind or nfs-utils? Here are some debug enabled logs for RPC with a
>> different error "xs_error_report client ffff8d8223caa000, error=113"
>>
>
> Error 113 is EHOSTUNREACH. It means that either your server is down, or
> there is some other networking issue that is preventing the client from
> connecting to it (e.g. a firewall setting, route configuration, arp
> cache pollution?). Either way, it is unrelated to this particular
> patch.

Ahh indeed it was firewalld. Not sure why this just cropped up,
perhaps some update? Anyways once I ran:

firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=mountd
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --reload

We were back in business with NFS sharing. Thanks so much for the
reply & hint as I couldn't find what error 113 meant.
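
For anyone who lands here with the same symptom, a quick way to confirm a
firewall or routing problem before touching the NFS configuration is to try
a plain TCP connect from a client to the server's RPC ports. The sketch
below is only an illustration under stated assumptions: port 2049 is taken
from the RPC log in this thread, port 111 is rpcbind's standard port, the
mountd port is not probed because it varies by configuration, and
check_tcp_port and report are hypothetical helper names.

```python
import socket

# Assumed ports: 2049 (NFS, as seen in the RPC log) and 111 (rpcbind).
DEFAULT_PORTS = (2049, 111)


def check_tcp_port(host, port, timeout=3.0):
    """Attempt a TCP connect; return None on success, else the OSError raised."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        # Covers refused connections, timeouts, unreachable hosts, DNS failures.
        return exc


def report(host, ports=DEFAULT_PORTS):
    """Print one line per port describing whether the connect succeeded."""
    for port in ports:
        err = check_tcp_port(host, port)
        status = "open" if err is None else f"failed ({err})"
        print(f"{host}:{port} {status}")
```

Calling report() with the server's hostname from a client prints the outcome
for each port. A failure whose errno is 113 on Linux points at the network
path or a firewall rather than at the NFS server itself.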