Date: Mon, 20 Mar 2017 17:16:39 +0100 (CET)
From: "Mkrtchyan, Tigran"
To: Linux NFS Mailing list
Cc: Steve Dickson
Subject: Re: pNFS: invalid IP:port selection when talks to DS

Re-sending without attachments. The capture files can be found at:

client <-> MDS: https://desycloud.desy.de/index.php/s/58JFyfMQmNF99pU
client <-> DS:  https://desycloud.desy.de/index.php/s/dKf290ikQcifL9K

Tigran.

----- Original Message -----
> From: "Mkrtchyan, Tigran"
> To: "Linux NFS Mailing list"
> Cc: "Steve Dickson"
> Sent: Monday, March 20, 2017 4:52:40 PM
> Subject: pNFS: invalid IP:port selection when talks to DS

> Dear (p)NFS-ors,
>
> we observe a VERY unpleasant situation with pNFS in production.
> Our hosts run multiple DSes on different ports, usually 24001-24009.
> With CentOS 7 (3.10.0-514.6.2.el7.x86_64) we see that the client picks
> the wrong port number when it talks to a data server:
>
> if the client uses several DSes on the same host, then at some point
> it starts to send data to the wrong port number.
>
> Client <=> MDS:
>
> 1  0.000000000 131.169.251.53 → 131.169.51.35  NFS V4 Call OPEN DH: 0x7cbc716b/MIL-68-onebatch-80C-30s-00057.tif.metadata
> 2  0.001469799 131.169.51.35  → 131.169.251.53 NFS V4 Reply (Call In 1) OPEN StateID: 0xec18
> 3  0.001578128 131.169.251.53 → 131.169.51.35  NFS V4 Call SETATTR FH: 0x6ccf3dfa
> 4  0.002657187 131.169.51.35  → 131.169.251.53 NFS V4 Reply (Call In 3) SETATTR
> 5  0.003243819 131.169.251.53 → 131.169.51.35  NFS V4 Call LAYOUTGET
> 6  0.014603386 131.169.51.35  → 131.169.251.53 NFS V4 Reply (Call In 5) LAYOUTGET
> 7  0.014899121 131.169.251.53 → 131.169.51.35  NFS V4 Call GETDEVINFO
> 8  0.015014216 131.169.51.35  → 131.169.251.53 NFS V4 Reply (Call In 7) GETDEVINFO
>        Opcode: GETDEVINFO (47)
>            Status: NFS4_OK (0)
>            layout type: LAYOUT4_NFSV4_1_FILES (1)
>            device index: 0
>            r_netid: tcp
>                length: 3
>                contents: tcp
>                fill bytes: opaque data
>            r_addr: 131.169.51.50.93.197
>                length: 20
>                contents: 131.169.51.50.93.197
>            r_netid: tcp
>                length: 3
>                contents: tcp
>                fill bytes: opaque data
>            r_addr: 131.169.51.50.93.197
>                length: 20
>                contents: 131.169.51.50.93.197
>            notification bitmap: 6
>            notification bitmap: 0
>        [Main Opcode: GETDEVINFO (47)]
>
> 9  0.105442455 131.169.251.53 → 131.169.51.35  NFS V4 Call TEST_STATEID
> 10 0.105521354 131.169.51.35  → 131.169.251.53 NFS V4 Reply (Call In 9) TEST_STATEID
>
> NOTICE that 131.169.51.50.93.197 corresponds to port 24005 (93 * 256 + 197).
>
> Client <=> DS:
>
> $ tshark -r ds-write.pcap -n -z conv,tcp
> 1 0.000000 131.169.251.53 → 131.169.51.50  NFS V4 Call WRITE StateID: 0xff01 Offset: 0 Len: 3968
> 2 0.000090 131.169.51.50  → 131.169.251.53 NFS V4 Reply (Call In 1) WRITE Status: NFS4ERR_BAD_STATEID
> ================================================================================
> TCP Conversations
> Filter:
>                                            |       <-      | |       ->      | |     Total     |    Relative    |   Duration   |
>                                            | Frames  Bytes | | Frames  Bytes | | Frames  Bytes |      Start     |              |
> 131.169.51.50:24006 <-> 131.169.251.53:847       1     4240       1      168       2     4408      0.000000000        0.0001
> ================================================================================
>
> NOTICE that it talks to the DS on port 24006!
>
> Is there a known fix that is missing in CentOS 7? I can't reproduce it
> with a 4.9 kernel (or it's harder to reproduce).
>
> The packet captures are attached.
>
> Tigran.
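The mismatch is easiest to see by decoding the r_addr field. For TCP over IPv4, the r_addr in a file-layout device address is a universal address as defined in RFC 5665: the four address octets followed by the high and low bytes of the port. The minimal Python sketch below converts in both directions. It assumes IPv4 only (IPv6 universal addresses use a different host part), and the function names are hypothetical helpers, not part of tshark or the kernel.

```python
from __future__ import annotations


def uaddr_to_ip_port(uaddr: str) -> tuple[str, int]:
    """Split an IPv4 universal address "h1.h2.h3.h4.p1.p2" into (ip, port)."""
    parts = uaddr.split(".")
    if len(parts) != 6:
        raise ValueError(f"not an IPv4 universal address: {uaddr!r}")
    octets = [int(p) for p in parts]
    if any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"field out of range in {uaddr!r}")
    ip = ".".join(str(o) for o in octets[:4])
    # The last two fields are the high and low bytes of the port number.
    port = (octets[4] << 8) | octets[5]
    return ip, port


def ip_port_to_uaddr(ip: str, port: int) -> str:
    """Build the IPv4 universal address for ip:port (inverse of the above)."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError(f"port out of range: {port}")
    return f"{ip}.{port >> 8}.{port & 0xFF}"


if __name__ == "__main__":
    # r_addr advertised in the GETDEVINFO reply (frame 8 of the MDS capture).
    print(uaddr_to_ip_port("131.169.51.50.93.197"))  # ('131.169.51.50', 24005)
    # Port the client actually used for the WRITE in the DS capture.
    print(ip_port_to_uaddr("131.169.51.50", 24006))  # 131.169.51.50.93.198
```

Encoding the port the client actually used, 24006, gives 131.169.51.50.93.198, which differs from the r_addr returned in this GETDEVINFO reply in the low port byte.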