2015-09-11 11:38:50

by Russell King - ARM Linux

[permalink] [raw]
Subject: NFS/TCP/IPv6 acting strangely in 4.2

I have a recent Marvell Armada 388 board here which uses the mvneta
driver. I'm seeing some weird effects with NFS with it acting as a
client. Unfortunately, the board does not have a functional RTC.

The NFS server is the same NFS server that I've used for years with
multiple other clients (it's my main NFS server) and it continues to
support all other clients without any ill effect. The NFS server
kernel is rather old now, being a 3.x kernel.

The NFS client appears to connect using TCP/IPv6 and initially is
accessible. Everything appears to work normally, and then the NFS
server appears to stop responding.

Running tcpdump on the NFS server, and then dumping the captured packets
with tshark (because tcpdump appears not to understand IPv6 SYNs on the
NFS port) shows:

1 0.000000 fe80::250:43ff:fe02:201 -> fe80::214:fdff:fe10:4f86 ICMPv6 Neighbor Solicitation for fe80::214:fdff:fe10:4f86 from 00:50:43:02:02:01
2 0.000252 fe80::214:fdff:fe10:4f86 -> fe80::250:43ff:fe02:201 ICMPv6 Neighbor Advertisement fe80::214:fdff:fe10:4f86 (sol)
3 0.030036 armada388 -> n2100 TCP 843→nfs [SYN] Seq=936803106 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=892366 TSecr=0 WS=128
4 0.030409 n2100 -> armada388 TCP nfs→843 [SYN, ACK] Seq=409465870 Ack=936803107 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=818169117 TSecr=892366 WS=64
5 0.030493 armada388 -> n2100 TCP [TCP Port numbers reused] 843→nfs [SYN] Seq=936803633 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=892366 TSecr=0 WS=128
6 0.030699 n2100 -> armada388 TCP nfs→843 [RST, ACK] Seq=0 Ack=936803634 Win=0 Len=0
7 0.030766 armada388 -> n2100 TCP 843→nfs [RST] Seq=936803107 Win=0 Len=0
8 3.040150 armada388 -> n2100 TCP [TCP Port numbers reused] 843→nfs [SYN] Seq=983834371 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=892667 TSecr=0 WS=128
9 3.040467 n2100 -> armada388 TCP nfs→843 [SYN, ACK] Seq=456498252 Ack=983834372 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=818169418 TSecr=892667 WS=64
10 3.040552 armada388 -> n2100 TCP [TCP Port numbers reused] 843→nfs [SYN] Seq=983834922 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=892667 TSecr=0 WS=128
11 3.040771 n2100 -> armada388 TCP nfs→843 [RST, ACK] Seq=0 Ack=983834923 Win=0 Len=0
12 3.040845 armada388 -> n2100 TCP 843→nfs [RST] Seq=983834372 Win=0 Len=0
13 6.050296 armada388 -> n2100 TCP [TCP Port numbers reused] 843→nfs [SYN] Seq=1030865629 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=892968 TSecr=0 WS=128
14 6.050673 n2100 -> armada388 TCP nfs→843 [SYN, ACK] Seq=503532268 Ack=1030865630 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=818169719 TSecr=892968 WS=64
15 6.050761 armada388 -> n2100 TCP [TCP Port numbers reused] 843→nfs [SYN] Seq=1030866205 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=892968 TSecr=0 WS=128

It seems rather strange that the Armada388 NFS client is trying to
connect _two_ TCP/IPv6 connections from the same address and the same
port but with different sequence numbers (which suggests that packet
numbers 5, 10 and 15 are the problem ones.

However, what happens with the reset packets is most interesting.
Packet 6 looks like it's resetting the duplicate connection caused by
packet 5. Packet 7 looks like it's resetting the initial connection
from packet 4.

It seems fairly reproducable, though I haven't worked out exactly what
triggers it.

Is this a known bug? Any ideas where to start looking?

Thanks.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


2015-09-11 13:04:55

by Eric Dumazet

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
> I have a recent Marvell Armada 388 board here which uses the mvneta
> driver. I'm seeing some weird effects with NFS with it acting as a
> client. Unfortunately, the board does not have a functional RTC.
>
> The NFS server is the same NFS server that I've used for years with
> multiple other clients (it's my main NFS server) and it continues to
> support all other clients without any ill effect. The NFS server
> kernel is rather old now, being a 3.x kernel.
>
> The NFS client appears to connect using TCP/IPv6 and initially is
> accessible. Everything appears to work normally, and then the NFS
> server appears to stop responding.
>
> Running tcpdump on the NFS server, and then dumping the captured
> packets
> with tshark (because tcpdump appears not to understand IPv6 SYNs on
> the
> NFS port) shows:
>
> 3 0.030036 armada388 -> n2100 TCP 843→nfs [SYN] Seq=936803106
> Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=892366 TSecr=0 WS=128

> 4 0.030409 n2100 -> armada388 TCP nfs→843 [SYN, ACK] Seq=409465870
> Ack=936803107 Win=28560 Len=0 MSS=1440 SACK_PERM=1
> TSval=818169117 TSecr=892366 WS=64

> 5 0.030493 armada388 -> n2100 TCP [TCP Port numbers reused]
> 843→nfs [SYN] Seq=936803633 Win=28800 Len=0 MSS=1440 SACK_PERM=1
> TSval=892366 TSecr=0 WS=128

Yes, this packet looks very wrong. Like two simultaneous connect with
same source port. It is not a retransmit (Seq numbers differ)

> 6 0.030699 n2100 -> armada388 TCP nfs→843 [RST, ACK] Seq=0
> Ack=936803634 Win=0 Len=0
> 7 0.030766 armada388 -> n2100 TCP 843→nfs [RST] Seq=936803107
> Win=0 Len=0

I suspect port reuse in NFS being broken.

Have you tried to apply

commit 099392048cd443349c50310f7fdc96070e40f4e7
Author: Trond Myklebust <[email protected]>
Date: Sat Aug 29 19:11:21 2015 -0700

SUNRPC: Prevent SYN+SYNACK+RST storms

Add a shutdown() call before we release the socket in order to ensure the
reset is sent before we try to reconnect.

Signed-off-by: Trond Myklebust <[email protected]>

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 214ca9dfb14e..7be90bc1a7c2 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -822,6 +822,8 @@ static void xs_reset_transport(struct sock_xprt *transport)
if (atomic_read(&transport->xprt.swapper))
sk_clear_memalloc(sk);

+ kernel_sock_shutdown(sock, SHUT_RDWR);
+
write_lock_bh(&sk->sk_callback_lock);
transport->inet = NULL;
transport->sock = NULL;



2015-09-11 13:20:38

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

On Fri, Sep 11, 2015 at 06:04:51AM -0700, Eric Dumazet wrote:
> On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
> > I have a recent Marvell Armada 388 board here which uses the mvneta
> > driver. I'm seeing some weird effects with NFS with it acting as a
> > client. Unfortunately, the board does not have a functional RTC.
> >
> > The NFS server is the same NFS server that I've used for years with
> > multiple other clients (it's my main NFS server) and it continues to
> > support all other clients without any ill effect. The NFS server
> > kernel is rather old now, being a 3.x kernel.
> >
> > The NFS client appears to connect using TCP/IPv6 and initially is
> > accessible. Everything appears to work normally, and then the NFS
> > server appears to stop responding.
> >
> > Running tcpdump on the NFS server, and then dumping the captured
> > packets
> > with tshark (because tcpdump appears not to understand IPv6 SYNs on
> > the
> > NFS port) shows:
> >
> > 3 0.030036 armada388 -> n2100 TCP 843→nfs [SYN] Seq=936803106
> > Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=892366 TSecr=0 WS=128
>
> > 4 0.030409 n2100 -> armada388 TCP nfs→843 [SYN, ACK] Seq=409465870
> > Ack=936803107 Win=28560 Len=0 MSS=1440 SACK_PERM=1
> > TSval=818169117 TSecr=892366 WS=64
>
> > 5 0.030493 armada388 -> n2100 TCP [TCP Port numbers reused]
> > 843→nfs [SYN] Seq=936803633 Win=28800 Len=0 MSS=1440 SACK_PERM=1
> > TSval=892366 TSecr=0 WS=128
>
> Yes, this packet looks very wrong. Like two simultaneous connect with
> same source port. It is not a retransmit (Seq numbers differ)
>
> > 6 0.030699 n2100 -> armada388 TCP nfs→843 [RST, ACK] Seq=0
> > Ack=936803634 Win=0 Len=0
> > 7 0.030766 armada388 -> n2100 TCP 843→nfs [RST] Seq=936803107
> > Win=0 Len=0
>
> I suspect port reuse in NFS being broken.
>
> Have you tried to apply

Thanks, I'll give this a go and report back.

I've been trying to debug this unsuccessfully. It's really not helpful
to enable rpc debugging, and then be greeted with this from printk:

[ 2760.156924] RPC: state 4 conn 1 dead 0 zapped 1 sk_shutdown 3
[ 2760.163218] RPC: xs_tcp_state_change client ee240800...
[ 2760.168976] RPC: state 5 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 2760.175273] RPC: xs_tcp_state_change client ee240800...
[ 2760.181032] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 2760.187315] RPC: disconnected transport ee240800
[ 2760.192467] RPC: xs_tcp_state_change client ee240800...
[ 2760.198224] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 2760.204505] RPC: disconnected transport ee240800
[ 2760.209653] RPC: xs_tcp_data_ready...
[ 2760.215001] RPC: wake_up_first(ee240918 "xprt_sending")
lspci -vvx |less
[ 2805.912330] RPC: looking up Generic cred
[ 2805.916791] RPC: looking up Generic cred
[ 2805.921259] RPC: looking up Generic cred
[ 2805.921784] RPC: looking up Generic cred
[ 2805.921787] RPC: looking up Generic cred
[ 2805.921788] RPC: looking up Generic cred
[ 2805.921792] RPC: new task initialized, procpid 327
[ 2805.921794] RPC: allocated task ee358300
[ 2805.921797] RPC: 1335 __rpc_execute flags=0x80
[ 2805.921800] RPC: 1335 call_start nfs3 proc ACCESS (sync)
[ 2805.921801] RPC: 1335 call_reserve (status 0)
[ 2805.921805] RPC: 1335 reserved req ee8a2900 xid af74741b
** 168 printk messages dropped ** [ 2805.928207] RPC: read reply XID b174741b
** 366 printk messages dropped ** [ 2805.941242] RPC: 1343 setting alarm for 60000 ms
** 310 printk messages dropped ** [ 2805.958274] RPC: xprt = ee240800, tcp_copied = 120, tcp_offset = 120, tcp_reclen = 120
** 89 printk messages dropped ** [ 2805.959613] RPC: 1350 reserved req ee8a2900 xid be74741b
[ 2805.959614] RPC: wake_up_first(ee240918 "xprt_sending")

which is just where we want to see the printk messages. This kind'a
makes printk and the rpc debug stuff rather useless. :(

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-09-11 14:34:05

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

On Fri, Sep 11, 2015 at 06:04:51AM -0700, Eric Dumazet wrote:
> On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
> > Running tcpdump on the NFS server, and then dumping the captured
> > packets
> > with tshark (because tcpdump appears not to understand IPv6 SYNs on
> > the
> > NFS port) shows:
> >
> > 3 0.030036 armada388 -> n2100 TCP 843→nfs [SYN] Seq=936803106
> > Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=892366 TSecr=0 WS=128
>
> > 4 0.030409 n2100 -> armada388 TCP nfs→843 [SYN, ACK] Seq=409465870
> > Ack=936803107 Win=28560 Len=0 MSS=1440 SACK_PERM=1
> > TSval=818169117 TSecr=892366 WS=64
>
> > 5 0.030493 armada388 -> n2100 TCP [TCP Port numbers reused]
> > 843→nfs [SYN] Seq=936803633 Win=28800 Len=0 MSS=1440 SACK_PERM=1
> > TSval=892366 TSecr=0 WS=128
>
> Yes, this packet looks very wrong. Like two simultaneous connect with
> same source port. It is not a retransmit (Seq numbers differ)
>
> > 6 0.030699 n2100 -> armada388 TCP nfs→843 [RST, ACK] Seq=0
> > Ack=936803634 Win=0 Len=0
> > 7 0.030766 armada388 -> n2100 TCP 843→nfs [RST] Seq=936803107
> > Win=0 Len=0
>
> I suspect port reuse in NFS being broken.
>
> Have you tried to apply
>
> commit 099392048cd443349c50310f7fdc96070e40f4e7
> Author: Trond Myklebust <[email protected]>
> Date: Sat Aug 29 19:11:21 2015 -0700
>
> SUNRPC: Prevent SYN+SYNACK+RST storms
>
> Add a shutdown() call before we release the socket in order to ensure the
> reset is sent before we try to reconnect.
>
> Signed-off-by: Trond Myklebust <[email protected]>

Applying this patch changes the behaviour slightly, but doesn't fix the
problem:

1 0.000000 armada388 -> n2100 TCP hcp-wismar→nfs [SYN] Seq=3766378552 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
2 0.000431 n2100 -> armada388 TCP nfs→hcp-wismar [SYN, ACK] Seq=1297069321 Ack=3766378553 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869361932 TSecr=240384 WS=64
3 0.000518 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766379468 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
4 0.000740 n2100 -> armada388 TCP nfs→hcp-wismar [RST, ACK] Seq=0 Ack=3766379469 Win=0 Len=0
5 0.000812 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766379980 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
6 0.001051 n2100 -> armada388 TCP nfs→hcp-wismar [SYN, ACK] Seq=1297080280 Ack=3766379981 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869361932 TSecr=240384 WS=64
7 0.001137 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766380519 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
8 0.001352 n2100 -> armada388 TCP nfs→hcp-wismar [RST, ACK] Seq=0 Ack=3766380520 Win=0 Len=0

Notice the lack of second reset, but the syns trying to re-use the same
port but with different sequence numbers is still there.

Eventually, we get to a storm of SYNs:

468 0.084816 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766488832 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
469 0.085046 n2100 -> armada388 TCP nfs→hcp-wismar [SYN, ACK] Seq=1298392849 Ack=3766488833 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869361940 TSecr=240384 WS=64
470 0.085136 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766489333 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
471 0.085391 n2100 -> armada388 TCP nfs→hcp-wismar [RST, ACK] Seq=0 Ack=3766489334 Win=0 Len=0
472 0.085482 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766489762 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
473 0.085777 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766490263 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
474 0.086023 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766490694 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
...
526 0.104063 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766514610 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
527 0.104308 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766515182 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
528 0.104560 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766515621 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
529 0.104799 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766516122 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
530 0.105052 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766516556 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
...
550 0.110487 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766525755 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
551 0.114016 n2100 -> armada388 TCP nfs→hcp-wismar [SYN, ACK] Seq=1298403514 Ack=3766489763 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869361940 TSecr=240384 WS=64

followed by a storm of SYN,ACKs and resets. This seems to happen several
times in the packet log.

At some point something starts going really crazy:

9341 2.668325 n2100 -> armada388 TCP nfs→hcp-wismar [ACK] Seq=1337671519 Ack=3802565165 Win=28560 Len=0 TSval=869362196 TSecr=240615
9342 2.668413 n2100 -> armada388 TCP [TCP Dup ACK 9341#1] nfs→hcp-wismar [ACK] Seq=1337671519 Ack=3802565165 Win=28560 Len=0 TSval=869362196 TSecr=240615
9343 2.668423 n2100 -> armada388 TCP [TCP Dup ACK 9341#2] nfs→hcp-wismar [ACK] Seq=1337671519 Ack=3802565165 Win=28560 Len=0 TSval=869362196 TSecr=240615
9344 2.668433 n2100 -> armada388 TCP [TCP Dup ACK 9341#3] nfs→hcp-wismar [ACK] Seq=1337671519 Ack=3802565165 Win=28560 Len=0 TSval=869362196 TSecr=240615
...
9419 2.669586 n2100 -> armada388 TCP [TCP Dup ACK 9341#78] nfs→hcp-wismar [ACK] Seq=1337671519 Ack=3802565165 Win=28560 Len=0 TSval=869362198 TSecr=240615

There's no other packets logged between any of these duplicated acks and
I don't seem to have anything preceeding that with a sequence number of
380256516*, so I assume tcpdump dropped some packets.

Eventually, we get to:

31213 8.247815 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3892006879 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=241188 TSecr=0 WS=128
...
31337 8.297654 armada388 -> n2100 TCP hcp-wismar→nfs [RST] Seq=3891941177 Win=0 Len=0
31338 8.297753 armada388 -> n2100 TCP hcp-wismar→nfs [RST] Seq=3891942022 Win=0 Len=0
31339 8.297798 armada388 -> n2100 TCP hcp-wismar→nfs [RST] Seq=3891942867 Win=0 Len=0
31340 8.297835 armada388 -> n2100 TCP hcp-wismar→nfs [RST] Seq=3891943715 Win=0 Len=0
31341 8.297872 armada388 -> n2100 TCP hcp-wismar→nfs [RST] Seq=3891944560 Win=0 Len=0
31342 8.298076 armada388 -> n2100 TCP hcp-wismar→nfs [RST] Seq=3891945416 Win=0 Len=0
31343 8.300232 n2100 -> armada388 TCP nfs→hcp-wismar [SYN, ACK] Seq=1425940018 Ack=3892006880 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869362756 TSecr=241188 WS=64

31344 9.050382 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3892315104 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=241289 TSecr=0 WS=128
31345 9.050760 n2100 -> armada388 TCP nfs→hcp-wismar [SYN, ACK] Seq=1438481320 Ack=3892315105 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869362837 TSecr=241289 WS=64
31346 9.051115 armada388 -> n2100 TCP hcp-wismar→nfs [ACK] Seq=3892315105 Ack=1438481321 Win=28800 Len=0 TSval=241289 TSecr=869362837
31347 9.051624 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]
31348 9.051878 n2100 -> armada388 TCP nfs→hcp-wismar [ACK] Seq=1438481321 Ack=3892315225 Win=28608 Len=0 TSval=869362837 TSecr=241289
31349 9.052164 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]

where things start working again.

It looks like 0c78789e3a030615c6650fde89546cadf40ec2cc might be relevant
too, but I don't see that solving the multiple _concurrent_ connection
attempts with the same port number - presumably it's somehow trying to
make the same socket repeatedly connect despite a previous connection
being in progress, which would have nothing to do with cleaning up a
previous attempt.

To me, it looks like the sunrpc code isn't tracking whether a TCP
connection is already in progress, and waiting for the TCP connection
to either finish connecting, or fail to connect before attempting
another connection.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-09-11 15:06:22

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

On Fri, Sep 11, 2015 at 03:33:47PM +0100, Russell King - ARM Linux wrote:
> It looks like 0c78789e3a030615c6650fde89546cadf40ec2cc might be relevant
> too, but I don't see that solving the multiple _concurrent_ connection
> attempts with the same port number - presumably it's somehow trying to
> make the same socket repeatedly connect despite a previous connection
> being in progress, which would have nothing to do with cleaning up a
> previous attempt.

As I suspected, applying the above commit in addition does not solve the
problem, I still see the same behaviour: SYN SYNACK SYN RSTACK, SYN
SYNACK SYN RSTACK, and eventual SYN storms.

I do have this captured as well:

2558 0.834316 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1053655487 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60001 TSecr=0 WS=128
2559 0.834572 n2100 -> armada388 TCP nfs→rpasswd [SYN, ACK] Seq=3076611574 Ack=1053655488 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869622246 TSecr=60001 WS=64
2560 0.834666 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054228544 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
2561 0.834895 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001

The packet at 2561 looks wrong to me - this doesn't follow what I know
would be the standard TCP setup of syn, synack, ack, because that final
ack is in the wrong direction.

2562 0.834965 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054228969 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
2563 0.835156 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
2564 0.835218 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054229397 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
2565 0.835420 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
2566 0.835488 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054229824 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
2567 0.835681 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
2568 0.835753 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054230261 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
2569 0.835971 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
...
2701 0.858226 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622247 TSecr=60001
2702 0.858270 n2100 -> armada388 TCP [TCP Dup ACK 2701#1] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622247 TSecr=60001
2703 0.858283 n2100 -> armada388 TCP [TCP Dup ACK 2701#2] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622247 TSecr=60001
...
2772 0.859373 n2100 -> armada388 TCP [TCP Dup ACK 2701#71] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622248 TSecr=60001
2773 0.892349 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054273361 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
2774 0.892725 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622252 TSecr=60001
...
2960 0.979230 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622259 TSecr=60001
2961 0.979278 n2100 -> armada388 TCP [TCP Dup ACK 2960#1] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622259 TSecr=60001
2962 0.979291 n2100 -> armada388 TCP [TCP Dup ACK 2960#2] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622259 TSecr=60001
...
3030 0.980329 n2100 -> armada388 TCP [TCP Dup ACK 2960#70] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622260 TSecr=60001
3031 0.980343 n2100 -> armada388 TCP [TCP Dup ACK 2960#71] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622260 TSecr=60001
3032 0.989005 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1056293352 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60018 TSecr=0 WS=128
3033 0.989462 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622262 TSecr=60001
...
3206 1.011270 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1056917791 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60022 TSecr=0 WS=128
3207 1.011721 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622263 TSecr=60001
3208 1.011765 n2100 -> armada388 TCP [TCP Dup ACK 3207#1] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622263 TSecr=60001
3209 1.011779 n2100 -> armada388 TCP [TCP Dup ACK 3207#2] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622263 TSecr=60001
...
3247 1.012250 n2100 -> armada388 TCP [TCP Dup ACK 3207#40] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622264 TSecr=60001
3248 1.012528 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1056918216 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60022 TSecr=0 WS=128
3249 1.012881 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622264 TSecr=60001
... eventually ...
3483 1.096726 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1057880967 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60028 TSecr=0 WS=128
3484 1.099249 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622271 TSecr=60001
3485 1.099368 n2100 -> armada388 TCP [TCP Dup ACK 3484#1] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622271 TSecr=60001
3486 1.099401 n2100 -> armada388 TCP [TCP Dup ACK 3484#2] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622271 TSecr=60001
...
3541 1.100250 n2100 -> armada388 TCP [TCP Dup ACK 3484#57] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622272 TSecr=60001
3542 1.100261 n2100 -> armada388 TCP [TCP Dup ACK 3484#58] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622272 TSecr=60001
3543 1.109405 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1057881400 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60028 TSecr=0 WS=128
3544 1.109778 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622274 TSecr=60001
...
3555 1.111223 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1057883994 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60028 TSecr=0 WS=128
3556 1.111403 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622274 TSecr=60001
...
3604 1.190099 armada388 -> n2100 TCP rpasswd→nfs [RST] Seq=1053655488 Win=0 Len=0

It looks like the SYNs for the new connections provoke ACKs from an
existing connection but obviously different sequence number. That
looks weird, shouldn't they be provoking resets or being ignored?

What if these SYN packets were spoofed from half-way across the net?
It seems like it could be used to provoke ACK traffic (with valid
sequence numbers) against a potential victim. However, that's the
3.x kernel which is responding with those ACKs - let's put that to
one side for the time being until there's an answer for the NFS
problem.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-09-11 15:18:45

by Eric Dumazet

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

On Fri, 2015-09-11 at 16:06 +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 11, 2015 at 03:33:47PM +0100, Russell King - ARM Linux wrote:
> > It looks like 0c78789e3a030615c6650fde89546cadf40ec2cc might be relevant
> > too, but I don't see that solving the multiple _concurrent_ connection
> > attempts with the same port number - presumably it's somehow trying to
> > make the same socket repeatedly connect despite a previous connection
> > being in progress, which would have nothing to do with cleaning up a
> > previous attempt.
>
> As I suspected, applying the above commit in addition does not solve the
> problem, I still see the same behaviour: SYN SYNACK SYN RSTACK, SYN
> SYNACK SYN RSTACK, and eventual SYN storms.
>
> I do have this captured as well:
>
> 2558 0.834316 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1053655487 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60001 TSecr=0 WS=128
> 2559 0.834572 n2100 -> armada388 TCP nfs→rpasswd [SYN, ACK] Seq=3076611574 Ack=1053655488 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869622246 TSecr=60001 WS=64
> 2560 0.834666 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054228544 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
> 2561 0.834895 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
>
> The packet at 2561 looks wrong to me - this doesn't follow what I know
> would be the standard TCP setup of syn, synack, ack, because that final
> ack is in the wrong direction.
>

This 2561 packet is an ACK packet, because n2100 has a SYN_RECV socket
created by packet 2558.

It receives a SYN packet (2560) that it interprets as a packet slightly
out of sequence (1054228544 being above 1053655487) for this SYN_RECV

The wrong packet is 2560, not 2561




2015-09-11 16:24:30

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

On Fri, Sep 11, 2015 at 08:18:43AM -0700, Eric Dumazet wrote:
> On Fri, 2015-09-11 at 16:06 +0100, Russell King - ARM Linux wrote:
> > On Fri, Sep 11, 2015 at 03:33:47PM +0100, Russell King - ARM Linux wrote:
> > > It looks like 0c78789e3a030615c6650fde89546cadf40ec2cc might be relevant
> > > too, but I don't see that solving the multiple _concurrent_ connection
> > > attempts with the same port number - presumably it's somehow trying to
> > > make the same socket repeatedly connect despite a previous connection
> > > being in progress, which would have nothing to do with cleaning up a
> > > previous attempt.
> >
> > As I suspected, applying the above commit in addition does not solve the
> > problem, I still see the same behaviour: SYN SYNACK SYN RSTACK, SYN
> > SYNACK SYN RSTACK, and eventual SYN storms.
> >
> > I do have this captured as well:
> >
> > 2558 0.834316 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1053655487 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60001 TSecr=0 WS=128
> > 2559 0.834572 n2100 -> armada388 TCP nfs→rpasswd [SYN, ACK] Seq=3076611574 Ack=1053655488 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869622246 TSecr=60001 WS=64
> > 2560 0.834666 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054228544 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
> > 2561 0.834895 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
> >
> > The packet at 2561 looks wrong to me - this doesn't follow what I know
> > would be the standard TCP setup of syn, synack, ack, because that final
> > ack is in the wrong direction.
> >
>
> This 2561 packet is an ACK packet, because n2100 has a SYN_RECV socket
> created by packet 2558.
>
> It receives a SYN packet (2560) that it interprets as a packet slightly
> out of sequence (1054228544 being above 1053655487) for this SYN_RECV
>
> The wrong packet is 2560, not 2561

Ok.

Looking deeper at the XPRT sunrpc code, I have to wonder about the
sanity of this:

void xprt_connect(struct rpc_task *task)
{
...
if (!xprt_connected(xprt)) {
...
if (test_bit(XPRT_CLOSING, &xprt->state))
return;
if (xprt_test_and_set_connecting(xprt))
return;
xprt->stat.connect_start = jiffies;
xprt->ops->connect(xprt, task);

That calls into xs_connect(), which schedules a workqueue to do the
connection. The workqueue will call xs_tcp_setup_socket().

xs_tcp_setup_socket() creates a socket if one didn't exist, otherwise
re-using the previously obtained socket (which'll be why its using the
same socket) and then goes on to call xs_tcp_finish_connecting().

xs_tcp_finish_connecting() calls kernel_connect(), which will return
-EINPROGRESS. We seem to treat EINPROGRESS as if the connection was
successful:

case 0:
case -EINPROGRESS:
case -EALREADY:
xprt_unlock_connect(xprt, transport);
xprt_clear_connecting(xprt);
return;

and the xprt_clear_connecting() results in this whole path being
re-opened: the socket is not yet connected, so xprt_connected() will
return false, and despite the socket connection still being mid-way
through being connected, we clear the "connecting" status, causing
xprt_test_and_set_connecting() to return false.

That allows us to re-call xprt->ops->connect, re-queue the connect
worker, and re-run the call to kernel_connect() for a socket which is
already mid-way through being connected.

Shouldn't the "connecting" status only be cleared when either the socket
has _finished_ connecting, or when the connection has _failed_ to connect,
and not when it's mid-way through connecting?

I've not been able to prove this: I've set rpc_debug to 129 to log
just xprt and trans RPC facilities, and that's sufficient to change
the timing such that this doesn't happen.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-09-11 16:49:45

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

On Fri, Sep 11, 2015 at 05:24:17PM +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 11, 2015 at 08:18:43AM -0700, Eric Dumazet wrote:
> > On Fri, 2015-09-11 at 16:06 +0100, Russell King - ARM Linux wrote:
> > > On Fri, Sep 11, 2015 at 03:33:47PM +0100, Russell King - ARM Linux wrote:
> > > > It looks like 0c78789e3a030615c6650fde89546cadf40ec2cc might be relevant
> > > > too, but I don't see that solving the multiple _concurrent_ connection
> > > > attempts with the same port number - presumably it's somehow trying to
> > > > make the same socket repeatedly connect despite a previous connection
> > > > being in progress, which would have nothing to do with cleaning up a
> > > > previous attempt.
> > >
> > > As I suspected, applying the above commit in addition does not solve the
> > > problem, I still see the same behaviour: SYN SYNACK SYN RSTACK, SYN
> > > SYNACK SYN RSTACK, and eventual SYN storms.
> > >
> > > I do have this captured as well:
> > >
> > > 2558 0.834316 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1053655487 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60001 TSecr=0 WS=128
> > > 2559 0.834572 n2100 -> armada388 TCP nfs→rpasswd [SYN, ACK] Seq=3076611574 Ack=1053655488 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869622246 TSecr=60001 WS=64
> > > 2560 0.834666 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054228544 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
> > > 2561 0.834895 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
> > >
> > > The packet at 2561 looks wrong to me - this doesn't follow what I know
> > > would be the standard TCP setup of syn, synack, ack, because that final
> > > ack is in the wrong direction.
> > >
> >
> > This 2561 packet is an ACK packet, because n2100 has a SYN_RECV socket
> > created by packet 2558.
> >
> > It receives a SYN packet (2560) that it interprets as a packet slightly
> > out of sequence (1054228544 being above 1053655487) for this SYN_RECV
> >
> > The wrong packet is 2560, not 2561
>
> Ok.
>
> Looking deeper at the XPRT sunrpc code, I have to wonder about the
> sanity of this:
>
> void xprt_connect(struct rpc_task *task)
> {
> ...
> if (!xprt_connected(xprt)) {
> ...
> if (test_bit(XPRT_CLOSING, &xprt->state))
> return;
> if (xprt_test_and_set_connecting(xprt))
> return;
> xprt->stat.connect_start = jiffies;
> xprt->ops->connect(xprt, task);
>
> That calls into xs_connect(), which schedules a workqueue to do the
> connection. The workqueue will call xs_tcp_setup_socket().
>
> xs_tcp_setup_socket() creates a socket if one didn't exist, otherwise
> re-using the previously obtained socket (which'll be why its using the
> same socket) and then goes on to call xs_tcp_finish_connecting().
>
> xs_tcp_finish_connecting() calls kernel_connect(), which will return
> -EINPROGRESS. We seem to treat EINPROGRESS as if the connection was
> successful:
>
> case 0:
> case -EINPROGRESS:
> case -EALREADY:
> xprt_unlock_connect(xprt, transport);
> xprt_clear_connecting(xprt);
> return;
>
> and the xprt_clear_connecting() results in this whole path being
> re-opened: the socket is not yet connected, so xprt_connected() will
> return false, and despite the socket connection still being mid-way
> through being connected, we clear the "connecting" status, causing
> xprt_test_and_set_connecting() to return false.
>
> That allows us to re-call xprt->ops->connect, re-queue the connect
> worker, and re-run the call to kernel_connect() for a socket which is
> already mid-way through being connected.
>
> Shouldn't the "connecting" status only be cleared when either the socket
> has _finished_ connecting, or when the connection has _failed_ to connect,
> and not when it's mid-way through connecting?
>
> I've not been able to prove this: I've set rpc_debug to 129 to log
> just xprt and trans RPC facilities, and that's sufficient to change
> the timing such that this doesn't happen.

Following that idea, I just tried the patch below, and it seems to work.
I don't know whether it handles all cases after a call to kernel_connect(),
but it stops the multiple connection attempts:

1 0.000000 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 WS=128
2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK] Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=870318939 TSecr=15712 WS=64
3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540 Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939
4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]
5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 Ack=3794066660 Win=28608 Len=0 TSval=870318939 TSecr=15712
6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]
7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712
8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), [Allowed: RD LU MD XT DL]
9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780 Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939
10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: RD LU MD XT DL]
11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), [Allowed: RD LU MD XT DL]
12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: RD LU MD XT DL]
13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), [Allowed: RD LU MD XT DL]
14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: 0xe15fc9c9
...

net/sunrpc/xprtsock.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index ff5b6a2e62c3..c456d6e51c56 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1450,6 +1450,7 @@ static void xs_tcp_state_change(struct sock *sk)
switch (sk->sk_state) {
case TCP_ESTABLISHED:
spin_lock(&xprt->transport_lock);
+ xprt_clear_connecting(xprt);
if (!xprt_test_and_set_connected(xprt)) {
struct sock_xprt *transport = container_of(xprt,
struct sock_xprt, xprt);
@@ -1474,12 +1475,14 @@ static void xs_tcp_state_change(struct sock *sk)
smp_mb__before_atomic();
clear_bit(XPRT_CONNECTED, &xprt->state);
clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+ clear_bit(XPRT_CONNECTING, &xprt->state);
smp_mb__after_atomic();
break;
case TCP_CLOSE_WAIT:
/* The server initiated a shutdown of the socket */
xprt->connect_cookie++;
clear_bit(XPRT_CONNECTED, &xprt->state);
+ clear_bit(XPRT_CONNECTING, &xprt->state);
xs_tcp_force_close(xprt);
case TCP_CLOSING:
/*
@@ -1493,6 +1496,7 @@ static void xs_tcp_state_change(struct sock *sk)
set_bit(XPRT_CLOSING, &xprt->state);
smp_mb__before_atomic();
clear_bit(XPRT_CONNECTED, &xprt->state);
+ clear_bit(XPRT_CONNECTING, &xprt->state);
smp_mb__after_atomic();
break;
case TCP_CLOSE:
@@ -2237,11 +2241,13 @@ static void xs_tcp_setup_socket(struct work_struct *work)
xs_tcp_force_close(xprt);
break;
case 0:
- case -EINPROGRESS:
case -EALREADY:
xprt_unlock_connect(xprt, transport);
xprt_clear_connecting(xprt);
return;
+ case -EINPROGRESS:
+ xprt_unlock_connect(xprt, transport);
+ return;
case -EINVAL:
/* Happens, for instance, if the user specified a link
* local IPv6 address without a scope-id.


--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-09-16 07:15:27

by Gregory CLEMENT

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

Hi Damien,

On mer., sept. 16 2015, Damien Thébault <[email protected]> wrote:

> On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
>> I have a recent Marvell Armada 388 board here which uses the mvneta
>> driver. I'm seeing some weird effects with NFS with it acting as a
>> client.
>
> Hello,
>
> I'm upgrading a Marvelle Armada 370 board using the mvneta driver from
> 4.0 to 4.2 and noticed issues with NFS booting.
> Basically, most of the time init returns with an error code, or
> programs segfault or throw illegal instructions.
>
> Since it worked fine on 4.0 I bisected until I found commit
> a84e32894191cfcbffa54180d78d7d4654d56c20 "net: mvneta: fix refilling
> for Rx DMA buffers".
>
> If I revert this commit, everything seems to get back to normal.
> Could you try it ? The two issues look very similar.

Actually there was a bug with this commit, but a fix had been submitted
and accepted yesterday, you can find him here:
https://patchwork.ozlabs.org/patch/518111/.

Thanks,

Gregory

>
> Regards

--
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

2015-09-16 07:25:29

by Damien Thébault

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

T24gRnJpLCAyMDE1LTA5LTExIGF0IDEyOjM4ICswMTAwLCBSdXNzZWxsIEtpbmcgLSBBUk0gTGlu
dXggd3JvdGU6DQo+IEkgaGF2ZSBhIHJlY2VudCBNYXJ2ZWxsIEFybWFkYSAzODggYm9hcmQgaGVy
ZSB3aGljaCB1c2VzIHRoZSBtdm5ldGENCj4gZHJpdmVyLiAgSSdtIHNlZWluZyBzb21lIHdlaXJk
IGVmZmVjdHMgd2l0aCBORlMgd2l0aCBpdCBhY3RpbmcgYXMgYQ0KPiBjbGllbnQuDQoNCkhlbGxv
LA0KDQpJJ20gdXBncmFkaW5nIGEgTWFydmVsbGUgQXJtYWRhIDM3MCBib2FyZCB1c2luZyB0aGUg
bXZuZXRhIGRyaXZlciBmcm9tDQo0LjAgdG8gNC4yIGFuZCBub3RpY2VkIGlzc3VlcyB3aXRoIE5G
UyBib290aW5nLg0KQmFzaWNhbGx5LCBtb3N0IG9mIHRoZSB0aW1lIGluaXQgcmV0dXJucyB3aXRo
IGFuIGVycm9yIGNvZGUsIG9yDQpwcm9ncmFtcyBzZWdmYXVsdCBvciB0aHJvdyBpbGxlZ2FsIGlu
c3RydWN0aW9ucy4NCg0KU2luY2UgaXQgd29ya2VkIGZpbmUgb24gNC4wIEkgYmlzZWN0ZWQgdW50
aWwgSSBmb3VuZCBjb21taXQNCmE4NGUzMjg5NDE5MWNmY2JmZmE1NDE4MGQ3OGQ3ZDQ2NTRkNTZj
MjAgIm5ldDogbXZuZXRhOiBmaXggcmVmaWxsaW5nDQpmb3IgUnggRE1BIGJ1ZmZlcnMiLg0KDQpJ
ZiBJIHJldmVydCB0aGlzIGNvbW1pdCwgZXZlcnl0aGluZyBzZWVtcyB0byBnZXQgYmFjayB0byBu
b3JtYWwuDQpDb3VsZCB5b3UgdHJ5IGl0ID8gVGhlIHR3byBpc3N1ZXMgbG9vayB2ZXJ5IHNpbWls
YXIuDQoNClJlZ2FyZHMNCi0tIA0KRGFtaWVuIFRow6liYXVsdA0KDQo=

2015-09-16 07:32:11

by Willy Tarreau

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

Hi,

On Wed, Sep 16, 2015 at 06:53:57AM +0000, Damien Th?bault wrote:
> On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
> > I have a recent Marvell Armada 388 board here which uses the mvneta
> > driver. I'm seeing some weird effects with NFS with it acting as a
> > client.
>
> Hello,
>
> I'm upgrading a Marvelle Armada 370 board using the mvneta driver from
> 4.0 to 4.2 and noticed issues with NFS booting.
> Basically, most of the time init returns with an error code, or
> programs segfault or throw illegal instructions.
>
> Since it worked fine on 4.0 I bisected until I found commit
> a84e32894191cfcbffa54180d78d7d4654d56c20 "net: mvneta: fix refilling
> for Rx DMA buffers".
>
> If I revert this commit, everything seems to get back to normal.
> Could you try it ? The two issues look very similar.

I'm not sure but I'm seeing that the accounting was changed by this
patch without being certain of the implications; if the revert above
works, it would be nice to try to only apply this just to see if
that's indeed an accounting error or not :

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 62e48bc..4205867 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1463,6 +1463,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
{
struct net_device *dev = pp->dev;
int rx_done;
+ int missed = 0;
u32 rcvd_pkts = 0;
u32 rcvd_bytes = 0;

@@ -1527,6 +1528,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
if (err) {
netdev_err(dev, "Linux processing - Can't refill\n");
rxq->missed++;
+ missed++;
goto err_drop_frame;
}

@@ -1561,7 +1563,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
}

/* Update rxq management counters */
- mvneta_rxq_desc_num_update(pp, rxq, rx_done, rx_done);
+ mvneta_rxq_desc_num_update(pp, rxq, rx_done, rx_done - missed);

return rx_done;
}

Regards,
Willy


2015-09-16 07:39:21

by Damien Thébault

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

T24gV2VkLCAyMDE1LTA5LTE2IGF0IDA5OjE1ICswMjAwLCBHcmVnb3J5IENMRU1FTlQgd3JvdGU6
DQo+ID4gU2luY2UgaXQgd29ya2VkIGZpbmUgb24gNC4wIEkgYmlzZWN0ZWQgdW50aWwgSSBmb3Vu
ZCBjb21taXQNCj4gPiBhODRlMzI4OTQxOTFjZmNiZmZhNTQxODBkNzhkN2Q0NjU0ZDU2YzIwICJu
ZXQ6IG12bmV0YTogZml4DQo+ID4gcmVmaWxsaW5nDQo+ID4gZm9yIFJ4IERNQSBidWZmZXJzIi4N
Cj4gPiANCj4gPiBJZiBJIHJldmVydCB0aGlzIGNvbW1pdCwgZXZlcnl0aGluZyBzZWVtcyB0byBn
ZXQgYmFjayB0byBub3JtYWwuDQo+ID4gQ291bGQgeW91IHRyeSBpdCA/IFRoZSB0d28gaXNzdWVz
IGxvb2sgdmVyeSBzaW1pbGFyLg0KPiBBY3R1YWxseSB0aGVyZSB3YXMgYSBidWcgd2l0aCB0aGlz
IGNvbW1pdCwgYnV0IGEgZml4IGhhZCBiZWVuDQo+IHN1Ym1pdHRlZA0KPiBhbmQgYWNjZXB0ZWQg
eWVzdGVyZGF5LCB5b3UgY2FuIGZpbmQgaGltIGhlcmU6DQo+IGh0dHBzOi8vcGF0Y2h3b3JrLm96
bGFicy5vcmcvcGF0Y2gvNTE4MTExLy4NCg0KSGVsbG8sIHRoaXMgaW5kZWVkIGZpeGVzIHRoZSBp
c3N1ZSBmb3IgbWUsIHRoYW5rcy4NCi0tIA0KRGFtaWVuIFRow6liYXVsdA0KUiZEIEVuZ2luZWVy
DQpWSVRFQw0KDQpULiArMzMgMSA0NiA3MyAwNiAwNg0KRi4gKzMzIDkgNTkgODUgOTkgOTINCkUu
IGRhbWllbi50aGViYXVsdEB2aXRlYy5jb20NCmh0dHA6Ly93d3cudml0ZWMuY29tDQoNCg==

2015-09-17 13:57:20

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux wrote:
> Following that idea, I just tried the patch below, and it seems to work.
> I don't know whether it handles all cases after a call to kernel_connect(),
> but it stops the multiple connection attempts:
>
> 1 0.000000 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 WS=128
> 2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK] Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=870318939 TSecr=15712 WS=64
> 3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540 Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939
> 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]
> 5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 Ack=3794066660 Win=28608 Len=0 TSval=870318939 TSecr=15712
> 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]
> 7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712
> 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), [Allowed: RD LU MD XT DL]
> 9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780 Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939
> 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: RD LU MD XT DL]
> 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), [Allowed: RD LU MD XT DL]
> 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: RD LU MD XT DL]
> 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), [Allowed: RD LU MD XT DL]
> 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: 0xe15fc9c9
> ...

NFS people - any comments on this patch? Is it the correct way to solve
this problem (please see the first message in this thread for the problem.)
Without this patch, NFS is unusable as it tries to launch multiple new
connections from the same port to the NFS server without giving the NFS
server time to respond and establish the TCP connection.

>
> net/sunrpc/xprtsock.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index ff5b6a2e62c3..c456d6e51c56 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1450,6 +1450,7 @@ static void xs_tcp_state_change(struct sock *sk)
> switch (sk->sk_state) {
> case TCP_ESTABLISHED:
> spin_lock(&xprt->transport_lock);
> + xprt_clear_connecting(xprt);
> if (!xprt_test_and_set_connected(xprt)) {
> struct sock_xprt *transport = container_of(xprt,
> struct sock_xprt, xprt);
> @@ -1474,12 +1475,14 @@ static void xs_tcp_state_change(struct sock *sk)
> smp_mb__before_atomic();
> clear_bit(XPRT_CONNECTED, &xprt->state);
> clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> + clear_bit(XPRT_CONNECTING, &xprt->state);
> smp_mb__after_atomic();
> break;
> case TCP_CLOSE_WAIT:
> /* The server initiated a shutdown of the socket */
> xprt->connect_cookie++;
> clear_bit(XPRT_CONNECTED, &xprt->state);
> + clear_bit(XPRT_CONNECTING, &xprt->state);
> xs_tcp_force_close(xprt);
> case TCP_CLOSING:
> /*
> @@ -1493,6 +1496,7 @@ static void xs_tcp_state_change(struct sock *sk)
> set_bit(XPRT_CLOSING, &xprt->state);
> smp_mb__before_atomic();
> clear_bit(XPRT_CONNECTED, &xprt->state);
> + clear_bit(XPRT_CONNECTING, &xprt->state);
> smp_mb__after_atomic();
> break;
> case TCP_CLOSE:
> @@ -2237,11 +2241,13 @@ static void xs_tcp_setup_socket(struct work_struct *work)
> xs_tcp_force_close(xprt);
> break;
> case 0:
> - case -EINPROGRESS:
> case -EALREADY:
> xprt_unlock_connect(xprt, transport);
> xprt_clear_connecting(xprt);
> return;
> + case -EINPROGRESS:
> + xprt_unlock_connect(xprt, transport);
> + return;
> case -EINVAL:
> /* Happens, for instance, if the user specified a link
> * local IPv6 address without a scope-id.
>
>
> --
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-09-17 14:06:36

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

On Wed, Sep 16, 2015 at 06:53:57AM +0000, Damien Th?bault wrote:
> On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
> > I have a recent Marvell Armada 388 board here which uses the mvneta
> > driver. I'm seeing some weird effects with NFS with it acting as a
> > client.
>
> Hello,
>
> I'm upgrading a Marvelle Armada 370 board using the mvneta driver from
> 4.0 to 4.2 and noticed issues with NFS booting.
> Basically, most of the time init returns with an error code, or
> programs segfault or throw illegal instructions.
>
> Since it worked fine on 4.0 I bisected until I found commit
> a84e32894191cfcbffa54180d78d7d4654d56c20 "net: mvneta: fix refilling
> for Rx DMA buffers".
>
> If I revert this commit, everything seems to get back to normal.
> Could you try it ? The two issues look very similar.

If you look at my original problem report, you'll see that has nothing
to do with the problem I'm seeing.

My problem is:

- TCP disconnects
- NFS tries to establish a new connection with the server, sending a SYN
- NFS server replies with a SYNACK
- NFS client immediately sends another SYN with a different sequence
number, so it's a _new_ attempt to connect to the NFS server.

At this point, the socket for the previous SYNACK'd connection has been
destroyed mid-setup.

This is because the sunrpc code is horribly racy - it doesn't block a
second attempt to call kernel_connect() on a socket which is already in
the process of connecting to the NFS server.

Even if the SYNACK had been corrupted (due to mvneta's rx code), that
has no bearing on the race in the sunrpc layer that destroys the previous
socket before the TCP SYN/SYNACK/ACK handshake has had a chance to
complete.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-09-17 14:18:53

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

Hi Russell,

On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux
> wrote:
> > Following that idea, I just tried the patch below, and it seems to
> > work.
> > I don't know whether it handles all cases after a call to
> > kernel_connect(),
> > but it stops the multiple connection attempts:
> >
> > 1 0.000000 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539
> > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691
> > WS=128
> > 2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK]
> > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1
> > TSval=870318939 TSecr=15712 WS=64
> > 3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540
> > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939
> > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > 0x905379cc, [Check: RD LU MD XT DL]
> > 5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523
> > Ack=3794066660 Win=28608 Len=0 TSval=870318939 TSecr=15712
> > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > 0x905379cc, [Check: RD LU MD XT DL]
> > 7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523
> > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712
> > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4),
> > [Allowed: RD LU MD XT DL]
> > 9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780
> > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939
> > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > 0xe15fc9c9, [Check: RD LU MD XT DL]
> > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6),
> > [Allowed: RD LU MD XT DL]
> > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > 0xe15fc9c9, [Check: RD LU MD XT DL]
> > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10),
> > [Allowed: RD LU MD XT DL]
> > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH:
> > 0xe15fc9c9
> > ...
>
> NFS people - any comments on this patch? Is it the correct way to
> solve
> this problem (please see the first message in this thread for the
> problem.)
> Without this patch, NFS is unusable as it tries to launch multiple
> new
> connections from the same port to the NFS server without giving the
> NFS
> server time to respond and establish the TCP connection.

I agree that it addresses a real problem here, however there are a
couple of issues with the patch itself:

AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and
TCP_CLOSE, so if the connection attempt fails, this patch leaves the
XPRT_CONNECTING flag set.
There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1,
TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection
attempt by canceling the XPRT_CONNECTING state.

How about the following? It is based on your patch, but adds a check to
ensure that xs_tcp_state_change() doesn't clear the 'connecting' state
more than once (which could otherwise still happen in the TCP_CLOSE
case).

8<-------------------------------------------------------------------

2015-09-17 16:27:54

by Benjamin Coddington

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

On Thu, 17 Sep 2015, Trond Myklebust wrote:

> Hi Russell,
>
> On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote:
> > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux
> > wrote:
> > > Following that idea, I just tried the patch below, and it seems to
> > > work.
> > > I don't know whether it handles all cases after a call to
> > > kernel_connect(),
> > > but it stops the multiple connection attempts:
> > >
> > > 1 0.000000 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539
> > > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691
> > > WS=128
> > > 2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK]
> > > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1
> > > TSval=870318939 TSecr=15712 WS=64
> > > 3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540
> > > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939
> > > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > > 0x905379cc, [Check: RD LU MD XT DL]
> > > 5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523
> > > Ack=3794066660 Win=28608 Len=0 TSval=870318939 TSecr=15712
> > > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > > 0x905379cc, [Check: RD LU MD XT DL]
> > > 7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523
> > > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712
> > > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4),
> > > [Allowed: RD LU MD XT DL]
> > > 9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780
> > > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939
> > > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > > 0xe15fc9c9, [Check: RD LU MD XT DL]
> > > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6),
> > > [Allowed: RD LU MD XT DL]
> > > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > > 0xe15fc9c9, [Check: RD LU MD XT DL]
> > > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10),
> > > [Allowed: RD LU MD XT DL]
> > > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH:
> > > 0xe15fc9c9
> > > ...
> >
> > NFS people - any comments on this patch? Is it the correct way to
> > solve
> > this problem (please see the first message in this thread for the
> > problem.)
> > Without this patch, NFS is unusable as it tries to launch multiple
> > new
> > connections from the same port to the NFS server without giving the
> > NFS
> > server time to respond and establish the TCP connection.
>
> I agree that it addresses a real problem here, however there are a
> couple of issues with the patch itself:
>
> AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and
> TCP_CLOSE, so if the connection attempt fails, this patch leaves the
> XPRT_CONNECTING flag set.
> There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1,
> TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection
> attempt by canceling the XPRT_CONNECTING state.
>
> How about the following? It is based on your patch, but adds a check to
> ensure that xs_tcp_state_change() doesn't clear the 'connecting' state
> more than once (which could otherwise still happen in the TCP_CLOSE
> case).
>
> 8<-------------------------------------------------------------------
> From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <[email protected]>
> Date: Wed, 16 Sep 2015 23:43:17 -0400
> Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete
> before retrying
>
> Commit 718ba5b87343, moved the responsibility for unlocking the socket to
> xs_tcp_setup_socket, meaning that the socket will be unlocked before we
> know that it has finished trying to connect. The following patch is based on
> an initial patch by Russell King to ensure that we delay clearing the
> XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate
> a connection attempt, or the connection attempt itself failed.
>
> Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing")
> Reported-by: Russell King <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>

This fixes up my network segmentation problem, tested on top of your "Fix
races between socket connection and destroy code".

Tested-by: Benjamin Coddington <[email protected]>

Ben

> ---
> include/linux/sunrpc/xprtsock.h | 3 +++
> net/sunrpc/xprtsock.c | 11 ++++++++---
> 2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
> index 7591788e9fbf..357e44c1a46b 100644
> --- a/include/linux/sunrpc/xprtsock.h
> +++ b/include/linux/sunrpc/xprtsock.h
> @@ -42,6 +42,7 @@ struct sock_xprt {
> /*
> * Connection of transports
> */
> + unsigned long sock_state;
> struct delayed_work connect_worker;
> struct sockaddr_storage srcaddr;
> unsigned short srcport;
> @@ -76,6 +77,8 @@ struct sock_xprt {
> */
> #define TCP_RPC_REPLY (1UL << 6)
>
> +#define XPRT_SOCK_CONNECTING 1U
> +
> #endif /* __KERNEL__ */
>
> #endif /* _LINUX_SUNRPC_XPRTSOCK_H */
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 7be90bc1a7c2..5bac27983e2a 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1435,6 +1435,7 @@ out:
> static void xs_tcp_state_change(struct sock *sk)
> {
> struct rpc_xprt *xprt;
> + struct sock_xprt *transport;
>
> read_lock_bh(&sk->sk_callback_lock);
> if (!(xprt = xprt_from_sock(sk)))
> @@ -1446,13 +1447,12 @@ static void xs_tcp_state_change(struct sock *sk)
> sock_flag(sk, SOCK_ZAPPED),
> sk->sk_shutdown);
>
> + transport = container_of(xprt, struct sock_xprt, xprt);
> trace_rpc_socket_state_change(xprt, sk->sk_socket);
> switch (sk->sk_state) {
> case TCP_ESTABLISHED:
> spin_lock(&xprt->transport_lock);
> if (!xprt_test_and_set_connected(xprt)) {
> - struct sock_xprt *transport = container_of(xprt,
> - struct sock_xprt, xprt);
>
> /* Reset TCP record info */
> transport->tcp_offset = 0;
> @@ -1461,6 +1461,8 @@ static void xs_tcp_state_change(struct sock *sk)
> transport->tcp_flags =
> TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
> xprt->connect_cookie++;
> + clear_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
> + xprt_clear_connecting(xprt);
>
> xprt_wake_pending_tasks(xprt, -EAGAIN);
> }
> @@ -1496,6 +1498,9 @@ static void xs_tcp_state_change(struct sock *sk)
> smp_mb__after_atomic();
> break;
> case TCP_CLOSE:
> + if (test_and_clear_bit(XPRT_SOCK_CONNECTING,
> + &transport->sock_state))
> + xprt_clear_connecting(xprt);
> xs_sock_mark_closed(xprt);
> }
> out:
> @@ -2179,6 +2184,7 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
> /* Tell the socket layer to start connecting... */
> xprt->stat.connect_count++;
> xprt->stat.connect_start = jiffies;
> + set_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
> ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
> switch (ret) {
> case 0:
> @@ -2240,7 +2246,6 @@ static void xs_tcp_setup_socket(struct work_struct *work)
> case -EINPROGRESS:
> case -EALREADY:
> xprt_unlock_connect(xprt, transport);
> - xprt_clear_connecting(xprt);
> return;
> case -EINVAL:
> /* Happens, for instance, if the user specified a link
> --
> 2.4.3
>
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2015-09-17 21:48:12

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

On Thu, Sep 17, 2015 at 10:18:29AM -0400, Trond Myklebust wrote:
> Hi Russell,
>
> On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote:
> > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux
> > wrote:
> > > Following that idea, I just tried the patch below, and it seems to
> > > work.
> > > I don't know whether it handles all cases after a call to
> > > kernel_connect(),
> > > but it stops the multiple connection attempts:
> > >
> > > 1 0.000000 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539
> > > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691
> > > WS=128
> > > 2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK]
> > > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1
> > > TSval=870318939 TSecr=15712 WS=64
> > > 3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540
> > > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939
> > > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > > 0x905379cc, [Check: RD LU MD XT DL]
> > > 5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523
> > > Ack=3794066660 Win=28608 Len=0 TSval=870318939 TSecr=15712
> > > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > > 0x905379cc, [Check: RD LU MD XT DL]
> > > 7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523
> > > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712
> > > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4),
> > > [Allowed: RD LU MD XT DL]
> > > 9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780
> > > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939
> > > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > > 0xe15fc9c9, [Check: RD LU MD XT DL]
> > > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6),
> > > [Allowed: RD LU MD XT DL]
> > > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > > 0xe15fc9c9, [Check: RD LU MD XT DL]
> > > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10),
> > > [Allowed: RD LU MD XT DL]
> > > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH:
> > > 0xe15fc9c9
> > > ...
> >
> > NFS people - any comments on this patch? Is it the correct way to
> > solve
> > this problem (please see the first message in this thread for the
> > problem.)
> > Without this patch, NFS is unusable as it tries to launch multiple
> > new
> > connections from the same port to the NFS server without giving the
> > NFS
> > server time to respond and establish the TCP connection.
>
> I agree that it addresses a real problem here, however there are a
> couple of issues with the patch itself:
>
> AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and
> TCP_CLOSE, so if the connection attempt fails, this patch leaves the
> XPRT_CONNECTING flag set.
> There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1,
> TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection
> attempt by canceling the XPRT_CONNECTING state.
>
> How about the following? It is based on your patch, but adds a check to
> ensure that xs_tcp_state_change() doesn't clear the 'connecting' state
> more than once (which could otherwise still happen in the TCP_CLOSE
> case).

This patch also seems to fix the problem I've been seeing.

Yes, I wasn't sure about my patch - I didn't spend much time properly
reading and understanding the sunrpc code, beyond analysing what was
going on to cause the problem and deciding on a way to stop it happening.
I really wasn't sure that clearing the connecting flag everywhere I did
was the right thing, which is why I didn't send the patch properly
dressed up.

> 8<-------------------------------------------------------------------
> >From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <[email protected]>
> Date: Wed, 16 Sep 2015 23:43:17 -0400
> Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete
> before retrying
>
> Commit 718ba5b87343, moved the responsibility for unlocking the socket to
> xs_tcp_setup_socket, meaning that the socket will be unlocked before we
> know that it has finished trying to connect. The following patch is based on
> an initial patch by Russell King to ensure that we delay clearing the
> XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate
> a connection attempt, or the connection attempt itself failed.
>
> Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing")
> Reported-by: Russell King <[email protected]>

Reported-by: Russell King <[email protected]>
Tested-by: Russell King <[email protected]>

Thanks.

> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> include/linux/sunrpc/xprtsock.h | 3 +++
> net/sunrpc/xprtsock.c | 11 ++++++++---
> 2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
> index 7591788e9fbf..357e44c1a46b 100644
> --- a/include/linux/sunrpc/xprtsock.h
> +++ b/include/linux/sunrpc/xprtsock.h
> @@ -42,6 +42,7 @@ struct sock_xprt {
> /*
> * Connection of transports
> */
> + unsigned long sock_state;
> struct delayed_work connect_worker;
> struct sockaddr_storage srcaddr;
> unsigned short srcport;
> @@ -76,6 +77,8 @@ struct sock_xprt {
> */
> #define TCP_RPC_REPLY (1UL << 6)
>
> +#define XPRT_SOCK_CONNECTING 1U
> +
> #endif /* __KERNEL__ */
>
> #endif /* _LINUX_SUNRPC_XPRTSOCK_H */
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 7be90bc1a7c2..5bac27983e2a 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1435,6 +1435,7 @@ out:
> static void xs_tcp_state_change(struct sock *sk)
> {
> struct rpc_xprt *xprt;
> + struct sock_xprt *transport;
>
> read_lock_bh(&sk->sk_callback_lock);
> if (!(xprt = xprt_from_sock(sk)))
> @@ -1446,13 +1447,12 @@ static void xs_tcp_state_change(struct sock *sk)
> sock_flag(sk, SOCK_ZAPPED),
> sk->sk_shutdown);
>
> + transport = container_of(xprt, struct sock_xprt, xprt);
> trace_rpc_socket_state_change(xprt, sk->sk_socket);
> switch (sk->sk_state) {
> case TCP_ESTABLISHED:
> spin_lock(&xprt->transport_lock);
> if (!xprt_test_and_set_connected(xprt)) {
> - struct sock_xprt *transport = container_of(xprt,
> - struct sock_xprt, xprt);
>
> /* Reset TCP record info */
> transport->tcp_offset = 0;
> @@ -1461,6 +1461,8 @@ static void xs_tcp_state_change(struct sock *sk)
> transport->tcp_flags =
> TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
> xprt->connect_cookie++;
> + clear_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
> + xprt_clear_connecting(xprt);
>
> xprt_wake_pending_tasks(xprt, -EAGAIN);
> }
> @@ -1496,6 +1498,9 @@ static void xs_tcp_state_change(struct sock *sk)
> smp_mb__after_atomic();
> break;
> case TCP_CLOSE:
> + if (test_and_clear_bit(XPRT_SOCK_CONNECTING,
> + &transport->sock_state))
> + xprt_clear_connecting(xprt);
> xs_sock_mark_closed(xprt);
> }
> out:
> @@ -2179,6 +2184,7 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
> /* Tell the socket layer to start connecting... */
> xprt->stat.connect_count++;
> xprt->stat.connect_start = jiffies;
> + set_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
> ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
> switch (ret) {
> case 0:
> @@ -2240,7 +2246,6 @@ static void xs_tcp_setup_socket(struct work_struct *work)
> case -EINPROGRESS:
> case -EALREADY:
> xprt_unlock_connect(xprt, transport);
> - xprt_clear_connecting(xprt);
> return;
> case -EINVAL:
> /* Happens, for instance, if the user specified a link
> --
> 2.4.3
>
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
>
>
>

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-09-17 22:03:21

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

On Thu, Sep 17, 2015 at 5:47 PM, Russell King - ARM Linux
<[email protected]> wrote:
> On Thu, Sep 17, 2015 at 10:18:29AM -0400, Trond Myklebust wrote:
>> Hi Russell,
>>
>> On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote:
>> > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux
>> > wrote:
>> > > Following that idea, I just tried the patch below, and it seems to
>> > > work.
>> > > I don't know whether it handles all cases after a call to
>> > > kernel_connect(),
>> > > but it stops the multiple connection attempts:
>> > >
>> > > 1 0.000000 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539
>> > > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691
>> > > WS=128
>> > > 2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK]
>> > > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1
>> > > TSval=870318939 TSecr=15712 WS=64
>> > > 3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540
>> > > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939
>> > > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH:
>> > > 0x905379cc, [Check: RD LU MD XT DL]
>> > > 5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523
>> > > Ack=3794066660 Win=28608 Len=0 TSval=870318939 TSecr=15712
>> > > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH:
>> > > 0x905379cc, [Check: RD LU MD XT DL]
>> > > 7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523
>> > > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712
>> > > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4),
>> > > [Allowed: RD LU MD XT DL]
>> > > 9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780
>> > > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939
>> > > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH:
>> > > 0xe15fc9c9, [Check: RD LU MD XT DL]
>> > > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6),
>> > > [Allowed: RD LU MD XT DL]
>> > > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH:
>> > > 0xe15fc9c9, [Check: RD LU MD XT DL]
>> > > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10),
>> > > [Allowed: RD LU MD XT DL]
>> > > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH:
>> > > 0xe15fc9c9
>> > > ...
>> >
>> > NFS people - any comments on this patch? Is it the correct way to
>> > solve
>> > this problem (please see the first message in this thread for the
>> > problem.)
>> > Without this patch, NFS is unusable as it tries to launch multiple
>> > new
>> > connections from the same port to the NFS server without giving the
>> > NFS
>> > server time to respond and establish the TCP connection.
>>
>> I agree that it addresses a real problem here, however there are a
>> couple of issues with the patch itself:
>>
>> AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and
>> TCP_CLOSE, so if the connection attempt fails, this patch leaves the
>> XPRT_CONNECTING flag set.
>> There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1,
>> TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection
>> attempt by canceling the XPRT_CONNECTING state.
>>
>> How about the following? It is based on your patch, but adds a check to
>> ensure that xs_tcp_state_change() doesn't clear the 'connecting' state
>> more than once (which could otherwise still happen in the TCP_CLOSE
>> case).
>
> This patch also seems to fix the problem I've been seeing.
>
> Yes, I wasn't sure about my patch - I didn't spend much time properly
> reading and understanding the sunrpc code, beyond analysing what was
> going on to cause the problem and deciding on a way to stop it happening.
> I really wasn't sure that clearing the connecting flag everywhere I did
> was the right thing, which is why I didn't send the patch properly
> dressed up.
>
>> 8<-------------------------------------------------------------------
>> >From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001
>> From: Trond Myklebust <[email protected]>
>> Date: Wed, 16 Sep 2015 23:43:17 -0400
>> Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete
>> before retrying
>>
>> Commit 718ba5b87343, moved the responsibility for unlocking the socket to
>> xs_tcp_setup_socket, meaning that the socket will be unlocked before we
>> know that it has finished trying to connect. The following patch is based on
>> an initial patch by Russell King to ensure that we delay clearing the
>> XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate
>> a connection attempt, or the connection attempt itself failed.
>>
>> Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing")
>> Reported-by: Russell King <[email protected]>
>
> Reported-by: Russell King <[email protected]>
> Tested-by: Russell King <[email protected]>
>

Thanks Russell!

2015-09-17 22:03:51

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2

On Thu, Sep 17, 2015 at 12:27 PM, Benjamin Coddington
<[email protected]> wrote:
> On Thu, 17 Sep 2015, Trond Myklebust wrote:
>
>> Hi Russell,
>>
>> On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote:
>> > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux
>> > wrote:
>> > > Following that idea, I just tried the patch below, and it seems to
>> > > work.
>> > > I don't know whether it handles all cases after a call to
>> > > kernel_connect(),
>> > > but it stops the multiple connection attempts:
>> > >
>> > > 1 0.000000 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539
>> > > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691
>> > > WS=128
>> > > 2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK]
>> > > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1
>> > > TSval=870318939 TSecr=15712 WS=64
>> > > 3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540
>> > > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939
>> > > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH:
>> > > 0x905379cc, [Check: RD LU MD XT DL]
>> > > 5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523
>> > > Ack=3794066660 Win=28608 Len=0 TSval=870318939 TSecr=15712
>> > > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH:
>> > > 0x905379cc, [Check: RD LU MD XT DL]
>> > > 7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523
>> > > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712
>> > > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4),
>> > > [Allowed: RD LU MD XT DL]
>> > > 9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780
>> > > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939
>> > > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH:
>> > > 0xe15fc9c9, [Check: RD LU MD XT DL]
>> > > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6),
>> > > [Allowed: RD LU MD XT DL]
>> > > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH:
>> > > 0xe15fc9c9, [Check: RD LU MD XT DL]
>> > > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10),
>> > > [Allowed: RD LU MD XT DL]
>> > > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH:
>> > > 0xe15fc9c9
>> > > ...
>> >
>> > NFS people - any comments on this patch? Is it the correct way to
>> > solve
>> > this problem (please see the first message in this thread for the
>> > problem.)
>> > Without this patch, NFS is unusable as it tries to launch multiple
>> > new
>> > connections from the same port to the NFS server without giving the
>> > NFS
>> > server time to respond and establish the TCP connection.
>>
>> I agree that it addresses a real problem here, however there are a
>> couple of issues with the patch itself:
>>
>> AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and
>> TCP_CLOSE, so if the connection attempt fails, this patch leaves the
>> XPRT_CONNECTING flag set.
>> There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1,
>> TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection
>> attempt by canceling the XPRT_CONNECTING state.
>>
>> How about the following? It is based on your patch, but adds a check to
>> ensure that xs_tcp_state_change() doesn't clear the 'connecting' state
>> more than once (which could otherwise still happen in the TCP_CLOSE
>> case).
>>
>> 8<-------------------------------------------------------------------
>> From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001
>> From: Trond Myklebust <[email protected]>
>> Date: Wed, 16 Sep 2015 23:43:17 -0400
>> Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete
>> before retrying
>>
>> Commit 718ba5b87343, moved the responsibility for unlocking the socket to
>> xs_tcp_setup_socket, meaning that the socket will be unlocked before we
>> know that it has finished trying to connect. The following patch is based on
>> an initial patch by Russell King to ensure that we delay clearing the
>> XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate
>> a connection attempt, or the connection attempt itself failed.
>>
>> Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing")
>> Reported-by: Russell King <[email protected]>
>> Signed-off-by: Trond Myklebust <[email protected]>
>
> This fixes up my network segmentation problem, tested on top of your "Fix
> races between socket connection and destroy code".
>
> Tested-by: Benjamin Coddington <[email protected]>
>

Thanks Ben!