2014-02-08 09:18:38

by Thomas Glanzmann

Subject: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by a factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

Hello Eric,

> * Thomas Glanzmann <[email protected]> [2014-02-07 08:55]:
> > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > and 15 minutes on 3.14.0-rc2+.

* Nicholas A. Bellinger <[email protected]> [2014-02-07 20:30]:
> Would it be possible to try a couple of different stable kernel
> versions to help track this down?

I bisected[1] it and found the offending commit to be f54b311 ("tcp: auto
corking")[2]: 'if we have a small send and a previous packet is already
in the qdisc or device queue, defer until TX completion or we get more
data.' - Description by David S. Miller
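
For reference, the deferral the commit describes comes down to roughly the
following check (my paraphrase of the tcp_should_autocork() helper the commit
adds to net/ipv4/tcp.c, not the literal source):

static bool tcp_should_autocork(struct sock *sk, struct sk_buff *skb,
                                int size_goal)
{
        return skb->len < size_goal &&            /* send smaller than a full segment */
               sysctl_tcp_autocorking &&          /* net.ipv4.tcp_autocorking enabled */
               skb != tcp_write_queue_head(sk) && /* other data already queued */
               atomic_read(&sk->sk_wmem_alloc) > skb->truesize;
                                                  /* previous packet not yet TX-completed */
}

When this returns true, the small write is held back until a TX completion
frees the previous packet or more data arrives, which matches the stalls
visible in the captures below.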

I gathered a pcap with tcp_autocorking on and off.

On: - took 4 seconds to create a 500 GB VMFS file system
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png

Off: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
sysctl net.ipv4.tcp_autocorking=0
https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png

The first graph can be generated by bunzipping the file, opening it in
Wireshark, selecting Statistics > IO Graph and changing the unit to
Bytes/Tick. The second graph can be generated by selecting Statistics >
TCP Stream Graph > Round Trip Time.

You can also see that the round trip time increases by a factor of at
least 25.

I once saw a similar problem with delayed ACK packets in the
paravirtualized network driver in Xen: it caused the TCP window to fill
up and slowed the throughput from 30 MB/s down to less than 100 KB/s.
The symptom was that logging in to a Windows desktop took more than
10 minutes, while it used to be below 30 seconds, because the user's
profile was loaded slowly from a CIFS server. At that time the culprit
was also delayed small packets: ACK packets in the CIFS case. However,
so far I have only proven the regression for iSCSI with tcp auto
corking, but I assume we will see many others if we leave it enabled.

I found the problem by doing the following:
- I compiled each kernel by executing the following commands:
yes '' | make oldconfig
time make -j 24
make modules_install
mkinitramfs -o /boot/initrd.img-bisect <version>

- I cleaned the iSCSI configuration after each test by issuing:
/etc/init.d/target stop
rm /iscsi?/* /etc/target/*

- I configured the iSCSI target after each reboot:
cat > lio-v101.conf <<EOF
set global auto_cd_after_create=false
/backstores/fileio create shared-01.v101.campusvl.de /iscsi1/shared-01.v101.campusvl.de size=500G buffered=true

/iscsi create iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/luns create /backstores/fileio/shared-01.v101.campusvl.de lun=10
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1

saveconfig
yes
EOF
targetcli < lio-v101.conf
- I then connected a freshly booted ESXi 5.5.0 1331820 (booted via
autodeploy) to the iSCSI target, configured the portal, rescanned and
created a 500 GB VMFS 5 filesystem, noting the time: longer than
2 minutes was bad, below 10 seconds was good. Then I marked the
commit accordingly:
git bisect good/bad

My network config is:

auto bond0
iface bond0 inet static
address 10.100.4.62
netmask 255.255.0.0
gateway 10.100.0.1
slaves eth0 eth1
bond-mode 802.3ad
bond-miimon 100

auto bond0.101
iface bond0.101 inet static
address 10.101.99.4
netmask 255.255.0.0

auto bond1
iface bond1 inet static
address 10.100.5.62
netmask 255.255.0.0
slaves eth2 eth3
bond-mode 802.3ad
bond-miimon 100

auto bond1.101
iface bond1.101 inet static
address 10.101.99.5
netmask 255.255.0.0

I propose disabling tcp_autocorking by default because it obviously degrades
iSCSI performance and probably that of many other protocols. The commit also
mentions that applications can explicitly disable auto corking; we should
probably do that for the iSCSI target, but I don't know how. Anyone?

[1] http://pbot.rmdir.de/a65q6MjgV36tZnn5jS-DUQ
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f54b311142a92ea2e42598e347b84e1655caf8e3

Cheers,
Thomas


2014-02-08 09:19:48

by Thomas Glanzmann

Subject: [PATCH] tcp: disable auto corking by default

When using auto corking with iSCSI, the round trip time increases by a
factor of at least 25, probably more. Other protocols are very likely
also affected.

Signed-off-by: Thomas Glanzmann <[email protected]>
---
net/ipv4/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4475b3b..da563a4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -285,7 +285,7 @@ int sysctl_tcp_fin_timeout __read_mostly = TCP_FIN_TIMEOUT;

int sysctl_tcp_min_tso_segs __read_mostly = 2;

-int sysctl_tcp_autocorking __read_mostly = 1;
+int sysctl_tcp_autocorking __read_mostly = 0;

struct percpu_counter tcp_orphan_count;
EXPORT_SYMBOL_GPL(tcp_orphan_count);
--
1.7.10.4

2014-02-08 15:04:14

by Eric Dumazet

Subject: Re: [PATCH] tcp: disable auto corking by default

On Sat, 2014-02-08 at 10:19 +0100, Thomas Glanzmann wrote:
> When using auto corking with iSCSI, the round trip time increases by a
> factor of at least 25, probably more. Other protocols are very likely
> also affected.
>
> Signed-off-by: Thomas Glanzmann <[email protected]>
> ---
> net/ipv4/tcp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

I think there is no hurry.

We should leave auto corking on during the 3.14 development cycle so that
we can fix the bugs and think of some optimizations.

auto cork gives applications a strong incentive to use TCP_CORK/MSG_MORE
to avoid the overhead of sending multiple small segments.
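
For example, a sender that currently issues two small write() calls for a
header and its payload can coalesce them with either hint. A minimal
userspace sketch (illustrative only, not taken from any particular
application; the function names are made up):

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Variant 1: flag the header send with MSG_MORE so the stack knows the
 * payload for the same PDU follows immediately.
 */
static ssize_t send_pdu_msg_more(int fd, const void *hdr, size_t hlen,
                                 const void *data, size_t dlen)
{
        if (send(fd, hdr, hlen, MSG_MORE) < 0)
                return -1;
        return send(fd, data, dlen, 0);   /* last piece, push it out */
}

/* Variant 2: bracket both sends with TCP_CORK; clearing the option
 * flushes whatever was queued as full-sized segments.
 */
static ssize_t send_pdu_cork(int fd, const void *hdr, size_t hlen,
                             const void *data, size_t dlen)
{
        int on = 1, off = 0;
        ssize_t ret = -1;

        if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) < 0)
                return -1;
        if (send(fd, hdr, hlen, 0) >= 0)
                ret = send(fd, data, dlen, 0);
        setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
        return ret;
}

Either way the header and the payload leave the host as one segment
instead of two small ones, which is exactly what auto corking tries to
nudge applications toward.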

In the normal case, the extra delay is something like 10 us, so if an
application is really hit by this delay, it's a strong sign it could be
improved, especially if auto corking is off.

Let's wait until the end of the 3.14 dev cycle before considering this patch.

Don't shoot the messenger :)

Thanks!

2014-02-08 16:55:43

by Thomas Glanzmann

Subject: Re: [PATCH] tcp: disable auto corking by default

Hello Eric,

> > Disable auto corking by default

> We should leave auto corking on during the 3.14 development cycle so
> that we can fix the bugs and think of some optimizations.

I agree that leaving it enabled helps find bugs; however, I'm not
happy with the round trip time degradation.

> auto cork gives applications a strong incentive to use
> TCP_CORK/MSG_MORE to avoid the overhead of sending multiple small
> segments.

I agree. But if it breaks the application, many people won't be happy;
for example, I have already spent 5 hours tracking this one down.

> In the normal case, the extra delay is something like 10 us, so if an
> application is really hit by this delay, it's a strong sign it could be
> improved, especially if auto corking is off.

Yes, but 230 microseconds for others. :-(

> Let's wait until the end of the 3.14 dev cycle before considering this patch.

I agree.

Btw, I mixed up the pcaps for autocork on and off, so I moved the files
so that they now show what they should show.

Cheers,
Thomas

2014-02-08 17:12:42

by Eric Dumazet

Subject: Re: [PATCH] tcp: disable auto corking by default

On Sat, 2014-02-08 at 17:55 +0100, Thomas Glanzmann wrote:
> Hello Eric,
>
> > > Disable auto corking by default
>
> > We should leave auto corking on during the 3.14 development cycle so
> > that we can fix the bugs and think of some optimizations.
>
> I agree that leaving it enabled helps find bugs; however, I'm not
> happy with the round trip time degradation.
>
> > auto cork gives applications a strong incentive to use
> > TCP_CORK/MSG_MORE to avoid the overhead of sending multiple small
> > segments.
>
> I agree. But if it breaks the application, many people won't be happy;
> for example, I have already spent 5 hours tracking this one down.

Sure, but if we set this flag to zero, nobody will ever use it and find
any bugs.

Thanks for running the latest git tree and being part of improving Linux.

If we can add the MSG_MORE flag at the right place, your workload might
gain ~20% exec time, and maybe 30% better efficiency, since you'll halve
the total number of network segments.
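
Roughly something like this on the target side, assuming it transmits over
a struct socket with kernel_sendmsg() (a hypothetical helper to show where
the flag would go, not a patch against the actual iscsi-target code):

#include <linux/net.h>
#include <linux/socket.h>
#include <linux/types.h>
#include <linux/uio.h>

/* Transmit one piece of a response and tell TCP whether more of the same
 * response follows, so the stack coalesces instead of deferring until a
 * TX completion.
 */
static int tx_piece(struct socket *sock, void *buf, size_t len, bool more)
{
        struct kvec iov = {
                .iov_base = buf,
                .iov_len  = len,
        };
        struct msghdr msg = {
                .msg_flags = more ? MSG_MORE : 0,
        };

        return kernel_sendmsg(sock, &msg, &iov, 1, len);
}

The caller would pass more = true for everything except the last chunk of
a response.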

Just to be clear: no stable kernel has any issue yet, right?


2014-02-08 17:20:27

by Thomas Glanzmann

Subject: Re: [PATCH] tcp: disable auto corking by default

Hello Eric,

> Sure, but if we set this flag to zero, nobody will ever use it and
> find any bugs.

I agree.

> If we can add the MSG_MORE flag at the right place, your workload might
> gain ~20% exec time, and maybe 30% better efficiency, since you'll halve
> the total number of network segments.

That is in fact promising.

> Just to be clear: no stable kernel has any issue yet, right?

Not with TCP auto corking, as it was only recently introduced in the
development branch, but it will reach a stable release at some point.

Cheers,
Thomas