2005-01-05 08:37:57

by Hubert Tonneau

Subject: 2.6.10 TCP troubles

Here is the scenario:
the Linux machine is writing through libsmbclient
to an OSX machine running Samba

Switching the Linux machine from 2.6.8 to 2.6.10 made the network speed
drop drastically: 20 seconds with 2.6.8, 800 seconds with 2.6.10


2005-01-05 12:08:42

by Francois Romieu

Subject: Re: 2.6.10 TCP troubles

Hubert Tonneau:
> Here is the scenario:
> the Linux machine is writing through libsmbclient
> to an OSX machine running Samba
>
> Switching the Linux machine from 2.6.8 to 2.6.10 made the network speed
> drop drastically: 20 seconds with 2.6.8, 800 seconds with 2.6.10

Are there any differences in:
- dmesg output
- /proc/interrupts
- disk traffic
- tcpdump output (of course there will be)

--
Ueimor

2005-01-05 13:15:33

by Hubert Tonneau

Subject: Re: 2.6.10 TCP troubles

Francois Romieu wrote:
>
> Hubert Tonneau:
> > Here is the scenario:
> > the Linux machine is writing through libsmbclient
> > to an OSX machine running Samba
> >
> > Switching the Linux machine from 2.6.8 to 2.6.10 made the network speed
> > drop drastically: 20 seconds with 2.6.8, 800 seconds with 2.6.10
>
> Are there any differences in:
> - dmesg output

No.

> - /proc/interrupts
> - disk traffic
> - tcpdump output (of course there will be)

I can no longer check, since it's our main production server and I switched
back at once. Sorry about that. Anyway, both network traffic and disk traffic
were very low.

The problem is not that the Linux machine is slow, because the network
exchange was very fast with other machines that are gigabit connected with
flow control.
The problem seems to me to be related to the way the TCP layer is handling small
troubles (probably lost packets on the Mac side because the Linux machine is
gigabit connected to the switch, with flow control enabled, and the Mac is
100 Mbps connected, full duplex, but without flow control).

Please notice that the Linux machine is the client, and is pushing files to
the Mac, which is quite unusual. If the Mac was the client pulling files from
the PC, I bet things might be very different.

2005-01-05 15:09:06

by Alan Cox

Subject: Re: 2.6.10 TCP troubles

On Mer, 2005-01-05 at 12:50, Hubert Tonneau wrote:
> troubles (probably lost packets on the Mac side because the Linux machine is
> gigabit connected to the switch, with flow control enabled, and the Mac is
> 100 Mbps connected, full duplex, but without flow control).

Through a firewall ?

2005-01-05 15:54:46

by Barry K. Nathan

Subject: Re: 2.6.10 TCP troubles

On Wed, Jan 05, 2005 at 12:50:57PM +0000, Hubert Tonneau wrote:
[quote reformatted to fit within 80 columns]
> The problem seems to me to be related to the way the TCP layer is
> handling small troubles (probably lost packets on the Mac side because
> the Linux machine is gigabit connected to the switch, with flow control
> enabled, and the Mac is 100 Mbps connected, full duplex, but without
> flow control).

What OS is the Mac running? If it's Mac OS, then is it Mac OS X or is it
an earlier version?

-Barry K. Nathan

2005-01-05 15:54:45

by Hubert Tonneau

Subject: Re: 2.6.10 TCP troubles

Alan Cox wrote:
>
> Mail agent comments:
> Sending server is suspicious.
>
> On Mer, 2005-01-05 at 12:50, Hubert Tonneau wrote:
> > troubles (probably lost packets on the Mac side because the Linux machine is
> > gigabit connected to the switch, with flow control enabled, and the Mac is
> > 100 Mbps connected, full duplex, but without flow control).
>
> Through a firewall ?

No:

Mac <-> 100 Mbps switch <-> gigabit switch <-> Linux

One possible explanation, even if unlikely, might be that Linux 2.6.10 is
faster than 2.6.8, so the Mac starts missing packets.

If you want me to run tests, I can switch to 2.6.10 at night, perform the
tests, and switch back to 2.6.8 before production resumes in the morning.

2005-01-05 16:04:02

by Hubert Tonneau

Subject: Re: 2.6.10 TCP troubles

Barry K. Nathan wrote:
>
> Mail agent comments:
> Sender configuration is suspicious.
>
> On Wed, Jan 05, 2005 at 12:50:57PM +0000, Hubert Tonneau wrote:
> [quote reformatted to fit within 80 columns]
> > The problem seems to me to be related to the way the TCP layer is
> > handling small troubles (probably lost packets on the Mac side because
> > the Linux machine is gigabit connected to the switch, with flow control
> > enabled, and the Mac is 100 Mbps connected, full duplex, but without
> > flow control).
>
> What OS is the Mac running? If it's Mac OS, then is it Mac OS X or is it
> an earlier version?

The Macs are all running OSX.


2005-01-05 18:48:40

by Francois Romieu

Subject: Re: 2.6.10 TCP troubles

Hubert Tonneau:
[...]
> The problem seems to me to be related to the way the TCP layer is handling small
> troubles (probably lost packets on the Mac side because the Linux machine is
> gigabit connected to the switch, with flow control enabled, and the Mac is
> 100 Mbps connected, full duplex, but without flow control).

tcpdump should shed light on it.

--
Ueimor
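
A minimal sketch of how the advertised windows could be pulled out of a
tcpdump text capture for comparison between the two kernels. The sample lines
and the host names "linux" and "mac" are hypothetical, not taken from the real
trace; only the "win N" and "wscale N" fields that tcpdump prints are relied on.

```python
import re

# "win N" is the raw advertised window; "wscale N" appears only on SYN segments.
WIN_RE = re.compile(r"\bwin (\d+)")
WSCALE_RE = re.compile(r"\bwscale (\d+)")


def advertised_windows(lines):
    """Return the raw 'win' values found in tcpdump text lines, in order."""
    wins = []
    for line in lines:
        match = WIN_RE.search(line)
        if match:
            wins.append(int(match.group(1)))
    return wins


def window_scales(lines):
    """Return the 'wscale' shift counts negotiated on SYN segments, in order."""
    scales = []
    for line in lines:
        match = WSCALE_RE.search(line)
        if match:
            scales.append(int(match.group(1)))
    return scales


# Hypothetical capture lines, for illustration only.
sample = [
    "IP linux.1025 > mac.445: S 1:1(0) win 5840 <mss 1460,nop,wscale 2>",
    "IP mac.445 > linux.1025: S 9:9(0) ack 2 win 65535 <mss 1460,nop,wscale 0>",
    "IP mac.445 > linux.1025: . ack 1461 win 2047",
]

print(advertised_windows(sample))
print(window_scales(sample))
```

Running the same extraction on a 2.6.8 capture and a 2.6.10 capture would make
a difference in the advertised window easy to spot.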

2005-01-05 23:34:17

by Stephen Hemminger

Subject: Re: 2.6.10 TCP troubles

On Wed, 05 Jan 2005 08:13:17 GMT
Hubert Tonneau wrote:

> Here is the scenario:
> the Linux machine is writing through libsmbclient
> to an OSX machine running Samba
>
> Switching the Linux machine from 2.6.8 to 2.6.10 made the network speed
> drop drastically: 20 seconds with 2.6.8, 800 seconds with 2.6.10

Some possibilities:
2.6.8 still had the broken TCP segmentation offload that didn't obey
congestion/slow start. Are you using hardware that supports TSO?
Does 2.6.8 behaviour change if you turn TSO off with ethtool?

Is there window scaling or other issues? Does 2.6.10 get faster if
you turn off window scaling (sysctl -w net.ipv4.tcp_window_scaling=0)?
Is there a window-scale-corrupting firewall (like OpenBSD pf)
in the way?

Is there more packet loss on the router or the Mac?
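
A rough sketch of the two checks suggested above, assuming a Linux host where
sysctls are exposed under /proc/sys. The interface name "eth0" is an
assumption; the helper names are made up for illustration. The ethtool command
is only built here, not executed, since running it needs root.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Read the current value of a sysctl as a stripped string."""
    return sysctl_path(name, root).read_text().strip()


def tso_off_command(iface):
    """Argument list that disables TCP segmentation offload via ethtool."""
    return ["ethtool", "-K", iface, "tso", "off"]


# Example use on the Linux machine ("eth0" is assumed, adjust as needed):
#   read_sysctl("net.ipv4.tcp_window_scaling")   -> "1" when scaling is enabled
#   subprocess.run(tso_off_command("eth0"), check=True)   # as root
```

Keeping the command as a list rather than a shell string avoids quoting issues
if it is later passed to subprocess.run.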

2005-01-05 23:52:59

by Francois Romieu

Subject: Re: 2.6.10 TCP troubles

Stephen Hemminger:
[...]
> Is there window scaling or other issues? Does 2.6.10 get faster if

Bingo !

tcpdump shows a factor of 2^5 difference in the advertised window when
the network speed is low.

--
Ueimor
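
To put the 2^5 figure in perspective, here is a small arithmetic sketch of
what a window that is 32 times too small would mean, under the assumption that
a scale shift of 5 is applied on one path and lost on the other. The raw window
value and the round-trip time are hypothetical; the thread does not say which
side mishandles the scale.

```python
def effective_window(raw_window, shift):
    """Window in bytes: the 16-bit header field shifted left by the scale."""
    return raw_window << shift


def throughput_bound(window_bytes, rtt_seconds):
    """Upper bound in bytes per second: at most one window per round trip."""
    return window_bytes / rtt_seconds


raw = 2048      # hypothetical value of the 16-bit window field
shift = 5       # a shift of 5 corresponds to the 2^5 factor in the trace
rtt = 0.001     # hypothetical 1 ms LAN round trip

scaled = effective_window(raw, shift)    # window if the scale is honoured
unscaled = effective_window(raw, 0)      # window if the scale is ignored
ratio = throughput_bound(scaled, rtt) / throughput_bound(unscaled, rtt)

print(scaled, unscaled, ratio)
```

A 32-fold smaller window caps throughput at roughly 1/32 of what it would
otherwise be, which is the same order of magnitude as the reported slowdown
from 20 seconds to 800 seconds (a factor of 40).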