2002-10-30 10:38:34

by Miquel van Smoorenburg

Subject: Re: TCP hangs in 2.4 - blocking write() in wait_for_tcp_memory

In article <[email protected]>,
Miquel van Smoorenburg <[email protected]> wrote:
>On the gateway machine, the proxy consistently hangs in a write().
>I've replaced the squid proxy with a simple perl script + nc to
>make sure it isn't a squid-related problem..

Right, I found the cause of the problem, but I'm not sure if the
application or the kernel is wrong here.

On 2 machines do this:

machine1# socket -s 12345 < /dev/zero > /dev/null # server
machine2# socket -w machine1 12345 < /dev/zero # client

The first command starts a listening process on port 12345, that
sends an infinite stream of zeros to the remote side and sinks
all data received.

The second command connects to the first machine, sends an
infinite stream of zeros, but never does a read() on the socket
(the '-w' option).

The 'socket' program doesn't make the sockets non-blocking, it just
does a select() loop to find out readability/writeability on the
file descriptors.

This makes both socket programs hang in write(), in wait_for_tcp_memory.
Shouldn't the kernel return a short write, instead of hanging
both processes ? select() returned writeability.
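One way a select()-driven program can avoid this kind of mutual hang is to put its sockets in non-blocking mode, so that a write which cannot be completed returns a short count instead of sleeping. What follows is only a minimal sketch under stated assumptions: it uses a local socketpair() in place of the two machines, and the helper name exchange() is hypothetical. It is not how the 'socket' program is implemented.

```python
import select
import socket


def exchange(a, b, total):
    """Send `total` bytes in each direction over a connected pair of
    sockets, using select() with non-blocking sockets so that neither
    side can get stuck in a write while the peer's buffers are full."""
    chunk = b"\0" * 65536
    to_send = {a: total, b: total}   # bytes each side still has to write
    received = {a: 0, b: 0}          # bytes each side has read so far
    for s in (a, b):
        s.setblocking(False)

    while received[a] < total or received[b] < total:
        # Only ask about writability for sides that still have data to send.
        writers = [s for s in (a, b) if to_send[s] > 0]
        readable, writable, _ = select.select([a, b], writers, [])

        # Drain incoming data first; this frees buffer space for the peer.
        for s in readable:
            received[s] += len(s.recv(65536))

        for s in writable:
            try:
                # On a non-blocking socket send() may accept fewer bytes
                # than requested (a short write) instead of sleeping.
                n = s.send(chunk[:to_send[s]])
            except BlockingIOError:
                continue
            to_send[s] -= n

    return received[a], received[b]


if __name__ == "__main__":
    left, right = socket.socketpair()
    try:
        print(exchange(left, right, 1 << 22))
    finally:
        left.close()
        right.close()
```

Because every pass through the loop reads whatever is available before writing, the buffers in both directions keep draining and the transfer completes.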

As I described in my first mail, this happens in the real world
as well - an application is writing lots of data to the remote
side, while the remote side is sending data too - hang.

Oh, and I tested this on 2.4.19 and 2.4.20-pre11.

Mike.


2002-10-30 12:21:52

by bert hubert

Subject: Re: TCP hangs in 2.4 - blocking write() in wait_for_tcp_memory

On Wed, Oct 30, 2002 at 10:44:20AM +0000, Miquel van Smoorenburg wrote:

> This makes both socket programs hang in write(), in wait_for_tcp_memory.
> Shouldn't the kernel return a short write, instead of hanging
> both processes ? select() returned writeability.

write(2) is allowed to do a short write on a blocking socket, but is not
required to do so. In fact, I've only seen short writes under
Linux on non-blocking sockets.

SUSv3 says:

Blocking/immediate: Blocking is only possible with O_NONBLOCK clear. If
there is enough space for all the data requested to be written immediately,
the implementation should do so. Otherwise, the process may block; that is,
pause until enough space is available for writing. The effective size of a
pipe or FIFO (the maximum amount that can be written in one operation
without blocking) may vary dynamically, depending on the implementation, so
it is not possible to specify a fixed value for it.

...

Partial and deferred writes are only possible with O_NONBLOCK set.
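
The partial-write case can be seen with a small sketch. It assumes a local socketpair() whose peer never reads, and the function name short_write_demo() is hypothetical. select() reports the fresh socket as writable, yet a single large send() on the non-blocking socket accepts only part of the buffer.

```python
import select
import socket


def short_write_demo(size=1 << 24):
    """Return (was_writable, bytes_sent, bytes_requested) for one large
    send() on a non-blocking socket whose peer never reads."""
    a, b = socket.socketpair()
    try:
        a.setblocking(False)
        # A fresh socket has free buffer space, so select() says "writable".
        _, writable, _ = select.select([], [a], [], 0)
        # Writability only promises that some data can be written; the
        # request below is far larger than the socket buffer, so the
        # call returns a short count rather than blocking.
        sent = a.send(b"\0" * size)
        return (a in writable), sent, size
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    was_writable, sent, requested = short_write_demo()
    print(was_writable, sent, requested)
```

The exact number of bytes accepted depends on the system's socket buffer sizes, so the sketch only relies on it being smaller than the request.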

Regards,

bert

--
http://www.PowerDNS.com Versatile DNS Software & Services
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO