2022-06-21 00:43:52

by Ryan P. Nicholl

Subject: Networking Question

I've been unable to find any Linux API for asynchronously waiting for the TCP send buffer to drain.

The problem I have, in a nutshell, is noted in this part of the documentation:

If fildes refers to a socket, close() shall cause the socket to
be destroyed. If the socket is in connection-mode, and the
SO_LINGER option is set for the socket with non-zero linger time,
and the socket has untransmitted data, then close() shall block
for up to the current linger interval until all data is
transmitted.

That is not good for asynchronous programming. I could disable the SO_LINGER option, but that leaves me with another problem: I *want* the socket to linger.

The behavior I want is something like calling "close" and getting EAGAIN instead of triggering a TCP RST, plus something like EPOLLWRITEFLUSHED to wait for the TCP send buffer to drain. I know neither of these exists.

Right now the only solution I can think of is to enable SO_LINGER and spawn a thread to run close in, but this might spawn a lot of threads, and doesn't support cancellation well.

Alternatively, I could call getsockopt with TCP_INFO in a loop, but this triggers a lot of wake-ups and might result in sockets hanging around a lot longer than they need to.

I want to allow the socket to linger indefinitely on close until some event happens, like running out of RAM or other resources. Basically, I want to intelligently do something like: "OK, we're running low on RAM/resources, time to send RST and drop the send buffer for the 5k worst behaving connections". So unfortunately, even with the timeout provided by SO_LINGER, and even assuming close would somehow complete in the background, this could be an issue.
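
For the "send RST and drop the send buffer" step on its own, one existing mechanism is to set SO_LINGER with a zero timeout and then call close(), which on Linux aborts the connection instead of lingering. A minimal sketch, assuming that is the desired reset behavior; the function name abort_connection is hypothetical, and choosing which connections are "worst behaving" is left out:

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: abort a TCP connection. With l_onoff = 1 and
 * l_linger = 0, close() discards any unsent data and resets the
 * connection rather than lingering. Returns 0 on success, -1 on error.
 * The descriptor is closed in both cases. */
int abort_connection(int fd)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };

    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0) {
        int saved = errno;   /* keep the setsockopt error for the caller */
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

This call does not block, so it can be issued from the same thread that runs the event loop.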

Is there any way to do this properly on Linux? If not, is there any chance that something like EPOLLWRITEFLUSHED would be a welcome addition?

Please CC me on responses.

--
Ryan P. Nicholl


2022-06-21 08:55:53

by Bernd Petrovitsch

Subject: Re: Networking Question

Hi all!

On 21/06/2022 02:29, Ryan P. Nicholl wrote:
> I've been unable to find any Linux API for asynchronously waiting for the TCP send buffer to drain.
>
> The problem I have, in a nutshell, is noted in this part of the documentation:
>
> If fildes refers to a socket, close() shall cause the socket to
> be destroyed. If the socket is in connection-mode, and the

That's not really a Linux kernel question as such (because that should
work that way on all TCP connections anywhere) but the shutdown()
syscall is probably what you need:
- your side shuts down the sending part of the socket.
- the other side reads data and eventually gets EOF.
- the other side calls shutdown() for its sending side when it's done.
- your side gets EOF.
And then your side knows that no data is in flight.
- finally, you clean up with close(). You can shutdown() the receiving
side too, but that doesn't change anything.
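
The sequence above can be driven from a single epoll loop. The following is only a sketch: the function names begin_graceful_close and finish_graceful_close are hypothetical, and it assumes the peer really does shut down its sending side once it has read EOF. shutdown() with SHUT_WR queues the FIN behind any unsent data and returns without waiting for it to be transmitted.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Step 1 (hypothetical helper): shut down our sending side and ask
 * epoll to report when the peer sends data or its own EOF. */
int begin_graceful_close(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP, .data.fd = fd };

    if (shutdown(fd, SHUT_WR) < 0)
        return -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        /* Already registered: just change the event mask. */
        if (errno != EEXIST || epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) < 0)
            return -1;
    }
    return 0;
}

/* Step 2 (hypothetical helper): call when epoll reports fd readable.
 * Returns 1 if the peer's EOF arrived and fd was closed, 0 if we are
 * still waiting, -1 if the connection failed (fd is closed as well). */
int finish_graceful_close(int epfd, int fd)
{
    char buf[4096];

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, MSG_DONTWAIT);

        if (n > 0)
            continue;                      /* discard late data from the peer */
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;                      /* no EOF yet, keep waiting */

        /* n == 0 is EOF; any other n < 0 is a hard error. */
        int result = (n == 0) ? 1 : -1;
        epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
        close(fd);
        return result;
    }
}
```

The event loop calls finish_graceful_close() for each descriptor that epoll_wait() returns, so no thread is parked per connection.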

[ deleted SO_LINGER stuff - that's for something completely different ... ]

Kind regards,
Bernd
--
Bernd Petrovitsch Email : [email protected]
There is NO CLOUD, just other people's computers. - FSFE
LUGA : http://www.luga.at

2022-06-21 16:37:45

by Ryan P. Nicholl

Subject: Re: Networking Question

Thanks for taking the time to respond to this question, but unfortunately Linux's shutdown(2) cannot do what I need. This isn't a general question, since POSIX has no standard async API other than AIO, which isn't implemented efficiently in GNU/Linux, and I don't think the kernel supports any AIO calls directly, instead exposing clone and epoll_* facilities to solve the concurrency issues. However, none of them do what I need efficiently.

In theory, shutdown(2) could be used; the problem is that it's just not efficiently scalable to hundreds of thousands or millions of connections. I'm not speaking on behalf of my employer, but I work for a financial company that processes a lot of network traffic.

What I want essentially is similar to TIME_WAIT, but instead of sending an RST packet I want to be notified when the TCP connection is actually drained.

If you call shutdown, my understanding is that you get one of two behaviors:
1. You send RST and immediately discard the send buffer.
2. The call blocks for the SO_LINGER timeout.

Option 1 isn't acceptable because it gives the wrong behavior. Option 2 can be made to work, but the downside is that Linux only exposes a synchronous API for it, which requires me to have at least one thread per shutdown operation. So it could be quite bad if the network is very congested. Unfortunately, threads are many times more expensive than sockets/TCP connections, so having few threads and many sockets gives the best use of resources.



--
Ryan P. Nicholl

