Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S270120AbTGMFTU (ORCPT ); Sun, 13 Jul 2003 01:19:20 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S270122AbTGMFTU (ORCPT ); Sun, 13 Jul 2003 01:19:20 -0400 Received: from pizda.ninka.net ([216.101.162.242]:13504 "EHLO pizda.ninka.net") by vger.kernel.org with ESMTP id S270120AbTGMFTL (ORCPT ); Sun, 13 Jul 2003 01:19:11 -0400 Date: Sat, 12 Jul 2003 22:24:57 -0700 From: "David S. Miller" To: Davide Libenzi Cc: e0206@foo21.com, linux-kernel@vger.kernel.org, kuznet@ms2.inr.ac.ru Subject: Re: [Patch][RFC] epoll and half closed TCP connections Message-Id: <20030712222457.3d132897.davem@redhat.com> In-Reply-To: References: <20030712181654.GB15643@srv.foo21.com> X-Mailer: Sylpheed version 0.9.2 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4008 Lines: 101 On Sat, 12 Jul 2003 13:01:21 -0700 (PDT) Davide Libenzi wrote: > > [Cc:ing DaveM ] [Cc:ing Alexey :-) ] Alexey, they seem to want to add some kind of POLLRDHUP thing, comments wrt. TCP and elsewhere in the networking? See below... > On Sat, 12 Jul 2003, Eric Varsanyi wrote: > > > I'm proposing adding a new POLL event type (POLLRDHUP) as way to solve > > a new race introduced by having an edge triggered event mechanism > > (epoll). The problem occurs when a client writes data and then does a > > write side shutdown(). The server (using epoll) sees only one event for > > the read data ready and the read EOF condition and has no way to tell > > that an EOF occurred. > > > > -Eric Varsanyi > > > > Details > > ----------- > > - remote sends data and does a shutdown > > [ we see a data bearing packet and FIN from client on the wire ] > > > > - user mode server gets around to doing accept() and registers > > for EPOLLIN events (along with HUP and ERR which are forced on) > > > > - epoll_wait() returns a single EPOLLIN event on the FD representing > > both the 1/2 shutdown state and data available > > > > At this point there is no way the app can tell if there is a half closed > > connection so it may issue a close() back to the client after writing > > results. Normally the server would distinguish these events by assuming > > EOF if it got a read ready indication and the first read returned 0 bytes, > > or would issue read calls until less data was returned than was asked for. > > > > In a level triggered world this all just works because the read ready > > indication is driven back to the app as long as the socket state is half > > closed. The event driven epoll mechanism folds these two indications > > together and thus loses one 'edge'. > > > > One would be tempted to issue an extra read() after getting back less than > > expected, but this is an extra system call on every read event and you get > > back the same '0' bytes that you get if the buffer is just empty. The only > > sure bet seems to be CTL_MODding the FD to force a re-poll (which would > > cost a syscall and hash-lookup in eventpoll for every read event). > > > > Yes, this is overhead that should be avoided. It is true that you could > use Level Triggered events, but if you structured your app on edge you > should be able to solve this w/out overhead. > > > > > 2) add a new 1/2 closed event type that a poll routine can return > > > > The implementation is trivial, a patch is included below. If this idea sees > > favor I'll fix the other architectures, ipv6, epoll.h, and make a 'real' > > patch. I do not believe any drivers deserve to be modified to return this > > new event. > > This looks good to me. David what do you think ? > > > > > diff -Naur linux-2.4.20/include/asm-i386/poll.h linux-2.4.20_ev/include/asm-i386/poll.h > > --- linux-2.4.20/include/asm-i386/poll.h Thu Jan 23 13:01:28 1997 > > +++ linux-2.4.20_ev/include/asm-i386/poll.h Sat Jul 12 12:29:11 2003 > > @@ -15,6 +15,7 @@ > > #define POLLWRNORM 0x0100 > > #define POLLWRBAND 0x0200 > > #define POLLMSG 0x0400 > > +#define POLLRDHUP 0x0800 > > > > struct pollfd { > > int fd; > > diff -Naur linux-2.4.20/net/ipv4/tcp.c linux-2.4.20_ev/net/ipv4/tcp.c > > --- linux-2.4.20/net/ipv4/tcp.c Tue Jul 8 09:40:42 2003 > > +++ linux-2.4.20_ev/net/ipv4/tcp.c Sat Jul 12 12:29:56 2003 > > @@ -424,7 +424,7 @@ > > if (sk->shutdown == SHUTDOWN_MASK || sk->state == TCP_CLOSE) > > mask |= POLLHUP; > > if (sk->shutdown & RCV_SHUTDOWN) > > - mask |= POLLIN | POLLRDNORM; > > + mask |= POLLIN | POLLRDNORM | POLLRDHUP; > > > > /* Connected? */ > > if ((1 << sk->state) & ~(TCPF_SYN_SENT|TCPF_SYN_RECV)) { > > > > > > - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/