Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757931AbYFMGbU (ORCPT ); Fri, 13 Jun 2008 02:31:20 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753189AbYFMGbA (ORCPT ); Fri, 13 Jun 2008 02:31:00 -0400 Received: from mx2.mail.elte.hu ([157.181.151.9]:60645 "EHLO mx2.mail.elte.hu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753051AbYFMGaz (ORCPT ); Fri, 13 Jun 2008 02:30:55 -0400 Date: Fri, 13 Jun 2008 08:30:37 +0200 From: Ingo Molnar To: David Miller Cc: kuznet@ms2.inr.ac.ru, vgusev@openvz.org, mcmanus@ducksong.com, xemul@openvz.org, netdev@vger.kernel.org, ilpo.jarvinen@helsinki.fi, linux-kernel@vger.kernel.org Subject: Re: [TCP]: TCP_DEFER_ACCEPT causes leak sockets Message-ID: <20080613063037.GA16943@elte.hu> References: <200806111658.41182.vgusev@openvz.org> <20080611135718.GA26914@ms2.inr.ac.ru> <20080611.165255.242691774.davem@davemloft.net> <20080612.163212.148965080.davem@davemloft.net> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20080612.163212.148965080.davem@davemloft.net> User-Agent: Mutt/1.5.18 (2008-05-17) X-ELTE-VirusStatus: clean X-ELTE-SpamScore: -1.5 X-ELTE-SpamLevel: X-ELTE-SpamCheck: no X-ELTE-SpamVersion: ELTE 2.0 X-ELTE-SpamCheck-Details: score=-1.5 required=5.9 tests=BAYES_00 autolearn=no SpamAssassin version=3.2.3 -1.5 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3250 Lines: 84 * David Miller wrote: > From: David Miller > Date: Wed, 11 Jun 2008 16:52:55 -0700 (PDT) > > > More and more, the arguments are mounting to completely revert the > > established code path changes, and frankly that is likely what I am > > going to do by the end of today. > > Here is the revert patch I intend to send to Linus: > > tcp: Revert 'process defer accept as established' changes. > > This reverts two changesets, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 > ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and > the follow-on bug fix 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38 > ("tcp: Fix slab corruption with ipv6 and tcp6fuzz"). > > This change causes several problems, first reported by Ingo Molnar > as a distcc-over-loopback regression where connections were getting > stuck. > > Ilpo J?rvinen first spotted the locking problems. The new function > added by this code, tcp_defer_accept_check(), only has the > child socket locked, yet it is modifying state of the parent > listening socket. > > Fixing that is non-trivial at best, because we can't simply just grab > the parent listening socket lock at this point, because it would > create an ABBA deadlock. The normal ordering is parent listening > socket --> child socket, but this code path would require the > reverse lock ordering. > > Next is a problem noticed by Vitaliy Gusev, he noted: > > ---------------------------------------- > >--- a/net/ipv4/tcp_timer.c > >+++ b/net/ipv4/tcp_timer.c > >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data) > > goto death; > > } > > > >+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) { > >+ tcp_send_active_reset(sk, GFP_ATOMIC); > >+ goto death; > > Here socket sk is not attached to listening socket's request queue. tcp_done() > will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should > release this sk) as socket is not DEAD. Therefore socket sk will be lost for > freeing. > ---------------------------------------- > > Finally, Alexey Kuznetsov argues that there might not even be any > real value or advantage to these new semantics even if we fix all > of the bugs: > > ---------------------------------------- > Hiding from accept() sockets with only out-of-order data only > is the only thing which is impossible with old approach. Is this really > so valuable? My opinion: no, this is nothing but a new loophole > to consume memory without control. > ---------------------------------------- > > So revert this thing for now. > > Signed-off-by: David S. Miller the 3 reverts have been extensively tested in -tip via: # tip/out-of-tree: 9e5b6ca: tcp: revert DEFER_ACCEPT modifications and the distcc problems are fixed. (The locking fix alone did not fix it conclusively in my testing, possibly due to the follow-on observations outlined in your description.) Tested-by: Ingo Molnar Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/