Subject: [PATCH] tcp: fix inet_twsk_deschedule()
From: Eric Dumazet
To: "Eric W. Biederman", David Miller
Cc: Arnaldo Carvalho de Melo, Linus Torvalds, Michal Hocko, Ingo Molnar,
    linux-mm@kvack.org, LKML, netdev@vger.kernel.org, Pavel Emelyanov,
    Daniel Lezcano
References: <20110217163531.GF14168@elte.hu>
    <20110218122938.GB26779@tiehlicka.suse.cz>
    <20110218162623.GD4862@tiehlicka.suse.cz>
    <20110218190128.GF13211@ghostprotocols.net>
    <20110218191146.GG13211@ghostprotocols.net>
Date: Sat, 19 Feb 2011 09:35:56 +0100
Message-ID: <1298104556.8559.21.camel@edumazet-laptop>
X-Mailing-List: linux-kernel@vger.kernel.org

On Friday 18 February 2011 at 12:38 -0800, Eric W. Biederman wrote:
> Arnaldo Carvalho de Melo writes:
>
> > On Fri, Feb 18, 2011 at 05:01:28PM -0200, Arnaldo Carvalho de Melo wrote:
> >> On Fri, Feb 18, 2011 at 10:48:18AM -0800, Linus Torvalds wrote:
> >> > This seems to be a fairly straightforward bug.
> >> >
> >> > In net/ipv4/inet_timewait_sock.c we have this:
> >> >
> >> >   /* These are always called from BH context.
> >> >    * See callers in tcp_input.c to verify this.
> >> >    */
> >> >
> >> >   /* This is for handling early-kills of TIME_WAIT sockets. */
> >> >   void inet_twsk_deschedule(struct inet_timewait_sock *tw,
> >> >                             struct inet_timewait_death_row *twdr)
> >> >   {
> >> >           spin_lock(&twdr->death_lock);
> >> >           ..
> >> >
> >> > and the intention is clearly that that spin_lock is BH-safe because
> >> > it's called from BH context.
> >> >
> >> > Except that clearly isn't true. It's called from a worker thread:
> >> >
> >> > > stack backtrace:
> >> > > Pid: 10833, comm: kworker/u:1 Not tainted 2.6.38-rc4-359399.2010AroraKernelBeta.fc14.x86_64 #1
> >> > > Call Trace:
> >> > >  [] ? inet_twsk_deschedule+0x29/0xa0
> >> > >  [] ? inet_twsk_purge+0xf6/0x180
> >> > >  [] ? inet_twsk_purge+0x30/0x180
> >> > >  [] ? tcp_sk_exit_batch+0x1c/0x20
> >> > >  [] ? ops_exit_list.clone.0+0x53/0x60
> >> > >  [] ? cleanup_net+0x100/0x1b0
> >> > >  [] ? process_one_work+0x187/0x4b0
> >> > >  [] ? process_one_work+0x121/0x4b0
> >> > >  [] ? cleanup_net+0x0/0x1b0
> >> > >  [] ? worker_thread+0x15c/0x330
> >> >
> >> > so it can deadlock with a BH happening at the same time, afaik.
> >> >
> >> > The code (and comment) is all from 2005, it looks like the BH->worker
> >> > thread has broken the code. But somebody who knows that code better
> >> > should take a deeper look at it.
> >> >
> >> > Added acme to the cc, since the code is attributed to him back in 2005
> >> > ;). Although I don't know how active he's been in networking lately
> >> > (seems to be all perf-related). Whatever, it can't hurt.
> >>
> >> Original code is ANK's, I just made it possible to use with DCCP, and
> >> yeah, the smiley is appropriate, something 6 years old and the world
> >> around it changing continually...
> >> well, thanks for the git blame ;-)
> >
> > But yeah, your analysis seems correct, with the bug being introduced by
> > one of these world around it changing continually issues, networking
> > namespaces broke the rules of the game on its cleanup_net() routine,
> > adding Pavel to the CC list since it doesn't hurt ;-)
>
> Which probably gets the bug back around to me.
>
> I guess this must be one of those ipv4 cases where the cleanup
> simply did not exist in the rmmod sense that we had to invent.
>
> I think that was Daniel who did the time wait sockets. I do remember
> they were a real pain.
>
> Would a bh_disable be sufficient? I guess I should stop remembering and
> look at the code now.
>

Here is the patch to fix the problem.

Daniel's commit d315492b1a6ba29d (netns : fix kernel panic in timewait
socket destruction) was OK (it did use local_bh_disable()).

The problem comes from commit 575f4cd5a5b6394577 (net: Use rcu lookups in
inet_twsk_purge.), added in 2.6.33.

Thanks !

[PATCH] tcp: fix inet_twsk_deschedule()

Eric W. Biederman reported a lockdep splat in inet_twsk_deschedule().

This is caused by inet_twsk_purge() running in process context:
commit 575f4cd5a5b6394577 (net: Use rcu lookups in inet_twsk_purge.)
removed the BH disabling that was necessary.

Add the BH disabling back, but fine grained, right around the call to
inet_twsk_deschedule(), instead of around the whole function.

With help from Linus Torvalds and Eric W. Biederman

Reported-by: Eric W. Biederman
Signed-off-by: Eric Dumazet
CC: Daniel Lezcano
CC: Pavel Emelyanov
CC: Arnaldo Carvalho de Melo
CC: stable (# 2.6.33+)
---
 net/ipv4/inet_timewait_sock.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index c5af909..3c8dfa1 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -505,7 +505,9 @@ restart:
 			}
 
 			rcu_read_unlock();
+			local_bh_disable();
 			inet_twsk_deschedule(tw, twdr);
+			local_bh_enable();
 			inet_twsk_put(tw);
 			goto restart_rcu;
 		}
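
For readers who want to see why the two added lines matter, here is a small user-space model of the hazard. It is only a sketch, not kernel code: struct model_cpu and every model_*() function are hypothetical names invented for this illustration. The model assumes a single CPU, a counter standing in for the local_bh_disable() nesting depth, and a flag standing in for twdr->death_lock being held. It reports whether a softirq that also takes death_lock could fire on the same CPU while the lock is held, which is the situation lockdep complained about.

```c
/* User-space model only. No real kernel API is used or implied. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for the illustration. */
struct model_cpu {
	int  bh_disable_depth;  /* stands in for local_bh_disable() nesting */
	bool death_lock_held;   /* stands in for twdr->death_lock being held */
};

static void model_bh_disable(struct model_cpu *c)
{
	c->bh_disable_depth++;
}

static void model_bh_enable(struct model_cpu *c)
{
	assert(c->bh_disable_depth > 0);
	c->bh_disable_depth--;
}

/*
 * A softirq that takes death_lock would spin forever if it ran on this
 * CPU while the interrupted code still holds the lock. It can only run
 * when bottom halves are not disabled.
 */
static bool model_softirq_would_deadlock(const struct model_cpu *c)
{
	bool softirq_can_run = (c->bh_disable_depth == 0);

	return softirq_can_run && c->death_lock_held;
}

/* Models the plain spin_lock() critical section in inet_twsk_deschedule(). */
static bool model_deschedule(struct model_cpu *c)
{
	bool unsafe;

	c->death_lock_held = true;            /* spin_lock(&twdr->death_lock) */
	unsafe = model_softirq_would_deadlock(c);
	c->death_lock_held = false;           /* spin_unlock(&twdr->death_lock) */
	return unsafe;
}

/* Process-context caller without the fix: BH stays enabled. */
static bool model_purge_unpatched(struct model_cpu *c)
{
	return model_deschedule(c);
}

/* Process-context caller with the fix: BH disabled around the call. */
static bool model_purge_patched(struct model_cpu *c)
{
	bool unsafe;

	model_bh_disable(c);
	unsafe = model_deschedule(c);
	model_bh_enable(c);
	return unsafe;
}
```

Starting from a zeroed struct model_cpu, model_purge_unpatched() reports the unsafe window and model_purge_patched() does not. In the model, as in the patch, the BH-disabled region covers only the deschedule call, so the rest of the purge loop still runs with bottom halves enabled.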