Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756231Ab0GINGG (ORCPT ); Fri, 9 Jul 2010 09:06:06 -0400 Received: from out01.mta.xmission.com ([166.70.13.231]:44295 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755851Ab0GINGE (ORCPT ); Fri, 9 Jul 2010 09:06:04 -0400 To: Oleg Nesterov Cc: Pavel Emelyanov , Andrew Morton , Linux Containers , linux-kernel@vger.kernel.org, Sukadev Bhattiprolu References: <1277399329-18087-1-git-send-email-louis.rilling@kerlabs.com> <20100624191843.GA14205@redhat.com> <20100625102303.GG3773@hawkmoon.kerlabs.com> <20100625183733.GA2627@us.ibm.com> <20100625192945.GA25532@redhat.com> <20100625212618.GA11917@us.ibm.com> <20100625212758.GA30474@redhat.com> <20100625220713.GA31123@us.ibm.com> <20100709121425.GB18586@hawkmoon.kerlabs.com> From: ebiederm@xmission.com (Eric W. Biederman) Date: Fri, 09 Jul 2010 06:05:54 -0700 In-Reply-To: <20100709121425.GB18586@hawkmoon.kerlabs.com> (Louis Rilling's message of "Fri\, 9 Jul 2010 14\:14\:25 +0200") Message-ID: User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-XM-SPF: eid=;;;mid=;;;hst=in01.mta.xmission.com;;;ip=67.188.4.80;;;frm=ebiederm@xmission.com;;;spf=neutral X-SA-Exim-Connect-IP: 67.188.4.80 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 1.5 XMNoVowels Alpha-numberic number with no vowels * 1.5 TR_Symld_Words too many words that have symbols inside * 0.0 T_TM2_M_HEADER_IN_MSG BODY: T_TM2_M_HEADER_IN_MSG * -3.0 BAYES_00 BODY: Bayes spam probability is 0 to 1% * [score: 0.0000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa06 0; Body=1 Fuz1=1 Fuz2=1] * 0.0 T_TooManySym_01 4+ unique symbols in subject * 0.1 XMSolicitRefs_0 Weightloss drug * 0.4 UNTRUSTED_Relay Comes from a non-trusted relay X-Spam-DCC: ; sa06 0; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: ;Oleg Nesterov X-Spam-Relay-Country: _RELAYCOUNTRY_ Subject: Re: [RFC][PATCH 2/2] pidns: Remove proc flush races when a pid namespaces are exiting. X-SA-Exim-Version: 4.2.1 (built Thu, 25 Oct 2007 00:26:12 +0000) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2501 Lines: 67 Louis Rilling writes: > On 08/07/10 21:39 -0700, Eric W. Biederman wrote: >> >> Currently it is possible to put proc_mnt before we have flushed the >> last process that will use the proc_mnt to flush it's proc entries. >> >> This race is fixed by not flushing proc entries for dead pid >> namespaces, and calling pid_ns_release_proc unconditionally from >> zap_pid_ns_processes after the pid namespace has been declared dead. > > One comment below. > >> >> To ensure we don't unnecessarily leak any dcache entries with skipped >> flushes pid_ns_release_proc flushes the entire proc_mnt when it is >> called. >> >> Signed-off-by: Eric W. Biederman >> --- >> fs/proc/base.c | 9 +++++---- >> fs/proc/root.c | 3 +++ >> kernel/pid_namespace.c | 1 + >> 3 files changed, 9 insertions(+), 4 deletions(-) >> >> diff --git a/fs/proc/base.c b/fs/proc/base.c >> index acb7ef8..e9d84e1 100644 >> --- a/fs/proc/base.c >> +++ b/fs/proc/base.c >> @@ -2742,13 +2742,14 @@ void proc_flush_task(struct task_struct *task) >> >> for (i = 0; i <= pid->level; i++) { >> upid = &pid->numbers[i]; >> + >> + /* Don't bother flushing dead pid namespaces */ >> + if (test_bit(PIDNS_DEAD, &upid->ns->flags)) >> + continue; >> + > > IMHO, nothing prevents zap_pid_ns_processes() from setting PIDNS_DEAD and > calling pid_ns_release_proc() right now. zap_pid_ns_processes() does not wait > for EXIT_DEAD (self-reaping) children to be released. Good point we need something probably a lock to prevent proc_mnt from going away here. We might do a little better if we were starting with a specific dentry, those at least have some rcu properties but that isn't a big help. Hmm. Perhaps there is a way to completely restructure this flushing of dentries. It is just an optimization after all so we don't get too many stale dentries building up. It might just be worth it simply kill proc_flush_mnt altogether. I know it is measurable when we don't do the flushing but perhaps there can be a work struct that periodically wakes up and smacks stale proc dentries. Right now I really don't think proc_flush_task is worth the hassle it causes. Grumble, Grumble more thinking to do. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/