Received: by 2002:a05:6a10:7420:0:0:0:0 with SMTP id hk32csp1828513pxb; Fri, 18 Feb 2022 16:54:47 -0800 (PST) X-Google-Smtp-Source: ABdhPJyN6Q1c8aDjxQk4mEwI9Jpr6OguE+R1gRcNLeaN/6ibk0PlR+An1v61VmlZafTovd6onJn5 X-Received: by 2002:a17:906:e13:b0:6ce:21ca:1b91 with SMTP id l19-20020a1709060e1300b006ce21ca1b91mr8691248eji.193.1645232087105; Fri, 18 Feb 2022 16:54:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1645232087; cv=none; d=google.com; s=arc-20160816; b=R4fwFzkHR41o29i8l+t++5f3uY5pZhqsyBglnlrthPzMubf5X1aNY98V2CUTOHF55P 7uSgodG132i5zIT8J0QztUDplVH1u9KgBbRnXOrR6zLkSaN9T3j9MXM7RJaqkwzaYqC2 WXJEfqdhzwBSmP1OzegG819oLZq47zb9AL73xsNylDZV97EJgrZ9zUVS51aHrqEnKkBL ZHDYd/effPiq1k0TR/ATbm/dhR+YJCrfVGP4VqxOlLniImrRH8bxD8QaOXftwNCMldM2 dWHCpbkRyt5yPUFNiVs95vpOjvLqXqDSq70dQwksICL93LmOAeyat36zJkxjsb+gsBnO TpIQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=9KBqZ51SHy/SmtV1KvulX04+dbPYuKUPPqc6FquDfH4=; b=awWglyPM9GPSF7oCkwzFXIlqDkSjb4bPZw5R7upCttRbnKKT9UFP4o3xtmxJ235zhl 4GDkupcg19d1IL+wtL6p8ljtp1lrLCnBpbYxu7Ek0nKAkr8NCV17rUMLL4G8c/CUHtRm 89NMdKmdx33hTOx8FvDhj5AQ+TysZIr+VtH0qmJz0AH8s7rinIqZIdGQ36DxRb+2jwN1 J+iiLRVWMQf81cYHDl589jdTCuMePHz4fQkDc/tKShZZ80QwLKuG8EvgBqqmg5RCMxfO fSuhk9spohEmzeCGe/EOD2Rox8C52/yPFvpS+qAHHtVKOFlndxdDI0KUoeBY9huB3xnS OW+Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id nc4si6259186ejc.938.2022.02.18.16.54.24; Fri, 18 Feb 2022 16:54:47 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236436AbiBRVGx (ORCPT + 99 others); Fri, 18 Feb 2022 16:06:53 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:38300 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231464AbiBRVGw (ORCPT ); Fri, 18 Feb 2022 16:06:52 -0500 Received: from zeniv-ca.linux.org.uk (zeniv-ca.linux.org.uk [IPv6:2607:5300:60:148a::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5B3CB261F; Fri, 18 Feb 2022 13:06:33 -0800 (PST) Received: from viro by zeniv-ca.linux.org.uk with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nLASV-002pya-10; Fri, 18 Feb 2022 21:06:31 +0000 Date: Fri, 18 Feb 2022 21:06:31 +0000 From: Al Viro To: Rik van Riel Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, linux-fsdevel@vger.kernel.org, paulmck@kernel.org, gscrivan@redhat.com, Eric Biederman , Chris Mason Subject: Re: [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount Message-ID: References: <20220218183114.2867528-1-riel@surriel.com> <20220218183114.2867528-2-riel@surriel.com> <5f442a7770fe4ac06b2837e4f937d559f5d17b8b.camel@surriel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Sender: Al Viro X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Feb 18, 2022 at 08:24:09PM +0000, Al Viro wrote: > On Fri, Feb 18, 2022 at 07:43:43PM +0000, Al Viro wrote: > > On Fri, Feb 18, 2022 at 02:33:31PM -0500, Rik van Riel wrote: > > > On Fri, 2022-02-18 at 19:26 +0000, Al Viro wrote: > > > > On Fri, Feb 18, 2022 at 01:31:13PM -0500, Rik van Riel wrote: > > > > > After kern_unmount returns, callers can no longer access the > > > > > vfsmount structure. However, the vfsmount structure does need > > > > > to be kept around until the end of the RCU grace period, to > > > > > make sure other accesses have all gone away too. > > > > > > > > > > This can be accomplished by either gating each kern_unmount > > > > > on synchronize_rcu (the comment in the code says it all), or > > > > > by deferring the freeing until the next grace period, where > > > > > it needs to be handled in a workqueue due to the locking in > > > > > mntput_no_expire(). > > > > > > > > NAK.? There's code that relies upon kern_unmount() being > > > > synchronous.? That's precisely the reason why MNT_INTERNAL > > > > is treated that way in mntput_no_expire(). > > > > > > Fair enough. Should I make a kern_unmount_rcu() version > > > that gets called just from mq_put_mnt()? > > > > Umm... I'm not sure you can afford having struct ipc_namespace > > freed and reused before the mqueue superblock gets at least to > > deactivate_locked_super(). > > BTW, that's a good demonstration of the problems with making those > beasts async. struct mount is *not* accessed past kern_unmount(), > but the objects used by the superblock might very well be - in > this case they (struct ipc_namespace, pointed to by s->s_fs_data) > are freed by the caller after kern_unmount() returns. And possibly > reused. Now note that they are used as search keys by > mqueue_get_tree() and it becomes very fishy. > > If you want to go that way, make it something like > > void put_ipc_ns(struct ipc_namespace *ns) > { > if (refcount_dec_and_lock(&ns->ns.count, &mq_lock)) { > mq_clear_sbinfo(ns); > spin_unlock(&mq_lock); > kern_unmount_rcu(ns->mq_mnt); > } > } > > and give mqueue this for ->kill_sb(): > > static void mqueue_kill_super(struct super_block *sb) > { > struct ipc_namespace *ns = sb->s_fs_info; > kill_litter_super(sb); > do the rest of free_ipc_ns(); > } > > One thing: kern_unmount_rcu() needs a very big warning about > the caution needed from its callers. It's really not safe > for general use, and it will be a temptation for folks with > scalability problems like this one to just use it instead of > kern_unmount() and declare the problem solved. FWIW, that won't work correctly wrt failure exits. I'm digging through the lifetime rules in there right now, will post when I'm done.