Date: Wed, 16 Feb 2022 03:49:56 +0000
From: Al Viro
To: Stephen Brennan
Cc: linux-kernel@vger.kernel.org, Luis Chamberlain, Andrew Morton, Jan Kara, linux-fsdevel@vger.kernel.org, Arnd Bergmann, Amir Goldstein
Subject: Re: [PATCH v2 1/4] dcache: sweep cached negative dentries to the end of list of siblings
References: <20220209231406.187668-1-stephen.s.brennan@oracle.com> <20220209231406.187668-2-stephen.s.brennan@oracle.com> <875ypf8s5m.fsf@stepbren-lnx.us.oracle.com>

On Wed, Feb 16, 2022 at 03:27:39AM +0000, Al Viro wrote:
> On Tue, Feb 15, 2022 at 06:24:53PM -0800, Stephen Brennan wrote:
>
> > It seems to me that, if we had taken a reference on child by
> > incrementing the reference count prior to unlocking it, then
> > dentry_unlist could never have been called, since we would never have
> > made it into __dentry_kill. child would still be on the list, and any
> > cursor (or sweep_negative) list updates would now be reflected in
> > child->d_child.next. But dput is definitely not safe while holding a
> > lock on a parent dentry (even more so now thanks to my patch), so that
> > is out of the question.
> >
> > Would dput_to_list be an appropriate solution to that issue? We can
> > maintain a dispose list in d_walk and then for any dput which really
> > drops the refcount to 0, we can handle them after d_walk is done. It
> > shouldn't be that many dentries anyway.
>
> Interesting idea, but... what happens to behaviour of e.g.
> shrink_dcache_parent()? You'd obviously need to modify the test in
> select_collect(), but then the selected dentries become likely candidates
> for d_walk() itself wanting to move them over to its internal shrink list.
> OTOH, __dput_to_list() will just decrement the count and skip the sucker
> if it's already on a shrink list...
>
> It might work, but it really needs a careful analysis wrt.
> parallel d_walk(). What happens when you have two threads hitting
> shrink_dcache_parent() on two different places, one being an ancestor
> of another?
> That can happen in parallel, and currently it does work
> correctly, but that's fairly delicate and there are places where a minor
> change could turn O(n) into O(n^2), etc.
>
> Let me think about that - I'm not saying it's hopeless, and it
> would be nice to avoid that subtlety in dentry_unlist(), but there
> might be dragons.

PS: another obvious change is that d_walk() would become blocking.
So e.g.

int path_has_submounts(const struct path *parent)
{
	struct check_mount data = { .mnt = parent->mnt, .mounted = 0 };

	read_seqlock_excl(&mount_lock);
	d_walk(parent->dentry, &data, path_check_mount);
	read_sequnlock_excl(&mount_lock);

	return data.mounted;
}

would need a rework - d_walk() is under a spinlock here. Another
potential headache in that respect is d_genocide() - currently non-blocking,
with this change extremely likely to do evictions. That, however, is
not a problem for current in-tree callers - they are all shortly followed
by shrink_dcache_parent() or equivalents.

path_has_submounts(), though... I'd really hate to reintroduce the
"call this on entry/call this on exit" callbacks. Perhaps it would
be better to pass the dispose list to d_walk() and have the callers
deal with evictions? For that matter, shrink_dcache_parent() and
friends would be just fine passing the same list they are collecting
into.

*growl*
autofs_d_automount() has it called under sbi->fs_lock. So we'd need
to take the disposal all the way out there, and export shrink_dentry_list()
while we are at it. Not pretty ;-/

And no, we can't make the disposal async, so offloading it to a worker or
thread is not feasible...
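
For illustration only, here is a minimal userspace sketch of the "pass the
dispose list to d_walk() and have the callers deal with evictions" shape.
It is a toy model, not kernel code: struct node, put_to_list(),
walk_children(), dispose_all() and run_demo() are all made-up names, the
"lock" is just a flag, and the concurrent unlink is simulated from inside
the walk. The assumption is simply that the walker pins each child while
looking at it, queues the child instead of evicting it if that pin turns
out to be the last reference, and leaves the eviction to the caller once
the lock has been dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace toy model, NOT kernel code. Every name here is hypothetical. */
struct node {
	int refcount;               /* stand-in for the dentry reference count */
	int unhashed;               /* set once the node has been unlinked */
	int racy;                   /* demo only: someone unlinks it mid-walk */
	int evicted;                /* set by dispose_all() */
	struct node *first_child;
	struct node *next_sibling;
	struct node *next_dispose;  /* link on a dispose list */
};

struct dispose_list {
	struct node *head;
};

struct demo_result {
	int visited;
	int evicted;
};

static int walk_lock_held;      /* models the spinlock held around the walk */

/* Drop a reference; on the last put of an unlinked node, queue it, do not evict. */
static void put_to_list(struct node *n, struct dispose_list *list)
{
	assert(n->refcount > 0);
	if (--n->refcount == 0 && n->unhashed) {
		n->next_dispose = list->head;
		list->head = n;
	}
}

/* Simulates another thread unlinking the node and dropping its reference. */
static void simulate_concurrent_unlink(struct node *n)
{
	n->unhashed = 1;
	n->refcount--;
}

/* Walk the children with the "lock" held; never evicts, only queues. */
static int walk_children(struct node *parent, struct dispose_list *list)
{
	struct node *child, *next;
	int visited = 0;

	assert(walk_lock_held);
	for (child = parent->first_child; child; child = next) {
		child->refcount++;              /* pin the child while we look at it */
		if (child->racy)
			simulate_concurrent_unlink(child);
		visited++;
		next = child->next_sibling;     /* read while the pin is still held */
		put_to_list(child, list);       /* may queue, never blocks */
	}
	return visited;
}

/* Caller-side disposal; must run after the lock has been dropped. */
static int dispose_all(struct dispose_list *list)
{
	int n = 0;

	assert(!walk_lock_held);
	while (list->head) {
		struct node *victim = list->head;

		list->head = victim->next_dispose;
		victim->next_dispose = NULL;
		victim->evicted = 1;
		n++;
	}
	return n;
}

/* Caller shaped like path_has_submounts(): lock, walk, unlock, then dispose. */
struct demo_result run_demo(void)
{
	struct node parent = {0}, a = {0}, b = {0}, c = {0};
	struct dispose_list list = { NULL };
	struct demo_result res;

	parent.first_child = &a;
	a.next_sibling = &b;
	b.next_sibling = &c;
	a.refcount = b.refcount = c.refcount = 1;   /* each held by someone else */
	b.racy = 1;

	walk_lock_held = 1;
	res.visited = walk_children(&parent, &list);
	walk_lock_held = 0;
	res.evicted = dispose_all(&list);
	assert(b.evicted && !a.evicted && !c.evicted);
	return res;
}
```

In this toy, run_demo() visits three children and evicts exactly one, and
the eviction happens only after the lock flag has been cleared. It says
nothing about parallel d_walk() or about callers such as
autofs_d_automount() that hold further locks; those are exactly the parts
that would need the careful analysis.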