From: Miklos Szeredi
Date: Wed, 2 Jun 2021 10:58:47 +0200
Subject: Re: [REPOST PATCH v4 2/5] kernfs: use VFS negative dentry caching
To: Ian Kent
Cc: Greg Kroah-Hartman, Tejun Heo, Eric Sandeen, Fox Chen, Brice Goglin,
 Al Viro, Rick Lindsley, David Howells, Marcelo Tosatti, linux-fsdevel,
 Kernel Mailing List
References: <162218354775.34379.5629941272050849549.stgit@web.messagingengine.com>
 <162218364554.34379.636306635794792903.stgit@web.messagingengine.com>
 <972701826ebb1b3b3e00b12cde821b85eebc9749.camel@themaw.net>
In-Reply-To: <972701826ebb1b3b3e00b12cde821b85eebc9749.camel@themaw.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2 Jun 2021 at 05:44, Ian Kent wrote:
>
> On Tue, 2021-06-01 at 14:41 +0200, Miklos Szeredi wrote:
> > On Fri, 28 May 2021 at 08:34, Ian Kent wrote:
> > >
> > > If there are many lookups for non-existent paths these negative
> > > lookups can lead to a lot of overhead during path walks.
> > >
> > > The VFS allows dentries to be created as negative and hashed, and
> > > caches them so they can be used to reduce the fairly high overhead
> > > alloc/free cycle that occurs during these lookups.
> >
> > Obviously there's a cost associated with negative caching too.  For
> > normal filesystems it's trivially worth that cost, but in case of
> > kernfs, not sure...
> >
> > Can "fairly high" be somewhat substantiated with a microbenchmark for
> > negative lookups?
>
> Well, maybe, but anything we do for a benchmark would be totally
> artificial.
>
> The reason I added this is because I saw appreciable contention
> on the dentry alloc path in one case I saw.

If multiple tasks are trying to look up the same negative dentry in
parallel, then there will be contention on the parent inode lock.
Was this the issue?  This could easily be reproduced with an
artificial benchmark.

> > > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> > > index 4c69e2af82dac..5151c712f06f5 100644
> > > --- a/fs/kernfs/dir.c
> > > +++ b/fs/kernfs/dir.c
> > > @@ -1037,12 +1037,33 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
> > >         if (flags & LOOKUP_RCU)
> > >                 return -ECHILD;
> > >
> > > -       /* Always perform fresh lookup for negatives */
> > > -       if (d_really_is_negative(dentry))
> > > -               goto out_bad_unlocked;
> > > +       mutex_lock(&kernfs_mutex);
> > >
> > >         kn = kernfs_dentry_node(dentry);
> > > -       mutex_lock(&kernfs_mutex);
> > > +
> > > +       /* Negative hashed dentry? */
> > > +       if (!kn) {
> > > +               struct kernfs_node *parent;
> > > +
> > > +               /* If the kernfs node can be found this is a stale negative
> > > +                * hashed dentry so it must be discarded and the lookup redone.
> > > +                */
> > > +               parent = kernfs_dentry_node(dentry->d_parent);
> >
> > This doesn't look safe WRT a racing sys_rename().  In this case
> > d_move() is called only with parent inode locked, but not with
> > kernfs_mutex while ->d_revalidate() may not have parent inode locked.
> > After d_move() the old parent dentry can be freed, resulting in use
> > after free.  Easily fixed by dget_parent().
>
> Umm ... I'll need some more explanation here ...
>
> We are in ref-walk mode so the parent dentry isn't going away.

The parent that was used to lookup the dentry in __d_lookup() isn't
going away.  But it's not necessarily equal to dentry->d_parent anymore.

> And this is a negative dentry so rename is going to bail out
> with ENOENT way early.

You are right.  But note that the negative dentry in question could be
the target of a rename.  The current implementation doesn't switch the
target's parent or name, but this wasn't always the case (commit
076515fc9267 ("make non-exchanging __d_move() copy ->d_parent rather
than swap them")), so a backport of this patch could become incorrect
on old enough kernels.

So I still think using dget_parent() is the correct way to do this.

> >
> > > +               if (parent) {
> > > +                       const void *ns = NULL;
> > > +
> > > +                       if (kernfs_ns_enabled(parent))
> > > +                               ns = kernfs_info(dentry->d_sb)->ns;
> > > +                       kn = kernfs_find_ns(parent, dentry->d_name.name, ns);
> >
> > Same thing with d_name.  There's
> > take_dentry_name_snapshot()/release_dentry_name_snapshot() to
> > properly take care of that.
>
> I don't see that problem either, due to the dentry being negative,
> but please explain what you're seeing here.

Yeah.  Negative dentries' names weren't always stable, but that was a
long time ago (commit 8d85b4845a66 ("Allow sharing external names
after __d_move()")).

Thanks,
Miklos
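
For reference, the artificial benchmark suggested above (several tasks
looking up the same non-existent name in parallel) could look roughly
like the userspace sketch below.  It is not part of the patch series.
The function names, the NEG_LOOKUP_BENCH_MAIN guard, the default path
/sys/kernel/no-such-entry, and the thread and iteration counts are all
assumptions made for illustration, and the numbers it prints only mean
something when compared between a patched and an unpatched kernel.

```c
/* Sketch only: N threads repeatedly stat() one non-existent path. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>

struct worker {
	const char *path;	/* path that is expected not to exist */
	long iters;		/* number of lookups to perform */
	long enoent;		/* lookups that failed with ENOENT */
};

static void *lookup_loop(void *arg)
{
	struct worker *w = arg;
	struct stat st;

	for (long i = 0; i < w->iters; i++)
		if (stat(w->path, &st) == -1 && errno == ENOENT)
			w->enoent++;
	return NULL;
}

/*
 * Run nthreads workers doing iters lookups each and store the elapsed
 * wall-clock time in *secs.  Returns the total number of ENOENT
 * results, or -1 if the workers could not all be started.
 */
static long bench_negative_lookups(const char *path, int nthreads,
				   long iters, double *secs)
{
	pthread_t *tids;
	struct worker *w;
	struct timespec t0, t1;
	long total = 0;
	int started = 0;

	assert(path && secs && nthreads > 0);

	tids = calloc(nthreads, sizeof(*tids));
	w = calloc(nthreads, sizeof(*w));	/* zeroes the counters */
	if (!tids || !w) {
		free(tids);
		free(w);
		return -1;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < nthreads; i++) {
		w[i].path = path;
		w[i].iters = iters;
		if (pthread_create(&tids[i], NULL, lookup_loop, &w[i]) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		total += w[i].enoent;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	if (started != nthreads)
		total = -1;
	free(tids);
	free(w);
	return total;
}

#ifdef NEG_LOOKUP_BENCH_MAIN
int main(int argc, char **argv)
{
	/* Hypothetical default: any name that does not exist in sysfs. */
	const char *path = argc > 1 ? argv[1] : "/sys/kernel/no-such-entry";
	int nthreads = argc > 2 ? atoi(argv[2]) : 4;
	double secs;
	long misses;

	if (nthreads <= 0)
		return 1;
	misses = bench_negative_lookups(path, nthreads, 1000000, &secs);
	if (misses < 0)
		return 1;
	printf("%ld negative lookups in %.3f s\n", misses, secs);
	return 0;
}
#endif
```

Built with something like "cc -O2 -pthread -DNEG_LOOKUP_BENCH_MAIN",
running it with one thread and then with several against the same path
should show whether the time per lookup grows with the number of
parallel lookups, which is what contention on the parent inode lock
would look like from userspace.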