Date: Sun, 14 Aug 2022 20:08:29 +0100
From: Al Viro
To: Linus Torvalds
Cc: Nathan Chancellor, Nick Desaulniers, Jeff Layton, Ilya Dryomov,
    ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Matthew Wilcox, clang-built-linux
Subject: Re: [GIT PULL] Ceph updates for 5.20-rc1

On Thu, Aug 11, 2022 at 08:58:54PM -0700, Linus Torvalds wrote:
> On Thu, Aug 11, 2022 at 3:43 PM Linus Torvalds wrote:
> >
> > Oh, sadly, clang does much worse here.
> >
> > Gcc ends up being able to not have a stack frame at all for
> > __d_lookup_rcu() once that DCACHE_OP_COMPARE case has been moved out.
> > The gcc code really looks very nice.
> >
> > Clang, not so much, and it still has spills and reloads.
>
> I ended up looking at the clang code generation more than I probably
> should have, because I found it so odd.
>
> Our code is literally written to not need that many values, and it
> should be easy to keep everything in registers.
>
> It turns out that clang is trying much too hard to be clever in
> dentry_string_cmp(). The code is literally written so that we keep the
> count of remaining characters in 'tcount', and then at the end we can
> generate a 'mask' from that to ignore the parts of the pathname that
> are beyond the size.

[snip]

There's a cheap way to reduce the register pressure:

		seq = raw_seqcount_begin(&dentry->d_seq);
		if (dentry->d_parent != parent)
			continue;
		if (d_unhashed(dentry))
			continue;
		if (dentry->d_name.hash_len != hashlen)
			continue;
		if (dentry_cmp(dentry, str, hashlen_len(hashlen)) != 0)
			continue;
		*seqp = seq;

could move the last store to before dentry_cmp().  Sure, we might get
some extra stores out of that.  Into a hot cacheline, and if we really
hit many of those extra stores, we already have a problem - a lot of
collisions both in ->d_parent and ->d_name.hash_len.  If that happens,
the cost of those extra stores is going to be trivial noise.

From correctness POV that should be fine; callers of __d_lookup_rcu()
getting NULL either entirely ignore *seqp (d_alloc_parallel()) or
proceed to wipe it out (lookup_fast(), by calling try_to_unlazy()).

Comments?
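
For illustration, here is a minimal userspace sketch of the reordering described above. It is not the kernel code: struct mock_dentry, mock_lookup() and mock_lookup_demo() are hypothetical stand-ins, the hash chain is modelled as a plain array, and there is no RCU or seqcount behaviour. It only shows the store to *seqp sitting before the name comparison, and what a caller sees when the lookup misses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct dentry: only the fields the loop reads. */
struct mock_dentry {
	const struct mock_dentry *parent;
	unsigned long hash_len;
	const char *name;
	unsigned seq;		/* stands in for the d_seq snapshot */
	int unhashed;
};

/*
 * Walk an array instead of the real hash chain.  The store to *seqp is
 * done before the name comparison, so 'seq' does not have to stay live
 * across the comparison.
 */
static const struct mock_dentry *
mock_lookup(const struct mock_dentry *chain, size_t n,
	    const struct mock_dentry *parent, unsigned long hash_len,
	    const char *str, size_t len, unsigned *seqp)
{
	for (size_t i = 0; i < n; i++) {
		const struct mock_dentry *dentry = &chain[i];
		unsigned seq = dentry->seq;

		if (dentry->parent != parent)
			continue;
		if (dentry->unhashed)
			continue;
		if (dentry->hash_len != hash_len)
			continue;
		*seqp = seq;	/* moved up; a later candidate may overwrite it */
		if (strlen(dentry->name) != len ||
		    memcmp(dentry->name, str, len) != 0)
			continue;
		return dentry;
	}
	return NULL;	/* *seqp may be stale here; callers must not use it */
}

/* Small demonstration: one hit, one miss with two hash collisions. */
int mock_lookup_demo(void)
{
	struct mock_dentry root = { 0 };
	struct mock_dentry chain[2] = {
		{ &root, 42, "foo", 7, 0 },
		{ &root, 42, "bar", 9, 0 },
	};
	unsigned seq = 0;

	/* Hit: *seqp matches the returned entry. */
	assert(mock_lookup(chain, 2, &root, 42, "bar", 3, &seq) == &chain[1]);
	assert(seq == 9);

	/* Miss: two extra stores happened, result is NULL. */
	assert(mock_lookup(chain, 2, &root, 42, "baz", 3, &seq) == NULL);
	return 0;
}
```

In this sketch the extra stores only happen for entries that already collide on both the parent and hash_len, which is the case the message argues is rare. On a miss the value left in *seqp is meaningless, which matches the argument that callers getting NULL either ignore it or overwrite it.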