References: <20230111123736.20025-1-kirill.shutemov@linux.intel.com>
 <20230111123736.20025-9-kirill.shutemov@linux.intel.com>
 <20230117135703.voaumisreld7crfb@box>
From: Nick Desaulniers
Date: Tue, 17 Jan 2023 11:17:15 -0800
Subject: Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
To: Linus Torvalds
Cc: Peter Zijlstra, "Kirill A. Shutemov", Dave Hansen, Andy Lutomirski,
 x86@kernel.org, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
 Alexander Potapenko, Taras Madan, Dmitry Vyukov, "H . J . Lu", Andi Kleen,
 Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Sami Tolvanen, joao@overdrivepizza.com
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 17, 2023 at 10:34 AM Linus Torvalds wrote:
>
> On Tue, Jan 17, 2023 at 10:26 AM Nick Desaulniers
> wrote:
> >
> > On Tue, Jan 17, 2023 at 9:29 AM Linus Torvalds
> > wrote:
> > >
> > > Side note: that's not something new or unusual. It's been the case
> > > since I started testing clang - we have several code-paths where we
> > > use "unlikely()" to try to get very unlikely cases to be out-of-line,
> > > and clang just mostly ignores it, or treats it as a very weak hint. I
> > > think the only way to get clang to treat it as a *strong* hint is to
> > > use PGO.
> >
> > I'd be surprised if that were intentional or by design.
> >
> > Do you guys have a bug report we could look at?
>
> Heh. I actually sent you an example long ago. Let me go fish it out of
> my mail archives and quote some of it below so that you can find it in
> yours..
>
> Linus
>
> [ Time passes. Found this email to you and Bill Wendling from Feb 16,
> 2020, Message-ID
> CAHk-=wigVshsByCMjkUiZyQSR5N5zi2aAeQc+VJCzQV=nm8E7g@mail.gmail.com ]:
>
> Anyway, I'm looking at clang code generation, and comparing it with
> gcc on one of my "this has been optimized to hell and back" functions:
> __d_lookup_rcu().
>
> It looks like clang does frame pointers, and ignores our
> likely/unlikely annotations.
>
> So this code:
>
>         if (unlikely(parent->d_flags & DCACHE_OP_COMPARE)) {
>                 int tlen;
>                 const char *tname;
>                 ......
>
> doesn't actually jump out of line, but instead generates the unlikely
> case as the fallthrough:
>
>         testb $2, (%r12)
>         je .LBB50_9
>         ... unlikely code goes here...

Perhaps that was compiler version or config specific?

$ make LLVM=1 -j128 defconfig fs/dcache.o
$ llvm-objdump -d --no-show-raw-insn --disassemble-symbols=__d_lookup_rcu fs/dcache.o
0000000000003210 <__d_lookup_rcu>:
    3210: endbr64
    3214: pushq %rbp
    3215: pushq %r15
    3217: pushq %r14
    3219: pushq %r12
    321b: pushq %rbx
    321c: testb $0x2, (%rdi)
    321f: jne 0x32d7 <__d_lookup_rcu+0xc7>
...
    32d7: popq %rbx
    32d8: popq %r12
    32da: popq %r14
    32dc: popq %r15
    32de: popq %rbp
    32df: jmp 0x3300 <__d_lookup_rcu_op_compare>

That looks like what you want, yeah? Your original report was from
nearly 3 years ago; could have fixed a few instances of branch weights
not getting propagated since then.

What's going on in this case in this thread? I know we don't support
hot/cold attributes on labels yet, but if static_branch_likely (or
friends) is being used, we assign the indirect branches a 0%
likeliness/branch-weight.

>
> and then the likely case ends up having unfortunate reloads inside the
> hot loop. Possibly because it has one fewer free registers than gcc
> because of the frame pointer.
>
> I didn't look into _why_ clang generates frame pointers but gcc
> doesn't. It may be just a compiler default, I think we don't end up
> explicitly asking either way.
>
> [ And then Bill replied with this ]
>
> It's not a no-op. We add branch probabilities to the IR, whether
> they're honored or not depends on the transformation. But they
> *should* be honored when available. I've seen in the past that instead
> of moving unlikely blocks out of the way (like gcc, which moves them
> below the function's "ret" instruction), LLVM will do something like
> this:
>
>
>
>
>
>
> <...>
>
> I.e. the loop is rotated and the unlikely code is first and the hotter
> code is closer together but between the unlikely and conditional test.
> Could this be what's going on? Otherwise, maybe clang decided that
> it's not beneficial to move the code out-of-line because the benefit
> was minimal? (I'm guessing here.)

-- 
Thanks,
~Nick Desaulniers
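
For anyone who wants to poke at this outside the kernel tree, here is a minimal user-space sketch of the pattern discussed above: a branch marked unlikely() whose body is a tail call into a separate noinline/cold helper, similar in shape to the __d_lookup_rcu / __d_lookup_rcu_op_compare split in the disassembly. The struct, the flag value, and the function names are hypothetical stand-ins, not kernel code, and whether a given compiler version actually places the rare path out of line is exactly the open question in this thread, so the disassembly needs to be checked rather than assumed.

```c
/* The usual shape of the hint macros: __builtin_expect() is a GCC/Clang
 * builtin that tells the compiler which value a condition usually has. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical flag, standing in for DCACHE_OP_COMPARE. */
#define FLAG_OP_COMPARE 0x2u

/* Hypothetical structure, not the kernel's struct dentry. */
struct entry {
    unsigned int flags;
    int key;
};

/* Rare path in its own function. noinline and cold ask the compiler to
 * keep it out of the hot caller; they are requests, not guarantees about
 * final block placement. */
static __attribute__((noinline, cold))
int lookup_slow(const struct entry *e, int key)
{
    /* Pretend "custom compare": matches on the negated key. */
    return e->key == -key;
}

/* Hot path: the common case should be the fallthrough, and the rare case
 * a jump (ideally a tail call) to lookup_slow(). */
int lookup(const struct entry *e, int key)
{
    if (unlikely(e->flags & FLAG_OP_COMPARE))
        return lookup_slow(e, key);
    return e->key == key;
}
```

Building this with -O2 under both gcc and clang and disassembling lookup() (for example with the llvm-objdump invocation shown earlier, pointed at the resulting object file) shows whether the test-and-branch jumps away for the flagged case or falls through into it.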