Received: by 2002:a05:6a10:d5a5:0:0:0:0 with SMTP id gn37csp1663879pxb; Fri, 1 Oct 2021 16:28:43 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz5LcGhsZDkYgZfbPt5aO7xS2VGin1Qm7eD7Yyiu+hE0VDFBmG2RlV9IafWqsHywEWPY5/t X-Received: by 2002:a50:bec8:: with SMTP id e8mr335018edk.231.1633130922968; Fri, 01 Oct 2021 16:28:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1633130922; cv=none; d=google.com; s=arc-20160816; b=z1wAxDpzCkhUmulRy4jHNuLkiQJDlXAumOcwiRCI3o9EDuUbPvjvg/hZ6I4murXZLH G7nZgBlakegLpo42jxxYFjiHZ1Zexi70Lj/odHowb5rgvGoIbYih/g587v016SfPs34j y351dEaqIKHzNAv5UFE99stvOerAb8Cd5COdsViaq0c6aDvGh0z0RN6NCfYjnToZpxY1 Yk01VLNvctO/Mn05KY9TuEuprkZK5iuXya6OvrZKx+ZhLsut0VWo2jWi4Gp5FjohbAAa CX1rkk4XoEDrqvdgG88RVvfnOptEKQfiJQ+OGj65lDHdB0pdp9r1ouSTrmjzm0V9UlAL EA+g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=8tTUgmxOrN1sCYAfdmVo+uo+55hwpJgWtC1Rrm4Zacc=; b=eBkbaR69nu1HN0tipgZHTx87gs1dgBHnh1iYlH+i9CypBHxLe7KdS7CsfyZIqwsax1 kpTLVZvHJrPRy94rd+app8IOl/LZ/sSvH555ALUBSHjUqi3ZAcPaYKmjo0HzDtBmw/As oGSGP/k9bk2LJvkHAACuYGewvHSBMmZfhKSwPkTo4xEkACA18JuP+WqAZxEWvMPs4cjN zZflF1ZfZLdMrtos7N2cZELH0TmiMpWww8U2xbwWCtJHO0D1nCTI0YszA26oxmpLbc62 MxEYvadn8QXYazTDkFUe7C1H6z5RqgDINKYqWziDU4/TirqzKcRGQZnVDLQ59pcFCJR8 j56Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=J1FyGsaR; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id w13si12027348ede.264.2021.10.01.16.28.19; Fri, 01 Oct 2021 16:28:42 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=J1FyGsaR; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231265AbhJAX2U (ORCPT + 99 others); Fri, 1 Oct 2021 19:28:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37224 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231219AbhJAX2S (ORCPT ); Fri, 1 Oct 2021 19:28:18 -0400 Received: from mail-lf1-x134.google.com (mail-lf1-x134.google.com [IPv6:2a00:1450:4864:20::134]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A8EEDC061775 for ; Fri, 1 Oct 2021 16:26:33 -0700 (PDT) Received: by mail-lf1-x134.google.com with SMTP id i4so44949987lfv.4 for ; Fri, 01 Oct 2021 16:26:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=8tTUgmxOrN1sCYAfdmVo+uo+55hwpJgWtC1Rrm4Zacc=; b=J1FyGsaRsU+Qrv4JnJo48w/0H8gWXHKnylcwU852VNVpuRr98sndqv4uCu7zoLGO09 F5RPGJBJbEldivNftWR3SS5dmfayBQgwTyAHOEQgH6EI/Bw0iPe5NtFr9Xt2LxBb2otY qG+MHHK0NL7gXyp/xI+qCTOtDUSZSQgf2MNM3heBo+6OhK90Zahrjs28SB9svBu9lWvL sbQiuHjnTHddhIigj/K1I5KvKY+NkPHfSIKqhvDR8UkqSmlUhYQuHYBByXglJR6YzBzF B968GVMElaWEUFq17FZIcRGpMIq6iFdu/uUCdrBBAyk1KNfNKCpVA+6I3rDgAIPSt2M6 ocVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=8tTUgmxOrN1sCYAfdmVo+uo+55hwpJgWtC1Rrm4Zacc=; b=hEOLS4RgP3HIJRFoAoJHhLpgiuJXAAQBEbfirlFAnaiBj2g/ozqvAxFCeMvAje5Eh8 2AS2CfZa5vyLAau1T7mpGSQ7vzGuvl3dPFRm6LXf9O9KOSHKLG/9ZLYKCKXeXfyiYW6C AOH0NWOvAWpPzsiWhQAFUZJVLqb6TpY59iebfhkVm5G2jFT7mmy4/vvHlnxbMFUsMbt1 oCTp72++hTHLAvUNehDgqrBx6cVxevYr/3Zkk8NW2OL8o6mr+xRBlrbHrlxiEU0oPkP2 AamdLhX0CVt3AqcL7oLWIiaMncuiMWJX20AbWTZzFjlFrn9XZ2KfK2XIijQvFBKkSE/n u3iw== X-Gm-Message-State: AOAM533RJ9h+Jq2lCPe1UbLnA0smg40mqiJjHnI1q6ytcyf2fe431QsK Fwk1B44If39RWAuntqOMhehUXOFxLVyRwkoChzgn2Q== X-Received: by 2002:a05:6512:705:: with SMTP id b5mr768733lfs.82.1633130791771; Fri, 01 Oct 2021 16:26:31 -0700 (PDT) MIME-Version: 1.0 References: <20210929225850.3889950-1-ndesaulniers@google.com> In-Reply-To: From: Nick Desaulniers Date: Fri, 1 Oct 2021 16:26:19 -0700 Message-ID: Subject: Re: [PATCH] modpost: add allow list for llvm IPSCCP To: Linus Torvalds , Bill Wendling Cc: Masahiro Yamada , Arnd Bergmann , Kees Cook , Nathan Chancellor , Michal Marek , Linux Kbuild mailing list , Linux Kernel Mailing List , llvm@lists.linux.dev Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Sep 30, 2021 at 11:54 AM Linus Torvalds wrote: > > On Wed, Sep 29, 2021 at 5:19 PM Nick Desaulniers > wrote: > > ... > > arch/x86/mm/amdtopology.c:110:7: remark: 'test_bit' not inlined into > > 'amd_numa_init' because too costly to inline (cost=115, threshold=45) > > [-Rpass-missed=inline] > > if (node_isset(nodeid, numa_nodes_parsed)) { > > Yeah, I think that we should just do the __always_inline thing. > > I'd rather have the stupid debug code overhead in the caller - that > may end up knowing that the pointer actually is so that the debug code > goes away - than have "test_bit()" uninlined because there's so much > crazy debug code in it. > > I also happen to believe that we have too much crazy "instrumentation" crap. > > Why is that test_bit() word read so magical that it merits a > "instrument_atomic_read()"? > > But I absolutely detest how KCSAN and some other tooling seems to get > a free pass on doing stupid things, just because they generated bad > warnings so then they can freely generate these much more fundamental > problems because the result is a f*cking mess. > > > Though for the defconfig case...somehow the cost is more than with the > > sanitizers... > > Maybe the solution is that if you have some of the crazy sanitizers, > we just say "the end result is not worth even checking". And stop > checking all the section mismatches, and all the stack size things. > > Because it looks like this is more of a real issue: > > > arch/x86/mm/amdtopology.c:157:7: remark: '__nodes_weight' not inlined > > into 'amd_numa_init' because too costly to inline (cost=930, > > threshold=45) [-Rpass-missed=inline] > > if (!nodes_weight(numa_nodes_parsed)) > > ^ > > Hmm. That's just a "bitmap_weight()", and that function in turn is > __always_inline. > > And the *reason* it is __always_inline is that it really wants to act > as a macro, and look at the second argument and do special things if > it is a small constant value. > > And it looks like clang messes things up by simply not doing enough > simplification before inlining decisions, so it all looks very > complicated to clang, even though when you actually generate code, you > have one (of two) very simple code sequences. > > > > Wouldn't it be better to make > > > them always-inline? > > > > Perhaps, see what that might look like: > > https://github.com/ClangBuiltLinux/linux/issues/1302#issuecomment-807260475 > > Does that look better? > > I suspect that in this case, because of clang deficiencies, that > __always_inline actually is the right thing to do at least on > __nodes_weight. > > Looking at your comment lower down > > https://github.com/ClangBuiltLinux/linux/issues/1302#issuecomment-807757878 > > I really think this is a clang bug, and that you need to do certain > simplifications both before _and_ after inlining. > > Before, because of the inlining cost decisions particularly wrt > constant arguments. > > After, because successful inlining changes things completely. I made that comment because our LowerConstantIntrinsics pass was simplifying our internal representation of __builtin_constant_p and __builtin_object_size in the same pass, after inlining. For the case of (defconfig) __nodes_weight not being inlined into amd_numa_init, it's because hweight_long is a huge block of code all predicated on __builtin_constant_p (BCP). Because we evaluate (BCP) AFTER inlining which is very much necessary for the semantics of BCP, we don't eliminate that entire block. Playing with the cost model though... we basically have the pattern: void caller (void) { caller(42); } void callee (int x) { if (__builtin_constant_p(x)) { // M "instructions" } else { // N "instructions" } } the current cost model says that the cost of inlining callee into caller is ~ M + N + C. If we knew that BCP would fold away based on inling, and which way, we might be able to consider the cost just M + C or N + C. In the case of "(defconfig) __nodes_weight not being inlined into amd_numa_init" M >> N, and yet after inlining BCP evaluates to false (so the estimated cost was M + N but in actuality was closer to N). (See how big __const_hweight64 is before BCP evaluation; in this case callee is the analog for hweight64). I guess if the argument to BCP is a parameter of callee (brittle), then we perhaps should be able to figure whether BCP would evaluate to true or false, then properly select M or N. Ok, let me see if I can go build that into the cost model...and if that actually helps any of these cases... Though I do wonder what happens when there's more than one level here...ie. void caller (void) { mid(42): } void mid (int x) { callee(x); } void callee (int x) { if __builtin_constant_p(x) /* M */; else /* N */; } or void caller0 (void) { mid(42): } // BCP would be true void caller1 (int x) { mid(x); } // BCP would be false void mid (int x) { callee(x); } // looking just at the call site, BCP would be false void callee (int x) { if __builtin_constant_p(x) /* M */; else /* N */; } (Sorry for the brain dump; this is more for me than for you. Good chat!) > > Marking __nodes_weight() be __always_inline just works around clang > being broken in this regard. > > It is _possible_ that it might help to make bitmap_weight() be a macro > instead of an inline function, but it's a kind of sad state of affairs > if that is required. > > And it might well fail - if you don't do the constant propagation > before making inlining decisions, you'll _still_ end up thinking that > bitmap_weight() is very costly because you don't do that > __builtin_constant_p() lowering. > > And then you end up using the (much more expensive) generic function > instead of the cheap "look, for single words this is a trivial" thing. -- Thanks, ~Nick Desaulniers