From: Linus Torvalds
Date: Tue, 17 Jan 2023 12:43:57 -0800
Subject: Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
To: Nick Desaulniers
Cc: Peter Zijlstra, "Kirill A. Shutemov", Dave Hansen, Andy Lutomirski,
    x86@kernel.org, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
    Alexander Potapenko, Taras Madan, Dmitry Vyukov, "H . J . Lu",
    Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sami Tolvanen,
    joao@overdrivepizza.com
References: <20230111123736.20025-1-kirill.shutemov@linux.intel.com>
    <20230111123736.20025-9-kirill.shutemov@linux.intel.com>
    <20230117135703.voaumisreld7crfb@box>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 17, 2023 at 12:10 PM Linus Torvalds wrote:
>
> That said, clang still generates more register pressure than gcc,
> causing the function prologue and epilogue to be rather bigger
> (pushing and popping six registers, as opposed to gcc that only needs
> three)

.. and at least part of that is the same thing with the bad byte mask
generation (see that "clang *still* messes up" link for details).

Basically, the byte mask is computed by

    mask = bytemask_from_count(tcount);

where we have

    #define bytemask_from_count(cnt)       (~(~0ul << (cnt)*8))

and clang tries very very hard to avoid that "multiply by 8", so
instead it keeps a shadow copy of that "(cnt)*8" value in the loop.

That is wrong for a couple of reasons:

 (a) it adds register pressure for no good reason

 (b) when you shift left by that value, only the low 6 bits of that
     value matter

And guess how that "tcount" is updated? It's this:

    tcount -= sizeof(unsigned long);

in the loop, and thus the update of that shadow value of "(cnt)*8" is
done as

    addl $-64, %ecx

inside that loop.

This is truly stupid and wasted work, because the low 6 bits of the
value - remember, the only part that matters - DO NOT CHANGE when
you do that.
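To make the "low 6 bits" point concrete, here is a minimal sketch in plain C. It is not kernel code: shl64(), bytemask_from_bits() and check_sub64_is_invisible() are hypothetical helpers written for this illustration. The "& 63" models the count masking that a 64-bit x86 shift performs in hardware; C itself leaves shift counts of 64 or more undefined, so the masking has to be spelled out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a 64-bit x86 "shl %cl, reg": the CPU only looks at the low
 * 6 bits of the shift count. Plain C leaves counts >= 64 undefined,
 * hence the explicit "& 63" (an assumption of this sketch).
 */
static uint64_t shl64(uint64_t value, uint32_t count)
{
	return value << (count & 63);
}

/*
 * Same shape as bytemask_from_count(), but taking the already
 * multiplied "(cnt)*8" bit count and using a fixed-width 64-bit type.
 */
static uint64_t bytemask_from_bits(uint32_t bits)
{
	return ~shl64(~UINT64_C(0), bits);
}

/* Subtracting 64 from the bit count never changes the resulting mask. */
static void check_sub64_is_invisible(void)
{
	for (uint32_t bytes = 0; bytes < 8; bytes++) {
		uint32_t bits = bytes * 8;

		/* bits - 64 wraps modulo 2^32; its low 6 bits stay the same */
		assert(bytemask_from_bits(bits) == bytemask_from_bits(bits - 64));
	}
}
```

Calling check_sub64_is_invisible() from a test program passes for every byte count from 0 to 7, which is the range the mask is used with once fewer than sizeof(unsigned long) bytes remain.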
So clang has

 (a) decided to avoid the "expensive" multiply-by-8 at the end by
     turning it into a repeated "add $-64" inside the loop

 (b) added register pressure and one extra instruction inside the loop

 (c) not realized that that extra instruction doesn't actually *do*
     anything, because it only affects the bits that don't actually
     matter in the end.

which is all kind of silly, wouldn't you agree? Every single step
there was pointless.

But with my other simplifications, the fact that clang does these
extra things is no longer all that noticeable. It *used* to be a
horrible disaster because the extra register pressure ended up meaning
that you had spills and all kinds of nastiness. Now the function is
simple enough that even with the extra register pressure, there's no
need for spills.

.. until you look at the 32-bit version, which still needs spills. Gcc
does too, but clang just makes it worse by having the extra pointless
shadow variable.

If I cared about 32-bit, I might write up a bugzilla entry. As it is,
it's just "clang tries to be clever, and in the process is actually
being stupid".

               Linus
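The three steps above can be sketched as a small C comparison. This is a simplified illustration under stated assumptions, not the kernel function under discussion and not clang's actual output: the loop body is reduced to the tcount update, WORD_BYTES stands in for sizeof(unsigned long) on x86-64, and every function name is hypothetical. shl64() again models the hardware masking of the shift count to 6 bits.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BYTES 8u	/* stands in for sizeof(unsigned long) on x86-64 */

/* Model of a 64-bit x86 shift: only the low 6 bits of the count are used. */
static uint64_t shl64(uint64_t value, uint32_t count)
{
	return value << (count & 63);
}

/* Source-level shape: multiply by 8 once, after the loop. */
static uint64_t mask_multiply_at_end(uint32_t tcount)
{
	while (tcount >= WORD_BYTES)
		tcount -= WORD_BYTES;
	return ~shl64(~UINT64_C(0), tcount * 8);
}

/* Shape described in the mail: a shadow "tcount * 8" updated per iteration. */
static uint64_t mask_shadow_updated(uint32_t tcount)
{
	uint32_t shadow = tcount * 8;

	while (tcount >= WORD_BYTES) {
		tcount -= WORD_BYTES;
		shadow -= 64;	/* the "addl $-64, %ecx" */
	}
	return ~shl64(~UINT64_C(0), shadow);
}

/* Same shadow value, never updated: the shift ignores the bits that differ. */
static uint64_t mask_shadow_untouched(uint32_t tcount)
{
	return ~shl64(~UINT64_C(0), tcount * 8);
}

/* All three variants agree for every starting count below 'limit'. */
static void check_equivalence(uint32_t limit)
{
	for (uint32_t n = 0; n < limit; n++) {
		assert(mask_multiply_at_end(n) == mask_shadow_updated(n));
		assert(mask_multiply_at_end(n) == mask_shadow_untouched(n));
	}
}
```

The third variant is the point of (c): since every update changes the shadow value by a multiple of 64, leaving it alone yields the same mask, so the per-iteration instruction and the register it occupies buy nothing.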