From: Thomas Gleixner
To: Tom Lendacky, Dave Hansen, LKML
Cc: x86@kernel.org, Andrew Cooper, "Edgecombe, Rick P"
Subject: Re: [patch 3/3] x86/fpu/xsave: Optimize XSAVEC/S when XGETBV1 is supported
In-Reply-To: <60e5a4d1-df7c-d3bd-2730-e528cd75c351@amd.com>
Date: Fri, 22 Apr 2022 21:30:19 +0200
Message-ID: <87bkws6hmc.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 20 2022 at 13:15, Tom Lendacky wrote:
> On 4/19/22 16:22, Thomas Gleixner wrote:
>>> That was bare metal and I just checked that this was a production config
>>> and not some weird debug muck which breaks large pages. I'll look deeper
>>> into that.
>>
>> I can't find any reasonable explanation.
>> The pages are definitely large pages, so yes the dTLB miss count does
>> not make sense, but it's consistently faster and it's always the dTLB
>> miss count which makes the big difference according to perf.
>>
>> For enhanced fun, I ran the lot on an AMD Zen3 machine, and with the same
>> test case (hackbench -l 10000) repeated 10 times by perf stat, this is
>> consistently slower than the non-optimized variant. There is at least an
>> explanation for that: a tight loop of 1 million xgetbv(1) invocations
>> takes 9 million cycles on a SKL-X and 50 million cycles on an AMD Zen3.
>
> I'll take a look into this and see what I find. Might be interesting to
> see if the actual XSAVES is slower or quicker, too, based on the input mask.
>
> If the performance slowdown shows up in real world benchmarks, we might
> want to consider not using the xgetbv() call on AMD.

As things stand, I'm not going to pursue this further for now. The effect
on SKL-X is not explainable; in particular, the decrease in the dTLB miss
count does not make any sense. Aside from that, I just figured out that it
is very sensitive to the kernel configuration, and I have no idea yet which
exact screw to turn to make the effect come and go.

So I'll just go and add the XSAVEC support alone, as that's actually
something which _is_ beneficial for guests.

Thanks,

        tglx
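For readers who want to reproduce the xgetbv(1) measurement discussed above, here is a minimal user-space sketch in C, assuming GCC or Clang on x86 and the <cpuid.h> helpers those compilers ship. It checks for XGETBV1 support, reads the in-use mask, and times a tight loop of XGETBV(1) calls with the TSC. All helper names (cpu_has_xgetbv1, cpu_is_amd, narrow_xsave_mask, time_xgetbv1_loop, tsc_now) are hypothetical, and this is not the code from the patch series; narrow_xsave_mask only illustrates the idea of narrowing the XSAVEC/XSAVES input mask, with the AMD opt-out Tom suggested.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>

/* Read extended control register `index`: 0 is XCR0; 1 (XGETBV1) returns
 * XCR0 & XINUSE, i.e. enabled components that are not in init state. */
static uint64_t xgetbv(uint32_t index)
{
	uint32_t eax, edx;

	__asm__ __volatile__("xgetbv" : "=a"(eax), "=d"(edx) : "c"(index));
	return ((uint64_t)edx << 32) | eax;
}

/* XGETBV needs OSXSAVE (CPUID.1:ECX[27]); ECX=1 additionally needs
 * CPUID.(EAX=0xD,ECX=1):EAX[2]. */
static int cpu_has_xgetbv1(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
		return 0;
	if (__get_cpuid_max(0, NULL) < 0xd)
		return 0;
	__cpuid_count(0xd, 1, eax, ebx, ecx, edx);
	return (eax >> 2) & 1;
}

/* CPUID leaf 0 returns the vendor string in EBX, EDX, ECX order. */
static int cpu_is_amd(void)
{
	unsigned int eax, ebx, ecx, edx;
	char vendor[13];

	if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
		return 0;
	memcpy(vendor, &ebx, 4);
	memcpy(vendor + 4, &edx, 4);
	memcpy(vendor + 8, &ecx, 4);
	vendor[12] = '\0';
	return strcmp(vendor, "AuthenticAMD") == 0;
}

static uint64_t tsc_now(void) { return __rdtsc(); }
#else
/* Non-x86 stubs so the sketch still compiles; nothing is supported. */
static uint64_t xgetbv(uint32_t index) { (void)index; return 0; }
static int cpu_has_xgetbv1(void) { return 0; }
static int cpu_is_amd(void) { return 0; }
static uint64_t tsc_now(void) { return 0; }
#endif

/* Hypothetical helper: narrow the requested-feature bitmap for
 * XSAVEC/XSAVES to components that are actually in use, skipping the
 * XGETBV(1) call on AMD as suggested in the thread. Not kernel code: it
 * ignores supervisor (IA32_XSS) components and queries CPUID on every
 * call instead of deciding the policy once. */
static uint64_t narrow_xsave_mask(uint64_t requested)
{
	if (!cpu_has_xgetbv1() || cpu_is_amd())
		return requested;
	return requested & xgetbv(1);
}

/* Rough re-creation of the tight-loop measurement: TSC ticks spent on
 * `iterations` back-to-back XGETBV(1) calls. RDTSC is not serializing,
 * so treat the result as an estimate. Returns 0 without XGETBV1. */
static uint64_t time_xgetbv1_loop(unsigned long iterations)
{
	uint64_t start, end;

	if (!cpu_has_xgetbv1())
		return 0;
	start = tsc_now();
	for (unsigned long i = 0; i < iterations; i++)
		(void)xgetbv(1);	/* volatile asm, so not optimized away */
	end = tsc_now();
	return end - start;
}
```

Dividing time_xgetbv1_loop(1000000) by 1000000 gives approximate TSC ticks per call; the figures quoted in the thread work out to roughly 9 cycles per call on SKL-X and 50 per call on Zen3. On CPUs with an invariant TSC the ticks run at a fixed reference rate rather than the core clock, so the result will not match perf's cycle counts exactly.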