From: Thomas Gleixner
To: Dave Hansen, LKML
Cc: x86@kernel.org, Andrew Cooper, "Edgecombe, Rick P", Tom Lendacky
Subject: Re: [patch 3/3] x86/fpu/xsave: Optimize XSAVEC/S when XGETBV1 is supported
In-Reply-To: <87ee1t9oka.ffs@tglx>
References: <20220404103741.809025935@linutronix.de> <20220404104820.713066297@linutronix.de> <87ee1t9oka.ffs@tglx>
Date: Tue, 19 Apr 2022 23:22:49 +0200
Message-ID: <878rs0vkd2.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 19 2022 at 15:43, Thomas Gleixner wrote:
> On Thu, Apr 14 2022 at 10:24, Dave Hansen wrote:
>> On 4/4/22 05:11, Thomas Gleixner wrote:
>>> which is suboptimal. Prefetch works better when the access is linear. But
>>> what's worse is that PKRU can be located in a different page which
>>> obviously affects dTLB.
>>
>> The numbers don't lie, but I'm still surprised by this. Was this in a
>> VM that isn't backed with large pages? task_struct.thread.fpu is
>> kmem_cache_alloc()'d and is in the direct map, which should be 2M/1G
>> pages almost all the time.
>
> Hmm. Indeed, that's weird.
>
> That was bare metal and I just checked that this was a production config
> and not some weird debug muck which breaks large pages. I'll look deeper
> into that.

I can't find any reasonable explanation. The pages are definitely large
pages, so yes, the dTLB miss count does not make sense, but it's
consistently faster and it's always the dTLB miss count which makes the
big difference according to perf.

For enhanced fun, I ran the lot on an AMD Zen3 machine, and with the same
test case (hackbench -l 10000) repeated 10 times by perf stat this is
consistently slower than the non-optimized variant.

There is at least an explanation for that. A tight loop of 1 million
xgetbv(1) invocations takes 9 million cycles on a SKL-X and 50 million
cycles on an AMD Zen3.

XSAVE is wonderful, isn't it?

Thanks,

        tglx
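
For anyone who wants to reproduce the xgetbv(1) cost mentioned above, here is a rough user-space sketch. It is not the code that produced the numbers in this mail. The function names (`xgetbv1_supported`, `xgetbv1_loop_ticks`) and the `XGETBV1_DEMO` macro are made up for illustration, and it counts TSC ticks via `__rdtsc()`, which only approximates core cycles. It assumes GCC or Clang on x86 and returns -1 where XGETBV with ECX=1 is not available.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* __get_cpuid(), __get_cpuid_count() */
#include <x86intrin.h>  /* __rdtsc() */

/* Returns 1 if XGETBV with ECX=1 may be executed, 0 otherwise. */
static int xgetbv1_supported(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.1:ECX[27] (OSXSAVE): the OS has enabled XSAVE/XGETBV. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
		return 0;
	/* CPUID.(EAX=0xD,ECX=1):EAX[2]: XGETBV with ECX=1 is supported. */
	if (!__get_cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx) || !(eax & (1u << 2)))
		return 0;
	return 1;
}

/* XGETBV with ECX=1 returns XCR0 & XINUSE, the "in use" component bitmap. */
static inline uint64_t xgetbv1(void)
{
	uint32_t lo, hi;

	__asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(1));
	return ((uint64_t)hi << 32) | lo;
}

/* TSC ticks spent in a tight loop of xgetbv(1), or -1 if unsupported. */
long long xgetbv1_loop_ticks(unsigned long iterations)
{
	uint64_t start, end, sink = 0;
	unsigned long i;

	assert(iterations > 0);
	if (!xgetbv1_supported())
		return -1;

	start = __rdtsc();
	for (i = 0; i < iterations; i++)
		sink ^= xgetbv1();	/* asm volatile keeps every call */
	end = __rdtsc();

	(void)sink;
	return (long long)(end - start);
}
#else
/* Not x86: nothing to measure. */
long long xgetbv1_loop_ticks(unsigned long iterations)
{
	(void)iterations;
	return -1;
}
#endif

#ifdef XGETBV1_DEMO
int main(void)
{
	const unsigned long n = 1000000;	/* 1 million, as in the mail */
	long long ticks = xgetbv1_loop_ticks(n);

	if (ticks < 0)
		printf("xgetbv(1) not supported here\n");
	else
		printf("%lu calls: %lld TSC ticks (%.1f per call)\n",
		       n, ticks, (double)ticks / (double)n);
	return 0;
}
#endif
```

Build with something like `cc -O2 -DXGETBV1_DEMO` to get the demo `main()`. Pinning the process to one CPU (e.g. with taskset) should reduce noise; a per-call figure of roughly 9 versus roughly 50 would line up with the SKL-X and Zen3 numbers above, but absolute values will differ with TSC frequency and measurement overhead.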