Date: Thu, 14 Apr 2022 10:24:45 -0700
From: Dave Hansen
To: Thomas Gleixner, LKML
Cc: x86@kernel.org, Andrew Cooper, "Edgecombe, Rick P"
Subject: Re: [patch 3/3] x86/fpu/xsave: Optimize XSAVEC/S when XGETBV1 is supported
In-Reply-To: <20220404104820.713066297@linutronix.de>
References: <20220404103741.809025935@linutronix.de> <20220404104820.713066297@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/4/22 05:11, Thomas Gleixner wrote:
> A typical scenario is an active set of 0x202 (PKRU + SSE) out of the full
> supported set of 0x2FF. That means XSAVEC/S writes and XRSTOR[S] reads:

It might be worth reminding folks why PKRU is a special snowflake: the
default PKRU enforced by the kernel is its most restrictive possible
value (0xfffffffc). This means that PKRU defaults to being in its
non-init state even for tasks which do nothing protection-keys-related.

> which is suboptimal. Prefetch works better when the access is linear. But
> what's worse is that PKRU can be located in a different page which
> obviously affects dTLB.

The numbers don't lie, but I'm still surprised by this. Was this in a VM
that isn't backed with large pages? task_struct.thread.fpu is
kmem_cache_alloc()'d and is in the direct map, which should be 2M/1G
pages almost all the time.

> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -86,6 +86,8 @@ static unsigned int xstate_flags[XFEATUR
>  #define XSTATE_FLAG_SUPERVISOR	BIT(0)
>  #define XSTATE_FLAG_ALIGNED64	BIT(1)
>  
> +DEFINE_STATIC_KEY_FALSE(__xsave_use_xgetbv1);
> +
>  /*
>   * Return whether the system supports a given xfeature.
>   *
> @@ -1481,7 +1483,7 @@ void xfd_validate_state(struct fpstate *
>  }
>  #endif /* CONFIG_X86_DEBUG_FPU */
>  
> -static int __init xfd_update_static_branch(void)
> +static int __init fpu_update_static_branches(void)
>  {
>  	/*
>  	 * If init_fpstate.xfd has bits set then dynamic features are
> @@ -1489,9 +1491,13 @@ static int __init xfd_update_static_bran
>  	 */
>  	if (init_fpstate.xfd)
>  		static_branch_enable(&__fpu_state_size_dynamic);
> +
> +	if (cpu_feature_enabled(X86_FEATURE_XGETBV1) &&
> +	    cpu_feature_enabled(X86_FEATURE_XCOMPACTED))
> +		static_branch_enable(&__xsave_use_xgetbv1);
>  	return 0;
>  }
> -arch_initcall(xfd_update_static_branch)
> +arch_initcall(fpu_update_static_branches)
>  
>  void fpstate_free(struct fpu *fpu)
>  {
> --- a/arch/x86/kernel/fpu/xstate.h
> +++ b/arch/x86/kernel/fpu/xstate.h
> @@ -10,7 +10,12 @@
>  DECLARE_PER_CPU(u64, xfd_state);
>  #endif
>  
> -static inline bool xsave_use_xgetbv1(void) { return false; }
> +DECLARE_STATIC_KEY_FALSE(__xsave_use_xgetbv1);
> +
> +static __always_inline __pure bool xsave_use_xgetbv1(void)
> +{
> +	return static_branch_likely(&__xsave_use_xgetbv1);
> +}
>  
>  static inline void __xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
>  {
> @@ -185,13 +190,18 @@ static inline int __xfd_enable_feature(u
>  static inline void os_xsave(struct fpstate *fpstate)
>  {
>  	u64 mask = fpstate->xfeatures;
> -	u32 lmask = mask;
> -	u32 hmask = mask >> 32;
> +	u32 lmask, hmask;
>  	int err;
>  
>  	WARN_ON_FPU(!alternatives_patched);
>  	xfd_validate_state(fpstate, mask, false);
>  
> +	if (xsave_use_xgetbv1())
> +		mask &= xgetbv(1);

How about this comment for the masking operation:

	/*
	 * Remove features in their init state from the mask. This
	 * makes the XSAVE{S,C} writes less sparse and quicker for
	 * the CPU.
	 */

> +	lmask = mask;
> +	hmask = mask >> 32;
> +
>  	XSTATE_XSAVE(&fpstate->regs.xsave, lmask, hmask, err);
>  
>  	/* We should never fault when copying to a kernel buffer: */
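
For anyone who wants to poke at the masking idea from user space, here is
a rough standalone sketch (my own illustration, not part of the patch):
read the XINUSE bitmap with XGETBV(ECX=1) and intersect it with a
requested feature mask, the same way the os_xsave() hunk above does with
mask &= xgetbv(1). The xgetbv() helper and the 0x2FF "requested" value are
assumptions for the example only; it needs an x86-64 CPU that actually
supports XGETBV1, otherwise the instruction faults.

	#include <stdio.h>
	#include <stdint.h>

	/* Read extended control/state info: index 0 = XCR0, index 1 = XINUSE. */
	static inline uint64_t xgetbv(uint32_t index)
	{
		uint32_t eax, edx;

		/* #GPs for index 1 if the CPU lacks XGETBV1 support. */
		asm volatile("xgetbv" : "=a" (eax), "=d" (edx) : "c" (index));
		return ((uint64_t)edx << 32) | eax;
	}

	int main(void)
	{
		uint64_t requested = 0x2FF;	/* hypothetical full feature set */
		uint64_t xinuse    = xgetbv(1);	/* features not in their init state */
		uint64_t effective = requested & xinuse;

		printf("requested 0x%llx, XINUSE 0x%llx, effective XSAVEC mask 0x%llx\n",
		       (unsigned long long)requested,
		       (unsigned long long)xinuse,
		       (unsigned long long)effective);
		return 0;
	}

On a task that touches little FPU state, the "effective" mask should come
out much smaller than the requested one, which is the whole point of the
optimization.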