From: Thomas Gleixner
To: "Bae, Chang Seok"
Cc: "bp@suse.de", "Lutomirski, Andy", "mingo@kernel.org", "x86@kernel.org",
    "Brown, Len", "lenb@kernel.org", "Hansen, Dave", "Macieira, Thiago",
    "Liu, Jing2", "Shankar, Ravi V", "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH v10 13/28] x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user state
In-Reply-To: <66A19E8A-11BF-4532-878F-A8D0935FDBC7@intel.com>
References: <20210825155413.19673-1-chang.seok.bae@intel.com>
    <20210825155413.19673-14-chang.seok.bae@intel.com>
    <871r546b52.ffs@tglx> <87ee944hvj.ffs@tglx>
    <66A19E8A-11BF-4532-878F-A8D0935FDBC7@intel.com>
Date: Mon, 04 Oct 2021 21:03:58 +0200
Message-ID: <87zgrofw81.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Oct 03 2021 at 22:39, Chang Seok Bae wrote:
> On Oct 1, 2021, at 13:20, Thomas Gleixner wrote:
>
> Looking at the changelog of the patch to delay XSTATE [1] load:
>
>     This gives the kernel the potential to skip loading FPU state for tasks
>     that stay in kernel mode, or for tasks that end up with repeated
>     invocations of kernel_fpu_begin() & kernel_fpu_end().

Correct.

> But I think XFD state is different from XSTATE. There is no use case for
> XFD-enabled features in kernel mode.

Correct, but your patch does not ensure that XFD features are disabled on
context switch. You write the XFD mask of the next task when it differs
from the XFD mask of the previous task.

So we have the following:

   prev XFD    next XFD
   DISABLED    DISABLED    XFD features stay disabled
   ENABLED     DISABLED    XFD features are disabled
   DISABLED    ENABLED     XFD features are enabled
   ENABLED     ENABLED     XFD features stay enabled

So it still runs in the kernel with XFD features enabled, including
interrupts, soft interrupts, exceptions and NMIs.

So what's the problem when it does a user -> kernel -> user transition
with the user XFD on?

> So, XFD state was considered to be switched under switch_to() just
> like other user states. E.g. user FSBASE is switched here as kernel
> does not use it.

That's not really a justification.

> But user GSBASE is loaded at returning to userspace.

And so is XSTATE.

> Potentially, it is also beneficial as XFD-armed states will hold
> INIT-state [3]:
>
>     If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i, the
>     instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1; instead,
>     it saves bit i of XSTATE_BV field of the XSAVE header as 0 (indicating
>     that the state component is in its initialized state).

How does that matter?

The point is that if the FPU registers are unmodified then a task can
return to user space without doing anything even if it went through
five context switches. So how is XFD any different?

Where is the kernel doing XSAVE / XSAVES:

  1) On context switch which sets TIF_NEED_FPU_LOAD

     Once TIF_NEED_FPU_LOAD is set the kernel does not do XSAVES in the
     context of the task simply because it knows that the content is in
     the memory buffer.

  2) In signal handling

     Only happens when TIF_NEED_FPU_LOAD == 0

Where is the kernel doing XRSTOR / XRSTORS:

  1) On return to user space if the FPU registers are not up to date

     So this can restore XFD as well

  2) In signal handling and related functions

     Only happens when TIF_NEED_FPU_LOAD == 0

So what's the win?

No wrmsrl() on context switch, which means that for the user -> kthread ->
user context switch scenario, for which the register preserving is
optimized, you spare two wrmsrl() invocations and run less code with
fewer conditionals.

What's the price?

A few trivial XFD sanity checks for debug enabled kernels to ensure that
XFD is correct on XSAVE[S] and XRSTOR[S], which have no runtime overhead
on production systems.

Even if we decide that these checks should be permanent, they happen
in code paths which are doing a slow X* operation anyway.

Thanks,

        tglx
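
The "two wrmsrl() invocations" argument can be illustrated with a toy
user-space model. This is only a sketch under stated assumptions, not kernel
code: every name in it (model_task, model_cpu, model_wrmsrl() and so on) is
hypothetical, the XFD bit is chosen purely for illustration, and the
"fpu_owner" pointer merely stands in for the "FPU registers are not up to
date" condition described above. It assumes the kernel thread runs with the
dynamic feature disabled while the user task has it enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy user-space model; NOT kernel code. Every name here is hypothetical. */

#define MODEL_XFD_BIT (1ULL << 18)	/* illustrative dynamic feature bit */

struct model_task {
	uint64_t xfd;	/* XFD mask of the task (bit set = feature disabled) */
};

struct model_cpu {
	uint64_t hw_xfd;			/* modelled IA32_XFD contents */
	const struct model_task *fpu_owner;	/* task whose state is live in registers */
	unsigned int msr_writes;		/* modelled wrmsrl() invocations */
};

static void model_wrmsrl(struct model_cpu *cpu, uint64_t val)
{
	cpu->hw_xfd = val;
	cpu->msr_writes++;
}

/* Switch-time scheme: write XFD whenever prev and next masks differ. */
static void switch_xfd_eager(struct model_cpu *cpu,
			     const struct model_task *prev,
			     const struct model_task *next)
{
	if (prev->xfd != next->xfd)
		model_wrmsrl(cpu, next->xfd);
}

/* Deferred scheme: touch XFD only when the registers must be reloaded. */
static void return_to_user_lazy(struct model_cpu *cpu,
				const struct model_task *task)
{
	if (cpu->fpu_owner != task) {	/* registers are not up to date */
		if (cpu->hw_xfd != task->xfd)
			model_wrmsrl(cpu, task->xfd);
		cpu->fpu_owner = task;
	}
	/* Debug-style sanity check: XFD must match before user space runs. */
	assert(cpu->hw_xfd == task->xfd);
}

/* user -> kthread -> user with the switch-time scheme. */
unsigned int msr_writes_eager(void)
{
	struct model_task user = { .xfd = 0 };			/* feature enabled */
	struct model_task kthread = { .xfd = MODEL_XFD_BIT };	/* feature disabled */
	struct model_cpu cpu = { .hw_xfd = user.xfd, .fpu_owner = &user };

	switch_xfd_eager(&cpu, &user, &kthread);
	switch_xfd_eager(&cpu, &kthread, &user);
	return cpu.msr_writes;
}

/* Same scenario with the deferred scheme: the context switches do nothing. */
unsigned int msr_writes_lazy(void)
{
	struct model_task user = { .xfd = 0 };
	struct model_cpu cpu = { .hw_xfd = user.xfd, .fpu_owner = &user };

	/* user -> kthread -> user: the registers still belong to 'user'. */
	return_to_user_lazy(&cpu, &user);
	return cpu.msr_writes;
}
```

In this model msr_writes_eager() counts two MSR writes for the round trip
and msr_writes_lazy() counts none, because the user task's registers are
still live when it returns to user space. The assert() plays the role of
the debug-only sanity check mentioned above.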