From: Andy Lutomirski
To: "Brown, Len"
Cc: Andy Lutomirski, "Bae, Chang Seok", Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML, "Hansen, Dave", "Liu, Jing2", "Shankar, Ravi V", LKML
Subject: Re: [RFC PATCH 13/22] x86/fpu/xstate: Expand dynamic user state area on first use
Date: Tue, 13 Oct 2020 18:47:13 -0700
Message-Id: <89AB5807-E295-4AB2-9568-9B6306E896F8@amacapital.net>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Oct 13, 2020, at 3:31 PM, Brown, Len wrote:
>
>> From: Andy Lutomirski
>
>>> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
>>> @@ -518,3 +518,40 @@ int fpu__exception_code(struct fpu *fpu, int trap_nr)
> ..
>>> +bool xfirstuse_event_handler(struct fpu *fpu)
>>> +{
>>> +	bool handled = false;
>>> +	u64 event_mask;
>>> +
>>> +	/* Check whether the first-use detection is running. */
>>> +	if (!static_cpu_has(X86_FEATURE_XFD) || !xfirstuse_enabled())
>>> +		return handled;
>>> +
>
>> MSR_IA32_XFD_ERR needs to be wired up in the exception handler, not in
>> some helper called farther down the stack
>
> xfirstuse_event_handler() is called directly from the IDTENTRY exc_device_not_available():
>
>>> @@ -1028,6 +1028,9 @@ DEFINE_IDTENTRY(exc_device_not_available)
>>>  {
>>>  	unsigned long cr0 = read_cr0();
>>>
>>> +	if (xfirstuse_event_handler(&current->thread.fpu))
>>> +		return;
>
> Are you suggesting we should instead open code it inside that routine?

MSR_IA32_XFD_ERR is like CR2 and DR6 -- it's functionally a part of the exception. Let's handle it as such. (And, a bit like DR6, it's a bit broken.)

>
>> But this raises an interesting point -- what happens if allocation
>> fails? I think that, from kernel code, we simply cannot support this
>> exception mechanism. If kernel code wants to use AMX (and that would
>> be very strange indeed), it should call x86_i_am_crazy_amx_begin() and
>> handle errors, not rely on exceptions. From user code, I assume we
>> send a signal if allocation fails.
>
> The XFD feature allows us to transparently expand the kernel context switch buffer
> for a user task, when that task first touches this associated hardware.
> It allows applications to operate as if the kernel had allocated the backing AMX
> context switch buffer at initialization time. However, since we expect only
> a sub-set of tasks to actually use AMX, we instead defer allocation until
> we actually see first use for that task, rather than allocating for all tasks.

I sure hope that not all tasks use it. Context-switching it will be absurdly expensive.
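A minimal userspace sketch of the structure argued for above, assuming a simulated register instead of a real MSR: the error value is captured (and, in this sketch, cleared) at the exception entry, the way CR2 is captured at the top of the page-fault path, and only the captured value is passed to a helper that lazily allocates the large buffer on first use. Allocation failure is reported by flagging a signal for the user task rather than assumed impossible. Everything here is hypothetical: fake_xfd_err_msr, fail_next_alloc, struct task_fpu_sketch, handle_xfd_first_use() and device_not_available_sketch() stand in for kernel interfaces and are not the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for MSR_IA32_XFD_ERR. Nonzero means an XFD-armed
 * feature faulted; real code would use rdmsr/wrmsr instead. */
static uint64_t fake_xfd_err_msr;

/* Hypothetical test hook that makes the next allocation fail. */
static bool fail_next_alloc;

/* Hypothetical per-task FPU state: the large buffer is NULL until first use. */
struct task_fpu_sketch {
	void *dynamic_buf;
	size_t dynamic_size;
	bool signal_pending;	/* records the simulated signal to user space */
};

enum xfd_result { NOT_XFD_FAULT, XFD_HANDLED, XFD_SIGNALED };

static void *alloc_state_buf(size_t size)
{
	if (fail_next_alloc) {
		fail_next_alloc = false;
		return NULL;
	}
	return calloc(1, size);
}

/* Helper further down the stack: it only sees the value captured at
 * exception entry and never reads the register itself. */
static enum xfd_result handle_xfd_first_use(struct task_fpu_sketch *fpu,
					    uint64_t xfd_err, size_t size)
{
	assert(fpu != NULL);
	if (xfd_err == 0)
		return NOT_XFD_FAULT;	/* some other cause of the exception */

	if (fpu->dynamic_buf == NULL) {
		void *buf = alloc_state_buf(size);

		if (buf == NULL) {
			/* Allocation can fail: tell the user task instead of
			 * pretending it cannot happen. */
			fpu->signal_pending = true;
			return XFD_SIGNALED;
		}
		fpu->dynamic_buf = buf;
		fpu->dynamic_size = size;
	}
	/* Real code would presumably disarm XFD for this task here so the
	 * faulting instruction can be retried. */
	return XFD_HANDLED;
}

/* Exception entry: capture and clear the error register first, then pass
 * the captured value down. */
static enum xfd_result device_not_available_sketch(struct task_fpu_sketch *fpu,
						  size_t size)
{
	uint64_t xfd_err = fake_xfd_err_msr;

	fake_xfd_err_msr = 0;	/* this sketch assumes software must clear it */
	return handle_xfd_first_use(fpu, xfd_err, size);
}
```

The design point is that nothing below the entry function depends on the register still holding the right value, which matters if anything in between could fault or be preempted.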
>
> While we currently don't propose AMX use inside the kernel, it is conceivable
> that could be done in the future, just like AVX is used by the RAID code;
> and it would be done the same way -- kernel_fpu_begin()/kernel_fpu_end().
> Such future kernel AMX use would _not_ arm XFD, and would _not_ provoke this fault.
> (note that we context switch the XFD-armed state per task)

How expensive is *that*? Can you give approximate cycle counts for saving, restoring, arming and disarming?

This reminds me of TS. Writing TS was more expensive than saving the whole FPU state. And, for kernel code, we can't just "not arm" the XFD -- we would have to disarm it.

>
> vmalloc() does not fail, and does not return an error, and so there is no concept
> of returning a signal. If we got to the point where vmalloc() sleeps, then the system
> has bigger OOM issues, and the OOM killer would be on the prowl.
>
> If we were concerned about using vmalloc for a couple of pages in the task structure,
> then we could implement a routine to harvest unused buffers and free them.

Kind of like we vmalloc a couple pages for kernel stacks, and we carefully cache them. And we handle failure gracefully.
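A minimal userspace sketch of the caching idea, loosely modeled on the way freed kernel stacks are kept around for reuse: freed state buffers go into a small fixed-size cache, a harvest routine releases them under memory pressure, and the first-use path returns an error that the caller must handle instead of assuming the allocator never fails. malloc() stands in for vmalloc(); STATE_CACHE_SLOTS, struct state_buf_cache and all function names are hypothetical and not part of any kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define STATE_CACHE_SLOTS 2	/* hypothetical cache depth */

/* Hypothetical cache of previously freed state buffers, all one size. */
struct state_buf_cache {
	void *slots[STATE_CACHE_SLOTS];
	size_t count;		/* number of cached buffers in slots[] */
	size_t buf_size;	/* size of every buffer */
};

/* Reuse a cached buffer if one exists, else allocate. May return NULL. */
static void *state_buf_get(struct state_buf_cache *c)
{
	if (c->count > 0)
		return c->slots[--c->count];
	return malloc(c->buf_size);
}

/* Keep a released buffer for reuse if there is room, else free it. */
static void state_buf_put(struct state_buf_cache *c, void *buf)
{
	assert(c->count <= STATE_CACHE_SLOTS);
	if (buf == NULL)
		return;
	if (c->count < STATE_CACHE_SLOTS) {
		c->slots[c->count++] = buf;
		return;
	}
	free(buf);
}

/* Harvest: free every cached buffer and report how many were released. */
static size_t state_buf_harvest(struct state_buf_cache *c)
{
	size_t freed = c->count;

	while (c->count > 0)
		free(c->slots[--c->count]);
	return freed;
}

/* First-use path: attach a buffer to the task or report failure. */
static int task_expand_state(struct state_buf_cache *c, void **task_buf)
{
	void *buf;

	if (*task_buf != NULL)
		return 0;	/* already expanded */
	buf = state_buf_get(c);
	if (buf == NULL)
		return -1;	/* a kernel version would return -ENOMEM */
	*task_buf = buf;
	return 0;
}
```

A caller that gets -1 back would then signal the user task, matching the "send a signal if allocation fails" behavior suggested earlier in the thread.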