Date: Wed, 21 Mar 2018 08:46:34 +0100
From: Ingo Molnar
To: Linus Torvalds
Cc: Thomas Gleixner, David Laight, Rahul Lakkireddy, "x86@kernel.org",
    "linux-kernel@vger.kernel.org", "netdev@vger.kernel.org", "mingo@redhat.com",
    "hpa@zytor.com", "davem@davemloft.net", "akpm@linux-foundation.org",
    "ganeshgr@chelsio.com", "nirranjan@chelsio.com", "indranil@chelsio.com",
    Andy Lutomirski, Peter Zijlstra, Fenghua Yu, Eric Biggers
Subject: Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
Message-ID: <20180321074634.dzpyjz3ia46snodh@gmail.com>
References: <7f0ddb3678814c7bab180714437795e0@AcuMS.aculab.com>
    <7f8d811e79284a78a763f4852984eb3f@AcuMS.aculab.com>
    <20180320082651.jmxvvii2xvmpyr2s@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

So I poked around a bit and I'm having second thoughts:

* Linus Torvalds wrote:

> On Tue, Mar 20, 2018 at 1:26 AM, Ingo Molnar wrote:
> >
> > So assuming the target driver will only load on modern FPUs I *think* it should
> > actually be possible to do something like
> > (pseudocode):
> >
> >       vmovdqa %ymm0, 40(%rsp)
> >       vmovdqa %ymm1, 80(%rsp)
> >
> >       ...
> >       # use ymm0 and ymm1
> >       ...
> >
> >       vmovdqa 80(%rsp), %ymm1
> >       vmovdqa 40(%rsp), %ymm0
> >
> > ... without using the heavy XSAVE/XRSTOR instructions.
>
> No. The above is buggy. It may *work*, but it won't work in the long run.
>
> Pretty much every single vector extension has traditionally meant that
> touching "old" registers breaks any new register use. Even if you save
> the registers carefully like in your example code, it will do magic
> and random things to the "future extended" version.

This should be relatively straightforward to solve via a proper CPU features
check: for example by only patching in the AVX routines for 'known compatible'
fpu_kernel_xstate_size values. Future extensions of register width will extend
the XSAVE area.

It's not fool-proof: in theory there could be semantic extensions to the vector
unit that do not increase the size of the context - but the normal pattern is to
increase the number of XINUSE bits and bump up the maximum context area size.

If that's a worry then an even safer compatibility check would be to explicitly
list CPU models - we do track them pretty accurately anyway these days, mostly
due to perf PMU support defaulting to a safe but dumb variant if a CPU model is
not specifically listed.

That method, although more maintenance-intensive, should be pretty fool-proof
AFAICS.

> So I absolutely *refuse* to have anything to do with the vector unit.
> You can only touch it in the kernel if you own it entirely (ie that
> "kernel_fpu_begin()/_end()" thing). Anything else is absolutely
> guaranteed to cause problems down the line.
>
> And even if you ignore that "maintenance problems down the line" issue
> ("we can fix them when they happen") I don't want to see games like
> this, because I'm pretty sure it breaks the optimized xsave by tagging
> the state as being dirty.
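
To make the 'known compatible' size check suggested earlier in this reply a bit
more concrete, here is a minimal user-space sketch in plain C. It is not kernel
code: the allow-list, the helper names (xstate_size_is_known,
avx_fast_path_allowed) and the idea of passing the size in as a parameter are
all assumptions made for illustration. The only size listed, 832 bytes, is the
x87+SSE+AVX 'standard' format context size from the boot log further down in
this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allow-list: 832 bytes is the x87+SSE+AVX 'standard' format
 * context size quoted in this mail. Anything else is treated as unknown. */
static const unsigned int known_xstate_sizes[] = { 832 };

/* Returns true if 'size' appears in the list 'known' of length 'n'. */
static bool xstate_size_is_known(const unsigned int *known, size_t n,
				 unsigned int size)
{
	assert(known != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (known[i] == size)
			return true;
	}
	return false;
}

/* Hypothetical gate: enable the AVX routines only for a verified layout,
 * fall back to the generic path for every size we have not seen. */
static bool avx_fast_path_allowed(unsigned int xstate_size)
{
	return xstate_size_is_known(known_xstate_sizes,
				    sizeof(known_xstate_sizes) /
				    sizeof(known_xstate_sizes[0]),
				    xstate_size);
}
```

The point of an allow-list rather than a "size >= N" test is that a future,
wider context area makes the check fail closed: an unknown size simply keeps
the generic routines.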

So I added a bit of instrumentation and the current state of things is that on
64-bit x86 every single task has an initialized FPU, and every task has the
exact same, fully filled in xfeatures (XINUSE) value:

 [root@galatea ~]# grep -h fpu /proc/*/task/*/fpu | sort | uniq -c
    504 x86/fpu: initialized    : 1
    504 x86/fpu: xfeatures_mask : 7

So our latest FPU model is *really* simple and user-space should not be able to
observe any changes in the XINUSE bits of the XSAVE header, because (at least for
the basic vector CPU features) all bits are maxed out all the time.

Note that this is with an AVX (256-bit) supporting CPU:

 [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
 [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
 [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
 [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
 [    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.

But note that it probably wouldn't make sense to make use of XINUSE optimizations
on most systems for the AVX space, as glibc will use the highest-bitness vector
ops for its regular memcpy(), and every user task makes use of memcpy.

It does make sense for some of the more optional XSAVE based features such as
pkeys. But I don't have any newer Intel system with a wider xsave feature set to
check.

> So no. Don't use vector stuff in the kernel. It's not worth the pain.

That might still be true, but still I'm torn:

 - Broad areas of user-space have seamlessly integrated vector ops and use them
   whenever they can find an excuse to.

 - The vector registers are fundamentally caller-saved, so in most synchronous
   calls the vector unit registers are unused. Asynchronous interruptions of
   context (interrupts, faults, preemption, etc.) can still use them as well, as
   long as they save/restore register contents.
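
As a side note on the numbers in the instrumentation and boot log above, the
following small C sketch decodes them. The macro and function names are made up
for this illustration; the bit values and the offset/size pair come straight
from the log. It assumes the 'standard' (non-compacted) format, where the end of
the highest enabled component gives the context size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bits as printed in the boot log (names are illustrative). */
#define XF_X87 0x1u	/* 'x87 floating point registers' */
#define XF_SSE 0x2u	/* 'SSE registers' */
#define XF_AVX 0x4u	/* 'AVX registers' */

/* True if every bit in 'wanted' is set in 'mask'. */
static bool xfeatures_all_set(uint64_t mask, uint64_t wanted)
{
	return (mask & wanted) == wanted;
}

/* End of one component in the save area: its offset plus its size.
 * For the log above: 576 + 256 = 832, the reported context size. */
static unsigned int xstate_component_end(unsigned int offset,
					 unsigned int size)
{
	assert(offset + size >= offset);	/* no unsigned wrap-around */
	return offset + size;
}
```

With the logged values, xfeatures_mask 7 is exactly XF_X87 | XF_SSE | XF_AVX,
and xstate_offset[2] + xstate_sizes[2] matches the 832 byte context size.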

So other than Intel not making it particularly easy to write a forwards-compatible,
vector-register-granular save/restore pattern for asynchronous contexts (but see
above for how we could handle that), I don't see too many other complications.

Thanks,

	Ingo