Date: Tue, 20 Mar 2018 09:38:38 +0100 (CET)
From: Thomas Gleixner
To: Ingo Molnar
Cc: David Laight, Rahul Lakkireddy, x86@kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, mingo@redhat.com, hpa@zytor.com, davem@davemloft.net, akpm@linux-foundation.org, torvalds@linux-foundation.org, ganeshgr@chelsio.com, nirranjan@chelsio.com, indranil@chelsio.com, Andy Lutomirski, Peter Zijlstra, Fenghua Yu, Eric Biggers
Subject: Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

On Tue, 20 Mar 2018, Ingo Molnar wrote:
> * Thomas Gleixner wrote:
> > > Useful also for code that needs AVX-like registers to do things like CRCs.
> >
> > x86/crypto/ has a lot of AVX optimized code.
>
> Yeah, that's true, but the crypto code is processing fundamentally bigger blocks
> of data, which amortizes the cost of using kernel_fpu_begin()/_end().

Correct.

> So assuming the target driver will only load on modern FPUs I *think* it should
> actually be possible to do something like (pseudocode):
>
>	vmovdqa %ymm0, 40(%rsp)
>	vmovdqa %ymm1, 80(%rsp)
>
>	...
>	# use ymm0 and ymm1
>	...
>
>	vmovdqa 80(%rsp), %ymm1
>	vmovdqa 40(%rsp), %ymm0
>
> ... without using the heavy XSAVE/XRSTOR instructions.
>
> Note that preemption probably still needs to be disabled and possibly there are
> other details as well, but there should be no 'heavy' FPU operations.

Emphasis on should :)

> I think this should still preserve all user-space FPU state and shouldn't muck up
> any 'weird' user-space FPU state (such as pending exceptions, legacy x87 running
> code, NaN registers or weird FPU control word settings) we might have interrupted
> either.
>
> But I could be wrong, it should be checked whether this sequence is safe.
> Worst-case we might have to save/restore the FPU control and tag words - but those
> operations should still be much faster than a full XSAVE/XRSTOR pair.

Fair enough.

> So I do think we could do more in this area to improve driver performance, if the
> code is correct and if there's actual benchmarks that are showing real benefits.

If it's about hotpath performance I'm all for it, but the use case here is a
debug facility...

And if we go down that road then we want an AVX-based memcpy()
implementation which is runtime conditional on the feature bit(s) and
length dependent.
Just slapping a readqq() at it and using it in a loop does not make any
sense.

Thanks,

	tglx
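A rough sketch of the kind of copy routine described above ("runtime conditional on the feature bit(s) and length dependent") might look like the following. It is a userspace stand-in, not kernel code, and rests on several assumptions: the GCC/Clang builtin `__builtin_cpu_supports("avx")` replaces the kernel's `boot_cpu_has(X86_FEATURE_AVX)`, AVX intrinsics from `<immintrin.h>` replace a hand-written asm loop, and the names `memcpy_avx_cond()`, `copy_avx()` and the 256-byte threshold `AVX_COPY_MIN_LEN` are hypothetical illustrations, not measured or existing interfaces. A kernel version would also have to bracket the AVX loop with `kernel_fpu_begin()`/`kernel_fpu_end()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Hypothetical cut-over length: shorter copies would not amortize the
 * cost of kernel_fpu_begin()/kernel_fpu_end() in a kernel version.
 * 256 is an illustrative guess, not a benchmarked value. */
#define AVX_COPY_MIN_LEN 256

#if HAVE_X86
/* Compiled for AVX only; callers must check the CPU feature first. */
__attribute__((target("avx")))
static void copy_avx(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/* Move 32 bytes (one ymm register) per iteration; the loadu/storeu
	 * forms tolerate unaligned buffers, unlike vmovdqa. */
	while (len >= 32) {
		__m256i v = _mm256_loadu_si256((const __m256i *)s);
		_mm256_storeu_si256((__m256i *)d, v);
		s += 32;
		d += 32;
		len -= 32;
	}
	assert(len < 32);
	memcpy(d, s, len);	/* copy the remaining tail bytes */
}
#endif

/* Userspace stand-in for boot_cpu_has(X86_FEATURE_AVX). */
static int cpu_has_avx(void)
{
#if HAVE_X86
	return __builtin_cpu_supports("avx");
#else
	return 0;
#endif
}

/* Hypothetical runtime-conditional, length-dependent copy. */
static void *memcpy_avx_cond(void *dst, const void *src, size_t len)
{
	if (len >= AVX_COPY_MIN_LEN && cpu_has_avx()) {
#if HAVE_X86
		/* A kernel version would call kernel_fpu_begin() here ... */
		copy_avx(dst, src, len);
		/* ... and kernel_fpu_end() here. */
		return dst;
#endif
	}
	return memcpy(dst, src, len);	/* short copy or no AVX */
}
```

The length test comes before the feature test so that short copies always take the plain memcpy() path; in a kernel version that would also keep them from paying for kernel_fpu_begin()/_end(), which is the amortization argument made for the crypto code earlier in the thread.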