Date: Tue, 3 Apr 2018 10:49:32 +0200
From: Pavel Machek
To: Ingo Molnar
Cc: Thomas Gleixner, David Laight, 'Rahul Lakkireddy',
    "x86@kernel.org", "linux-kernel@vger.kernel.org",
    "netdev@vger.kernel.org", "mingo@redhat.com", "hpa@zytor.com",
    "davem@davemloft.net", "akpm@linux-foundation.org",
    "torvalds@linux-foundation.org", "ganeshgr@chelsio.com",
    "nirranjan@chelsio.com", "indranil@chelsio.com", Andy Lutomirski,
    Peter Zijlstra, Fenghua Yu, Eric Biggers
Subject: Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
Message-ID: <20180403084932.GA3926@amd>
References: <7f0ddb3678814c7bab180714437795e0@AcuMS.aculab.com>
    <7f8d811e79284a78a763f4852984eb3f@AcuMS.aculab.com>
    <20180320082651.jmxvvii2xvmpyr2s@gmail.com>
    <20180320090802.qw4tqjmhy6yfd6sf@gmail.com>
    <20180320105427.bm4od7cpessbraag@gmail.com>
In-Reply-To: <20180320105427.bm4od7cpessbraag@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi!

> > On Tue, 20 Mar 2018, Ingo Molnar wrote:
> > > * Thomas Gleixner wrote:
> > >
> > > > > So I do think we could do more in this area to improve driver performance, if the
> > > > > code is correct and if there's actual benchmarks that are showing real benefits.
> > > >
> > > > If it's about hotpath performance I'm all for it, but the use case here is
> > > > a debug facility...
> > > >
> > > > And if we go down that road then we want a AVX based memcpy()
> > > > implementation which is runtime conditional on the feature bit(s) and
> > > > length dependent. Just slapping a readqq() at it and use it in a loop does
> > > > not make any sense.
> > >
> > > Yeah, so generic memcpy() replacement is only feasible I think if the most
> > > optimistic implementation is actually correct:
> > >
> > >  - if no preempt disable()/enable() is required
> > >
> > >  - if direct access to the AVX[2] registers does not disturb legacy FPU state in
> > >    any fashion
> > >
> > >  - if direct access to the AVX[2] registers cannot raise weird exceptions or have
> > >    weird behavior if the FPU control word is modified to non-standard values by
> > >    untrusted user-space
> > >
> > > If we have to touch the FPU tag or control words then it's probably only good for
> > > a specialized API.
> >
> > I did not mean to have a general memcpy replacement. Rather something like
> > magic_memcpy() which falls back to memcpy when AVX is not usable or the
> > length does not justify the AVX stuff at all.
>
> OK, fair enough.
>
> Note that a generic version might still be worth trying out, if and only if it's
> safe to access those vector registers directly: modern x86 CPUs will do their
> non-constant memcpy()s via the common memcpy_erms() function - which could in
> theory be an easy common point to be (cpufeatures-) patched to an AVX2 variant, if
> size (and alignment, perhaps) is a multiple of 32 bytes or so.

How is AVX2 supposed to help the memcpy speed?

If the copy is small, constant overhead will dominate, and I don't
think AVX2 is going to be a win there.

If the copy is big, well, the copy loop will likely run out of L1 and
maybe even out of L2, and at that point the speed of the loop does not
matter because memory is slow...?

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
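
For reference, the magic_memcpy() idea quoted above (use AVX only when the
CPU has it and the length justifies it, otherwise fall back to memcpy())
could look roughly like the userspace sketch below. It is not code from the
patch series. The 256-byte cut-off and the helper name copy_avx2_blocks()
are assumptions made up for illustration, and the sketch assumes GCC or
Clang. Being userspace, it sidesteps the preemption and FPU-state questions
raised in the thread; an in-kernel variant would presumably have to bracket
the vector loop with kernel_fpu_begin()/kernel_fpu_end(), which adds to the
constant overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX2_PATH 1
#else
#define HAVE_AVX2_PATH 0
#endif

/* Hypothetical cut-off: below this the dispatch overhead is assumed to dominate. */
#define MAGIC_MEMCPY_MIN_LEN 256

#if HAVE_AVX2_PATH
/* Copy n bytes (n must be a multiple of 32) with unaligned 256-bit loads/stores. */
__attribute__((target("avx2")))
static void copy_avx2_blocks(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(src + i));
        _mm256_storeu_si256((__m256i *)(dst + i), v);
    }
}
#endif

/*
 * Userspace sketch: take the AVX2 path only if the CPU reports the feature
 * at run time and the copy is long enough; otherwise use plain memcpy().
 */
void *magic_memcpy(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
#if HAVE_AVX2_PATH
    if (n >= MAGIC_MEMCPY_MIN_LEN && __builtin_cpu_supports("avx2")) {
        size_t bulk = n & ~(size_t)31; /* largest multiple of 32 <= n */

        copy_avx2_blocks(dst, src, bulk);
        /* The tail (fewer than 32 bytes) goes through the regular memcpy(). */
        memcpy((unsigned char *)dst + bulk,
               (const unsigned char *)src + bulk, n - bulk);
        return dst;
    }
#endif
    return memcpy(dst, src, n);
}
```

Timing magic_memcpy() against memcpy() over a range of sizes (a few bytes
up to well past the L2 size) would be one way to check the two cases above:
whether the vector path wins anything for small copies, and whether it still
matters once the copy is memory-bound.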