From: David Laight
To: 'Ingo Molnar', Thomas Gleixner
CC: 'Rahul Lakkireddy', "x86@kernel.org", "linux-kernel@vger.kernel.org", "netdev@vger.kernel.org", "mingo@redhat.com", "hpa@zytor.com", "davem@davemloft.net", "akpm@linux-foundation.org", "torvalds@linux-foundation.org", "ganeshgr@chelsio.com", "nirranjan@chelsio.com", "indranil@chelsio.com", "Andy Lutomirski", Peter Zijlstra, Fenghua Yu, Eric Biggers
Subject: RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
Date: Tue, 20 Mar 2018 13:30:59 +0000

From: Ingo Molnar
> Sent: 20 March 2018 10:54
...
> Note that a generic version might still be worth trying out, if and only if it's
> safe to access those vector registers directly: modern x86 CPUs will do their
> non-constant memcpy()s via the common memcpy_erms() function - which could in
> theory be an easy common point to be (cpufeatures-) patched to an AVX2 variant, if
> size (and alignment, perhaps) is a multiple of 32 bytes or so.
>
> Assuming it's correct with arbitrary user-space FPU state and if it results in any
> measurable speedups, which might not be the case: ERMS is supposed to be very
> fast.
>
> So even if it's possible (which it might not be), it could end up being slower
> than the ERMS version.

Last I checked, memcpy() was implemented as 'rep movsb' on the latest Intel CPUs.
Since memcpy_to/fromio() get aliased to memcpy(), this generates byte-by-byte copies.
The previous 'fastest' version of memcpy() was OK for uncached locations.

For PCIe I suspect that the actual instructions don't make a massive difference.
I'm not even sure interleaving two transfers makes any difference.
What makes a huge difference for memcpy_fromio() is the size of the register:
the time taken for a read will be largely independent of the width of the
register used, so wider registers mean fewer reads for the same amount of data.

David
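A minimal user-space sketch of the register-width argument, assuming (as above) that each uncached read costs roughly the same no matter how wide it is, so total time tracks the number of reads. copy_fromio_model() is a hypothetical name, not a kernel function: ordinary memory stands in for a device window, and a real driver would use the kernel's readb()/readq() accessors on an __iomem pointer instead of raw volatile loads. It counts reads for a byte-at-a-time copy (what the message says 'rep movsb' ends up doing here) versus 64-bit reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of a memcpy_fromio()-style copy. 'src' stands in
 * for an uncached device window and must be 8-byte aligned when
 * width == 8. Returns the number of loads issued against 'src'.
 */
size_t copy_fromio_model(void *dst, const volatile void *src,
                         size_t len, size_t width)
{
    const volatile uint8_t *s = src;
    uint8_t *d = dst;
    size_t reads = 0;

    assert(width == 1 || width == 8);

    if (width == 8) {
        /* Bulk of the copy: one 64-bit load per 8 bytes. */
        while (len >= 8) {
            uint64_t v = *(const volatile uint64_t *)s;
            memcpy(d, &v, sizeof(v));
            s += 8;
            d += 8;
            len -= 8;
            reads++;
        }
    }

    /* Tail, or the whole copy when width == 1: one load per byte. */
    while (len > 0) {
        *d++ = *s++;
        len--;
        reads++;
    }
    return reads;
}
```

Copying a 4 KiB window this way issues 4096 reads at width 1 and 512 at width 8; by the same arithmetic a 32-byte register would need 128. If each read really costs about one bus round trip, that ratio is roughly the speedup, independent of which instruction performs the load.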