Subject: Re: Question Regarding ERMS memcpy
From: Logan Gunthorpe <logang@deltatee.com>
To: Borislav Petkov, Linus Torvalds
Cc: Peter Anvin, Thomas Gleixner, Ingo Molnar, Tony Luck, Al Viro,
 the arch/x86 maintainers, Linux Kernel Mailing List
Date: Mon, 6 Mar 2017 00:01:10 -0700
Message-ID: <5be40886-b468-d828-f948-2ad99b95a230@deltatee.com>
In-Reply-To: <20170305195432.6occvwaujq3l4ejl@pd.tnic>

On Sun, Mar 05, 2017 at 11:19:42AM -0800, Linus Torvalds wrote:
>> But it is *not* the right thing to use on IO memory, because the CPU
>> only does the magic cacheline access optimizations on cacheable
>> memory!

Yes, and actually this is where I started. I thought my memcpy was
using byte accesses on purpose and that I needed to write a patch for a
separate IO memcpy, because byte accesses over the PCI bus are
obviously far from ideal. However, once I found that my system wasn't
intentionally using that implementation, it was no longer my focus.

I have no way to test this, but it sounds like any Ivy Bridge system
using the ERMS version of memcpy could see the same slow PCI memcpy
performance I've been seeing (unless the microcode fixes it up?). So it
sounds like it would be a good idea to revert the change Linus is
talking about. (Rough sketches of the ERMS copy and a word-wise
alternative are at the bottom of this mail.)

>> So I think we should re-introduce that old "__inline_memcpy()" as that
>> special "safe memcpy" thing. Not just for KMEMCHECK, and not just for
>> 64-bit.

On 05/03/17 12:54 PM, Borislav Petkov wrote:
> Logan, wanna give that a try, see if it takes care of your issue?

Well, honestly, my issue was solved by fixing my kernel config. I have
no idea why I had optimize-for-size (CONFIG_CC_OPTIMIZE_FOR_SIZE)
enabled in the first place.

Thanks,

Logan
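
P.S. For anyone skimming the thread: the "ERMS version of memcpy" above
is essentially just a bare "rep movsb", which the CPU only turns into
wide, cacheline-sized moves on ordinary write-back memory; on an
uncacheable PCI BAR it really does move one byte at a time. A rough
sketch of the idea (illustrative only: the helper name is mine, and the
real thing lives in arch/x86/lib/memcpy_64.S as a few lines of
assembly):

#include <stddef.h>

/*
 * Sketch of an ERMS-style memcpy: with Enhanced REP MOVSB the CPU
 * internally batches a plain "rep movsb" into cacheline-sized moves,
 * but only for cacheable (write-back) memory.  On uncacheable or
 * write-combining IO mappings that optimization does not apply, so
 * this issues one byte transaction at a time.
 */
static inline void *erms_style_memcpy(void *dst, const void *src, size_t len)
{
	void *ret = dst;

	asm volatile("rep movsb"
		     : "+D" (dst), "+S" (src), "+c" (len)
		     : : "memory");
	return ret;
}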
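
The "safe memcpy" / __inline_memcpy idea Linus mentions is the opposite
trade-off: copy in fixed-width chunks so the access size you get on a
PCI BAR is the access size you asked for. A minimal C sketch of that
idea, assuming 8-byte-aligned buffers (a hypothetical helper, not the
kernel's __inline_memcpy or memcpy_toio):

#include <stddef.h>
#include <stdint.h>

/*
 * Copy in fixed 8-byte chunks and only fall back to byte accesses for
 * the tail, so the width of each bus transaction is predictable
 * instead of whatever a string instruction degrades to.
 */
static void *wordwise_memcpy(void *dst, const void *src, size_t len)
{
	volatile uint64_t *d8 = dst;
	const volatile uint64_t *s8 = src;
	volatile uint8_t *d1;
	const volatile uint8_t *s1;

	while (len >= sizeof(uint64_t)) {
		*d8++ = *s8++;
		len -= sizeof(uint64_t);
	}

	d1 = (volatile uint8_t *)d8;
	s1 = (const volatile uint8_t *)s8;
	while (len--)
		*d1++ = *s1++;

	return dst;
}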