Subject: Re: Question Regarding ERMS memcpy
From: Logan Gunthorpe <logang@deltatee.com>
To: Borislav Petkov, Linus Torvalds
Cc: Peter Anvin, Thomas Gleixner, Ingo Molnar, Tony Luck, Al Viro,
 the arch/x86 maintainers, Linux Kernel Mailing List
Date: Mon, 6 Mar 2017 00:01:10 -0700
Message-ID: <5be40886-b468-d828-f948-2ad99b95a230@deltatee.com>
In-Reply-To: <20170305195432.6occvwaujq3l4ejl@pd.tnic>

On Sun, Mar 05, 2017 at 11:19:42AM -0800, Linus Torvalds wrote:
>> But it is *not* the right thing to use on IO memory, because the CPU
>> only does the magic cacheline access optimizations on cacheable
>> memory!

Yes, and actually this is where I started. I thought my memcpy was
using byte accesses on purpose and that I needed to write a patch for a
separate IO memcpy, because byte accesses over the PCI bus are
obviously far from ideal. However, once I found that my system wasn't
intentionally using that implementation, it was no longer my focus.

I have no way to test this, but it sounds like any Ivy Bridge system
using the ERMS version of memcpy could see the same slow PCI memcpy
performance I've been seeing (unless the microcode fixes it up?). So it
sounds like it would be a good idea to revert the change Linus is
talking about. (Rough sketches of the ERMS copy and a word-wise
alternative are at the bottom of this mail.)

>> So I think we should re-introduce that old "__inline_memcpy()" as that
>> special "safe memcpy" thing. Not just for KMEMCHECK, and not just for
>> 64-bit.

On 05/03/17 12:54 PM, Borislav Petkov wrote:
> Logan, wanna give that a try, see if it takes care of your issue?

Well, honestly, my issue was solved by fixing my kernel config. I have
no idea why I had optimize-for-size (CONFIG_CC_OPTIMIZE_FOR_SIZE)
enabled in the first place.

Thanks,

Logan
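
P.S. For anyone skimming the thread: the "ERMS version of memcpy" above
is essentially just a bare "rep movsb", which the CPU only turns into
wide, cacheline-sized moves on ordinary write-back memory; on an
uncacheable PCI BAR it really does move one byte at a time. A rough
sketch of the idea (illustrative only: the helper name is mine, and the
real thing lives in arch/x86/lib/memcpy_64.S as a few lines of
assembly):

#include <stddef.h>

/*
 * Sketch of an ERMS-style memcpy: with Enhanced REP MOVSB the CPU
 * internally batches a plain "rep movsb" into cacheline-sized moves,
 * but only for cacheable (write-back) memory.  On uncacheable or
 * write-combining IO mappings that optimization does not apply, so
 * this issues one byte transaction at a time.
 */
static inline void *erms_style_memcpy(void *dst, const void *src, size_t len)
{
	void *ret = dst;

	asm volatile("rep movsb"
		     : "+D" (dst), "+S" (src), "+c" (len)
		     : : "memory");
	return ret;
}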
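
The "safe memcpy" / __inline_memcpy idea Linus mentions is the opposite
trade-off: copy in fixed-width chunks so the access size you get on a
PCI BAR is the access size you asked for. A minimal C sketch of that
idea, assuming 8-byte-aligned buffers (a hypothetical helper, not the
kernel's __inline_memcpy or memcpy_toio):

#include <stddef.h>
#include <stdint.h>

/*
 * Copy in fixed 8-byte chunks and only fall back to byte accesses for
 * the tail, so the width of each bus transaction is predictable
 * instead of whatever a string instruction degrades to.
 */
static void *wordwise_memcpy(void *dst, const void *src, size_t len)
{
	volatile uint64_t *d8 = dst;
	const volatile uint64_t *s8 = src;
	volatile uint8_t *d1;
	const volatile uint8_t *s1;

	while (len >= sizeof(uint64_t)) {
		*d8++ = *s8++;
		len -= sizeof(uint64_t);
	}

	d1 = (volatile uint8_t *)d8;
	s1 = (const volatile uint8_t *)s8;
	while (len--)
		*d1++ = *s1++;

	return dst;
}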