In-Reply-To: <20170305095059.l4od2yjqm5yxx6ln@pd.tnic>
References: <e3e805ee-dc81-55f3-46e6-e3c7430096c3@deltatee.com>
 <20170304224341.zfp4fl37ypt57amg@pd.tnic> <c675a6c8-e87a-ece2-2350-cdd079b1b610@deltatee.com>
 <5CCEF10D-5647-4503-A398-0681DF2C8847@zytor.com> <20170305001447.kcxignj3nsq35vci@pd.tnic>
 <BF3D2B38-BC19-4B66-840E-6E9DF9C3E649@zytor.com> <20170305003349.6kgq4ovj7ipezfxu@pd.tnic>
 <C2B520E2-A4B8-4A90-BAA0-AAA44185837E@zytor.com> <20170305095059.l4od2yjqm5yxx6ln@pd.tnic>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun, 5 Mar 2017 11:19:42 -0800
Message-ID: <CA+55aFykpsWoZeHKV1Bnr=ok9ocBhiSR5gNJepjFXStO1NGuoA@mail.gmail.com>
Subject: Re: Question Regarding ERMS memcpy
To: Borislav Petkov <bp@suse.de>
Cc: Peter Anvin <hpa@zytor.com>, Logan Gunthorpe <logang@deltatee.com>,
        Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>,
        Tony Luck <tony.luck@intel.com>, Al Viro <viro@zeniv.linux.org.uk>,
        "the arch/x86 maintainers" <x86@kernel.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>

On Sun, Mar 5, 2017 at 1:50 AM, Borislav Petkov <bp@suse.de> wrote:
>
> gcc can't possibly know which targets the kernel is going to be
> booted on. So it probably does some universally optimal things, like in
> the dmi_scan_machine() case:
>
>         memcpy_fromio(buf, p, 32);
>
> turns into:
>
>         .loc 3 219 0
>         movl    $8, %ecx        #, tmp79
>         movq    %rax, %rsi      # p, p
>         movq    %rsp, %rdi      #, tmp77
>         rep movsl
>
> Apparently it thinks it is fine to do 8*4-byte MOVS. But why not
> 4*8-byte MOVS?

Actually, the "fromio/toio" code should never use regular memcpy().
There used to be devices that literally broke on 64-bit accesses due
to broken PCI crud.

We seem to have broken this *really* long ago, though. On x86-64 we
used to have a special __inline_memcpy() that did the copy with our
historical 32-bit accesses, and it was used for memcpy_fromio() and
memcpy_toio(). That was then undone by commit 6175ddf06b61 ("x86:
Clean up mem*io functions").

That commit says

   "Iomem has no special significance on x86"

but that's not strictly true. iomem is in the same address space and
uses the same access instructions as regular memory, but iomem _is_
special.

And I think it's a bug that we use "memcpy()" on it. Not because of
any gcc issues, but simply because our own memcpy() optimizations are
not appropriate for iomem.

For example, "rep movsb" really is the right thing to use on normal
memory on modern CPUs.

But it is *not* the right thing to use on IO memory, because the CPU
only does the magic cacheline access optimizations on cacheable
memory!

So I think we should re-introduce that old "__inline_memcpy()" as that
special "safe memcpy" thing. Not just for KMEMCHECK, and not just for
64-bit.

Hmm?

                   Linus