MIME-Version: 1.0
In-Reply-To: <CA+55aFw3P++jGbRhkrM3MyqwZZ3mxmxQTC1BfYS2rhw6P3LKNA@mail.gmail.com>
References: <cover.1480638597.git.luto@kernel.org> <cover.1480536936.git.luto@kernel.org>
 <0a21157c2233ba7d0781bbf07866b3f2d7e7c25d.1480638597.git.luto@kernel.org> <CA+55aFw3P++jGbRhkrM3MyqwZZ3mxmxQTC1BfYS2rhw6P3LKNA@mail.gmail.com>
From: Andy Lutomirski <luto@kernel.org>
Date: Fri, 2 Dec 2016 09:38:38 -0800
Message-ID: <CALCETrXCLLW9pd=h7dZ0u2oBSiRBJOBkxvEq2iz=Z8aTVtrZ=w@mail.gmail.com>
Subject: Re: [PATCH v2 5/6] x86/xen: Add a Xen-specific sync_core() implementation
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>, Peter Anvin <hpa@zytor.com>,
        "the arch/x86 maintainers" <x86@kernel.org>,
        One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
        Borislav Petkov <bp@alien8.de>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Brian Gerst <brgerst@gmail.com>,
        Matthew Whitehead <tedheadster@gmail.com>,
        Henrique de Moraes Holschuh <hmh@hmh.eng.br>,
        Peter Zijlstra <peterz@infradead.org>,
        Andrew Cooper <andrew.cooper3@citrix.com>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2062
Lines: 46

On Fri, Dec 2, 2016 at 9:32 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Dec 1, 2016 at 4:35 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> On my laptop, CPUID(eax=1, ecx=0) is ~83ns and IRET-to-self is
>> ~110ns.  But Xen PV will trap CPUID if possible, so IRET-to-self
>> should end up being a nice speedup.
>
> So if we care deeply about the performance of this, we should really
> ask ourselves how much we need this...
>
> There are *very* few places where we really need to do a full
> serializing instruction, and I'd worry that we really don't need it in
> many of the places we do this.
>
> The only real case I'm aware of is modifying code that is modified
> through a different linear address than it's executed.

TBH, I didn't start down this path for performance.  I did it because
I wanted to kill off a CPUID that was breaking on old CPUs that don't
have CPUID.  So I propose MOV-to-CR2 followed by an unconditional
jump.  My goal here is to make the #*!& thing work reliably and not be
ludicrously slow.  Borislav and I mulled over using an alternative to
use CPUID if and only if we have CPUID, but that doesn't work because
we call sync_core() before we're done applying alternatives.

>
> Is there anything else where we _really_ need this sync-core thing?
> Sure, the microcode loader looks fine, but that doesn't look
> particularly performance-critical either.
>
> So I'd like to know which sync_core is actually so
> performance-critical that w e care about it, and then I'd like to
> understand why it's needed at all, because I suspect a number of them
> has been added with the model of "sprinkle random things around and
> hope".

apply_alternatives, unfortunately.  It's performance-critical because
it's intensely stupid and does sync_core() for every single patch.
Fixing that would be nice, too.

> Adding Peter Anvin to the participants list, because iirc he was the
> one who really talked to hardwre engineers about the synchronization
> issues with serializing kernel code.
>
>                 Linus