Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754241AbcLBRjn (ORCPT ); Fri, 2 Dec 2016 12:39:43 -0500 Received: from mail.kernel.org ([198.145.29.136]:59406 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751072AbcLBRjl (ORCPT ); Fri, 2 Dec 2016 12:39:41 -0500 MIME-Version: 1.0 In-Reply-To: References: <0a21157c2233ba7d0781bbf07866b3f2d7e7c25d.1480638597.git.luto@kernel.org> From: Andy Lutomirski Date: Fri, 2 Dec 2016 09:38:38 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v2 5/6] x86/xen: Add a Xen-specific sync_core() implementation To: Linus Torvalds Cc: Andy Lutomirski , Peter Anvin , "the arch/x86 maintainers" , One Thousand Gnomes , Borislav Petkov , "linux-kernel@vger.kernel.org" , Brian Gerst , Matthew Whitehead , Henrique de Moraes Holschuh , Peter Zijlstra , Andrew Cooper Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2062 Lines: 46 On Fri, Dec 2, 2016 at 9:32 AM, Linus Torvalds wrote: > On Thu, Dec 1, 2016 at 4:35 PM, Andy Lutomirski wrote: >> >> On my laptop, CPUID(eax=1, ecx=0) is ~83ns and IRET-to-self is >> ~110ns. But Xen PV will trap CPUID if possible, so IRET-to-self >> should end up being a nice speedup. > > So if we care deeply about the performance of this, we should really > ask ourselves how much we need this... > > There are *very* few places where we really need to do a full > serializing instruction, and I'd worry that we really don't need it in > many of the places we do this. > > The only real case I'm aware of is modifying code that is modified > through a different linear address than it's executed. TBH, I didn't start down this path for performance. I did it because I wanted to kill off a CPUID that was breaking on old CPUs that don't have CPUID. So I propose MOV-to-CR2 followed by an unconditional jump. My goal here is to make the #*!& thing work reliably and not be ludicrously slow. Borislav and I mulled over using an alternative to use CPUID if and only if we have CPUID, but that doesn't work because we call sync_core() before we're done applying alternatives. > > Is there anything else where we _really_ need this sync-core thing? > Sure, the microcode loader looks fine, but that doesn't look > particularly performance-critical either. > > So I'd like to know which sync_core is actually so > performance-critical that w e care about it, and then I'd like to > understand why it's needed at all, because I suspect a number of them > has been added with the model of "sprinkle random things around and > hope". apply_alternatives, unfortunately. It's performance-critical because it's intensely stupid and does sync_core() for every single patch. Fixing that would be nice, too. > Adding Peter Anvin to the participants list, because iirc he was the > one who really talked to hardwre engineers about the synchronization > issues with serializing kernel code. > > Linus