From: Juergen Gross
Date: Fri, 11 Aug 2017 14:22:25 +0200
Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
To: Peter Zijlstra, Vitaly Kuznetsov
Cc: Jork Loeser, KY Srinivasan, Simon Xiao, Haiyang Zhang,
 Stephen Hemminger, torvalds@linux-foundation.org, luto@kernel.org,
 hpa@zytor.com, linux-kernel@vger.kernel.org, rostedt@goodmis.org,
 andy.shevchenko@gmail.com, tglx@linutronix.de, mingo@kernel.org,
 linux-tip-commits@vger.kernel.org, boris.ostrovsky@oracle.com,
 xen-devel@lists.xenproject.org
In-Reply-To: <20170811105625.hmdfnp3yh72zut33@hirez.programming.kicks-ass.net>
References: <20170802160921.21791-8-vkuznets@redhat.com>
 <20170810185646.GI6524@worktop.programming.kicks-ass.net>
 <20170810192742.GJ6524@worktop.programming.kicks-ass.net>
 <87lgmqqwzl.fsf@vitty.brq.redhat.com>
 <20170811105625.hmdfnp3yh72zut33@hirez.programming.kicks-ass.net>

On 11/08/17 12:56, Peter Zijlstra wrote:
> On Fri, Aug 11, 2017 at 11:23:10AM +0200, Vitaly Kuznetsov wrote:
>> Peter Zijlstra writes:
>>
>>> On Thu, Aug 10, 2017 at 07:08:22PM +0000, Jork Loeser wrote:
>>>
>>>>>> Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
>>>>
>>>>>> Hold on.. if we don't IPI for TLB invalidation. What serializes our
>>>>>> software page table walkers like fast_gup() ?
>>>>>
>>>>> Hypervisor may implement this functionality via an IPI.
>>>>>
>>>>> K. Y
>>>>
>>>> HvFlushVirtualAddressList() states:
>>>> This call guarantees that by the time control returns back to the
>>>> caller, the observable effects of all flushes on the specified virtual
>>>> processors have occurred.
>>>>
>>>> HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as
>>>> adding sparse target VP lists.
>>>>
>>>> Is this enough of a guarantee, or do you see other races?
>>>
>>> That's nowhere near enough. We need the remote CPU to have completed any
>>> guest IF section that was in progress at the time of the call.
>>>
>>> So if a host IPI can interrupt a guest while the guest has IF cleared,
>>> and we then process the host IPI -- clear the TLBs -- before resuming the
>>> guest, which still has IF cleared, we've got a problem.
>>>
>>> Because at that point, our software page-table walker, that relies on IF
>>> being clear to guarantee the page-tables exist, because it holds off the
>>> TLB invalidate and thereby the freeing of the pages, gets its pages
>>> ripped out from under it.
>>
>> Oh, I see your concern. Hyper-V, however, is not the first x86
>> hypervisor trying to avoid IPIs on remote TLB flush, Xen does this
>> too. Briefly looking at xen_flush_tlb_others() I don't see anything
>> special, do we know how serialization is achieved there?
>
> No idea on how Xen works, I always just hope it goes away :-) But lets
> ask some Xen folks.

Wait - the TLB can be cleared at any time, as Andrew was pointing out.
No cpu can rely on an address being accessible just because IF is
cleared. All that matters is the existence of a valid page table entry.

So clearing IF on a cpu isn't meant to secure the TLB from being
cleared, but just to avoid interrupts (as the name of the flag
suggests).
In the Xen case the hypervisor does the following:

- it checks whether any of the vcpus specified in the cpumask of the
  flush request is running on any physical cpu
- if any running vcpu is found, an IPI will be sent to the physical cpu
  and the hypervisor will do the TLB flush there
- any vcpu addressed by the flush and not running will be flagged to
  flush its TLB when being scheduled the next time

This ensures no TLB entry to be flushed can be used after return of
xen_flush_tlb_others().

Juergen