From: Juergen Gross
Date: Fri, 11 Aug 2017 14:22:25 +0200
Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
To: Peter Zijlstra, Vitaly Kuznetsov
Cc: Jork Loeser, KY Srinivasan, Simon Xiao, Haiyang Zhang,
 Stephen Hemminger, torvalds@linux-foundation.org, luto@kernel.org,
 hpa@zytor.com, linux-kernel@vger.kernel.org, rostedt@goodmis.org,
 andy.shevchenko@gmail.com, tglx@linutronix.de, mingo@kernel.org,
 linux-tip-commits@vger.kernel.org, boris.ostrovsky@oracle.com,
 xen-devel@lists.xenproject.org
In-Reply-To: <20170811105625.hmdfnp3yh72zut33@hirez.programming.kicks-ass.net>
References: <20170802160921.21791-8-vkuznets@redhat.com>
 <20170810185646.GI6524@worktop.programming.kicks-ass.net>
 <20170810192742.GJ6524@worktop.programming.kicks-ass.net>
 <87lgmqqwzl.fsf@vitty.brq.redhat.com>
 <20170811105625.hmdfnp3yh72zut33@hirez.programming.kicks-ass.net>

On 11/08/17 12:56, Peter Zijlstra wrote:
> On Fri, Aug 11, 2017 at 11:23:10AM +0200, Vitaly Kuznetsov wrote:
>> Peter Zijlstra writes:
>>
>>> On Thu, Aug 10, 2017 at 07:08:22PM +0000, Jork Loeser wrote:
>>>
>>>>>> Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
>>>>
>>>>>> Hold on.. if we don't IPI for TLB invalidation. What serializes our
>>>>>> software page table walkers like fast_gup() ?
>>>>>
>>>>> Hypervisor may implement this functionality via an IPI.
>>>>>
>>>>> K. Y
>>>>
>>>> HvFlushVirtualAddressList() states:
>>>> This call guarantees that by the time control returns back to the
>>>> caller, the observable effects of all flushes on the specified virtual
>>>> processors have occurred.
>>>>
>>>> HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as
>>>> adding sparse target VP lists.
>>>>
>>>> Is this enough of a guarantee, or do you see other races?
>>>
>>> That's nowhere near enough. We need the remote CPU to have completed any
>>> guest IF section that was in progress at the time of the call.
>>>
>>> So if a host IPI can interrupt a guest while the guest has IF cleared,
>>> and we then process the host IPI -- clear the TLBs -- before resuming the
>>> guest, which still has IF cleared, we've got a problem.
>>>
>>> Because at that point, our software page-table walker, that relies on IF
>>> being clear to guarantee the page-tables exist, because it holds off the
>>> TLB invalidate and thereby the freeing of the pages, gets its pages
>>> ripped out from under it.
>>
>> Oh, I see your concern. Hyper-V, however, is not the first x86
>> hypervisor trying to avoid IPIs on remote TLB flush, Xen does this
>> too. Briefly looking at xen_flush_tlb_others() I don't see anything
>> special, do we know how serialization is achieved there?
>
> No idea on how Xen works, I always just hope it goes away :-) But lets
> ask some Xen folks.

Wait - the TLB can be cleared at any time, as Andrew was pointing out.
No cpu can rely on an address being accessible just because IF is
cleared. All that matters is the existence of a valid page table entry.

So clearing IF on a cpu isn't meant to secure the TLB from being
cleared, but just to avoid interrupts (as the name of the flag
suggests).
In the Xen case the hypervisor does the following:

- it checks whether any of the vcpus specified in the cpumask of the
  flush request is running on any physical cpu
- if any running vcpu is found, an IPI will be sent to the physical cpu
  and the hypervisor will do the TLB flush there
- any vcpu addressed by the flush and not running will be flagged to
  flush its TLB when being scheduled the next time

This ensures no TLB entry to be flushed can be used after return of
xen_flush_tlb_others().

Juergen