From: Andy Lutomirski
Date: Wed, 26 Jun 2019 09:37:19 -0700
Subject: Re: [PATCH 6/9] KVM: x86: Provide paravirtualized flush_tlb_multi()
To: Nadav Amit
Cc: Andy Lutomirski, Dave Hansen, Peter Zijlstra, LKML, Ingo Molnar,
    Borislav Petkov, the arch/x86 maintainers, Thomas Gleixner,
    Dave Hansen, Paolo Bonzini, kvm@vger.kernel.org

On Tue, Jun 25, 2019 at 11:30 PM Nadav Amit wrote:
>
> > On Jun 25, 2019, at 8:56 PM, Andy Lutomirski wrote:
> >
> > On Tue, Jun 25, 2019 at 8:41 PM Nadav Amit wrote:
> >>> On Jun 25, 2019, at 8:35 PM, Andy Lutomirski wrote:
> >>>
> >>> On Tue, Jun 25, 2019 at 7:39 PM Nadav Amit wrote:
> >>>>> On Jun 25, 2019, at 2:40 PM, Dave Hansen wrote:
> >>>>>
> >>>>> On 6/12/19 11:48 PM, Nadav Amit wrote:
> >>>>>> Support the new interface of flush_tlb_multi, which also flushes the
> >>>>>> local CPU's TLB, instead of flush_tlb_others that does not. This
> >>>>>> interface is more performant since it parallelizes remote and local TLB
> >>>>>> flushes.
> >>>>>>
> >>>>>> The actual implementation of flush_tlb_multi() is almost identical to
> >>>>>> that of flush_tlb_others().
> >>>>>
> >>>>> This confused me a bit. I thought we didn't support paravirtualized
> >>>>> flush_tlb_multi() from reading earlier in the series.
> >>>>>
> >>>>> But, it seems like that might be Xen-only and doesn't apply to KVM and
> >>>>> paravirtualized KVM has no problem supporting flush_tlb_multi(). Is
> >>>>> that right? It might be good to include some of that background in the
> >>>>> changelog to set the context.
> >>>>
> >>>> I'll try to improve the change-logs a bit. There is no inherent reason for
> >>>> PV TLB-flushers not to implement their own flush_tlb_multi(). It is left
> >>>> for future work, and here are some reasons:
> >>>>
> >>>> 1. Hyper-V/Xen TLB-flushing code is not very simple
> >>>> 2. I don't have a proper setup
> >>>> 3. I am lazy
> >>>
> >>> In the long run, I think that we're going to want a way for one CPU to
> >>> do a remote flush and then, with appropriate locking, update the
> >>> tlb_gen fields for the remote CPU. Getting this right may be a bit
> >>> nontrivial.
> >>
> >> What do you mean by "do a remote flush"?
> >
> > I mean a PV-assisted flush on a CPU other than the CPU that started
> > it. If you look at flush_tlb_func_common(), it's doing some work that
> > is rather fancier than just flushing the TLB. By replacing it with
> > just a pure flush on Xen or Hyper-V, we're losing the potential CR3
> > switch and this bit:
> >
> >   /* Both paths above update our state to mm_tlb_gen. */
> >   this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);
> >
> > Skipping the former can hurt idle performance, although we should
> > consider just disabling all the lazy optimizations on systems with PV
> > flush. (And I've asked Intel to help us out here in future hardware.
> > I have no idea what the result of asking will be.) Skipping the
> > cpu_tlbstate write means that we will do unnecessary flushes in the
> > future, and that's not doing us any favors.
> >
> > In principle, we should be able to do something like:
> >
> > flush_tlb_multi(...);
> > for (each CPU that got flushed) {
> >   spin_lock(something appropriate?);
> >   per_cpu_write(cpu, cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, f->new_tlb_gen);
> >   spin_unlock(...);
> > }
> >
> > with the caveat that it's more complicated than this if the flush is a
> > partial flush, and that we'll want to check that the ctx_id still
> > matches, etc.
> >
> > Does this make sense?
>
> Thanks for the detailed explanation. Let me check that I got it right.
>
> You want to optimize cases in which:
>
> 1. A virtual machine

Yes.

> 2. Which issues multiple (remote) TLB shootdowns

Yes. Or just one followed by a context switch. Right now it's
suboptimal with just two vCPUs and a single remote flush. If CPU 0
does a remote PV flush of CPU1 and then CPU1 context switches away
from the running mm and back, it will do an unnecessary flush on the
way back because the tlb_gen won't match.

> 3. To remote vCPU which is preempted by the hypervisor

Yes, or even one that isn't preempted.

> 4. And unlike KVM, the hypervisor does not provide facilities for the VM to
> know which vCPU is preempted, and atomically request TLB flush when the vCPU
> is scheduled.

I'm not sure this makes much difference to the case I'm thinking of.

All this being said, do we currently have any system that supports PCID
*and* remote flushes?
I guess KVM has some mechanism, but I'm not that familiar with its exact
capabilities. If I remember right, Hyper-V doesn't expose PCID yet.

> Right?
>
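
To make the tlb_gen point above concrete, here is a minimal, self-contained
C sketch. It is not kernel code: the names mm_state, cpu_state,
flush_tlb_func_model and switch_back_needs_flush are made up for the
example. It only models the bookkeeping being discussed: a PV-assisted
flush that never runs the flush function on the remote CPU leaves that
CPU's recorded tlb_gen behind, so the next switch back to the mm flushes
again even though the hypervisor already cleaned the TLB.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct mm_state {
	uint64_t ctx_id;
	uint64_t tlb_gen;	/* bumped whenever a flush of this mm is requested */
};

struct cpu_state {
	uint64_t ctx_id;	/* mm this CPU last had loaded */
	uint64_t tlb_gen;	/* generation this CPU is known to be flushed to */
};

/* Models the flush function run on each target CPU on the native IPI path. */
static void flush_tlb_func_model(struct cpu_state *cpu, struct mm_state *mm)
{
	/* ...the actual TLB flush would happen here... */
	cpu->tlb_gen = mm->tlb_gen;	/* the per-CPU write a pure PV flush skips */
}

/* Models switching back to an mm: flush only if this CPU's state is behind. */
static bool switch_back_needs_flush(struct cpu_state *cpu, struct mm_state *mm)
{
	if (cpu->ctx_id == mm->ctx_id && cpu->tlb_gen == mm->tlb_gen)
		return false;
	cpu->ctx_id = mm->ctx_id;
	cpu->tlb_gen = mm->tlb_gen;
	return true;
}

int main(void)
{
	struct mm_state mm = { .ctx_id = 1, .tlb_gen = 1 };
	struct cpu_state cpu1 = { .ctx_id = 1, .tlb_gen = 1 };
	int pv_assisted = 1;	/* set to 0 to model the native IPI path */

	mm.tlb_gen++;		/* CPU 0 requests a flush of this mm on CPU 1 */
	if (!pv_assisted)
		flush_tlb_func_model(&cpu1, &mm);
	/* else: the hypervisor flushed CPU 1's TLB, but cpu1.tlb_gen stays stale */

	/* CPU 1 switches away from mm and later back to it. */
	printf("flush on switch back: %s\n",
	       switch_back_needs_flush(&cpu1, &mm) ? "yes (redundant)" : "no");
	return 0;
}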
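
And here, under the same caveats, is a rough sketch of the "flush, then
publish the new tlb_gen under a lock" idea quoted above. The names
(pv_cpu_tlbstate, hv_flush_tlb_multi, pv_flush_and_update_tlb_gen) are
invented for illustration, and pthread locking stands in for whatever
locking the kernel would actually use. The ctx_id check and the
full-flush restriction correspond to the caveats about the mm still being
loaded and about partial flushes.

#include <pthread.h>
#include <stdint.h>

#define NR_CPUS 4

struct pv_cpu_tlbstate {
	pthread_mutex_t lock;		/* "something appropriate" */
	uint64_t loaded_ctx_id;		/* mm currently loaded on that CPU */
	uint64_t tlb_gen;		/* generation that CPU is known to be flushed to */
};

static struct pv_cpu_tlbstate cpu_tlbstate_model[NR_CPUS];

struct flush_info_model {
	uint64_t ctx_id;
	uint64_t new_tlb_gen;
	int full_flush;			/* partial flushes can't be credited this simply */
};

/* Stand-in for the hypervisor-assisted remote flush (hypercall would go here). */
static void hv_flush_tlb_multi(unsigned long cpumask, const struct flush_info_model *f)
{
	(void)cpumask;
	(void)f;
}

static void pv_flush_and_update_tlb_gen(unsigned long cpumask,
					const struct flush_info_model *f)
{
	hv_flush_tlb_multi(cpumask, f);

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		struct pv_cpu_tlbstate *st = &cpu_tlbstate_model[cpu];

		if (!(cpumask & (1UL << cpu)))
			continue;

		pthread_mutex_lock(&st->lock);
		/*
		 * Credit the flush only if the remote CPU still has the same
		 * mm loaded and we would not move its generation backwards.
		 */
		if (f->full_flush &&
		    st->loaded_ctx_id == f->ctx_id &&
		    st->tlb_gen < f->new_tlb_gen)
			st->tlb_gen = f->new_tlb_gen;
		pthread_mutex_unlock(&st->lock);
	}
}

int main(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		pthread_mutex_init(&cpu_tlbstate_model[cpu].lock, NULL);
		cpu_tlbstate_model[cpu].loaded_ctx_id = 1;
		cpu_tlbstate_model[cpu].tlb_gen = 1;
	}

	struct flush_info_model f = { .ctx_id = 1, .new_tlb_gen = 2, .full_flush = 1 };

	pv_flush_and_update_tlb_gen(0x6 /* CPUs 1 and 2 */, &f);
	return 0;
}

The real code would additionally have to handle the remote CPU racing to
switch mms, which is presumably where the locking mentioned above comes in.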