From: Thomas Gleixner
To: Peter Zijlstra
Cc: Paolo Bonzini, Andy Lutomirski, Vivek Goyal, LKML, X86 ML, kvm list, stable
Subject: Re: [PATCH v2] x86/kvm: Disable KVM_ASYNC_PF_SEND_ALWAYS
In-Reply-To: <20200408153824.GO20730@hirez.programming.kicks-ass.net>
Date: Wed, 08 Apr 2020 18:41:44 +0200
Message-ID: <87h7xuaq0n.fsf@nanos.tec.linutronix.de>

Peter Zijlstra writes:
> On Wed, Apr 08, 2020 at 03:01:58PM +0200, Thomas Gleixner wrote:
>> And it comes with restrictions:
>>
>>    The Do Other Stuff event can only be delivered when guest IF=1.
>>
>>    If guest IF=0 then the host has to suspend the guest until the
>>    situation is resolved.
>>
>>    The 'Situation resolved' event must also wait for a guest IF=1 slot.
>
> Moo, can we pretty please already kill that ALWAYS and IF nonsense? That
> results in that terrifyingly crap HLT loop. That needs to die with
> extreme prejudice.
>
> So the host only injects these OMFG_DOS things when the guest is in
> luserspace -- which it can see in the VMCS state IIRC. And then using
> #VE for the make-it-go signal is far preferred over the current #PF
> abuse.

Yes, but this requires software based injection.

>> If you just want to solve Vivek's problem, then it's good enough. I.e. the
>> file truncation turns the EPT entries into #VE convertible entries and
>> the guest #VE handler can figure it out. This one can be injected
>> directly by the hardware, i.e. you don't need a VMEXIT.
>
> That sounds like something that doesn't actually need the whole
> 'async'/do-something-else-for-a-while crap, right? It's a #PF trap from
> kernel space where we need to report fail.

Fail or fixup via extable.

>> If you want the opportunistic do other stuff mechanism, then #VE has
>> exactly the same problems as the existing async "PF". It's not magically
>> making that go away.
>
> We need to somehow have the guest teach the host how to tell if it can
> inject that OMFG_DOS thing or not. Injecting it only to then instantly
> exit again is stupid and expensive.

Not if the injection is actually done by the hardware.
Then the guest handles #VE and tells the host what to do.

> Clearly we don't want to expose preempt_count and make that ABI, but is
> there some way we can push a snippet of code to the host that instructs
> the host how to determine if it can sleep or not? I realize that pushing
> actual x86 .text is a giant security problem, so perhaps a snippet of
> BPF that the host can verify, which it can run on the guest image?

*SHUDDER*

> Make it a hard error (guest cpu dies) to inject the OMFG_DOS signal on a
> context that cannot put the task to sleep.

With the hardware based #VE and a hypercall which tells the host how to
handle the EPT fixup (suspend the vcpu or let it continue and do the
completion later) you don't have to play any games on the host.

If the guest tells the host the wrong thing, then the guest will have to
mop up the pieces.

Thanks,

        tglx
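
The guest-side decision described above (the guest handles the #VE, then
uses a hypercall to tell the host whether to suspend the vcpu or to let it
continue and complete later) could be sketched roughly as follows. This is
only an illustration under stated assumptions: the enum, the struct and
both function names are hypothetical, there is no such hypercall ABI in the
thread, and the "can sleep" test is a simplified stand-in for whatever the
guest would really check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical hypercall argument: what the guest asks the host to do
 * with the EPT fixup. Illustrative names only, not an existing KVM ABI.
 */
enum ept_fixup_action {
	EPT_FIXUP_SUSPEND_VCPU,   /* host suspends the vcpu until the fixup is done */
	EPT_FIXUP_COMPLETE_LATER, /* vcpu continues, completion is signalled later */
};

/* Hypothetical snapshot of the guest context in which the #VE was taken. */
struct guest_ctx {
	bool user_mode;     /* #VE came from guest user space */
	bool irqs_enabled;  /* guest IF=1 when the #VE was taken */
	int  preempt_count; /* non-zero: the guest must not schedule */
};

/*
 * Assumption: a fault from user space can always put the task to sleep;
 * a fault from kernel space can do so only with IF=1 and preemption enabled.
 */
static bool guest_can_sleep(const struct guest_ctx *ctx)
{
	if (ctx->user_mode)
		return true;
	return ctx->irqs_enabled && ctx->preempt_count == 0;
}

/*
 * Runs in the (hypothetical) guest #VE handler and picks the value that
 * would be passed to the host. preempt_count never leaves the guest.
 */
static enum ept_fixup_action ve_choose_action(const struct guest_ctx *ctx)
{
	assert(ctx != NULL);

	return guest_can_sleep(ctx) ? EPT_FIXUP_COMPLETE_LATER
				    : EPT_FIXUP_SUSPEND_VCPU;
}
```

The point of this shape is that the host needs no knowledge of guest
internals: it only acts on the requested action, and if the guest asks for
the wrong one, the guest is the one that has to deal with the consequences.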