Date: Sat, 04 Nov 2023 12:13:18 +0000
Message-ID: <86il6h1ztt.wl-maz@kernel.org>
From: Marc Zyngier
To: Jan Henrik Weinstock
Cc: oliver.upton@linux.dev, james.morse@arm.com, suzuki.poulose@arm.com,
    yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
    linux-kernel@vger.kernel.org, Lukas Jünger
Subject: Re: KVM exit to userspace on WFI
References: <87ttql5aq7.wl-maz@kernel.org> <86cyx250w9.wl-maz@kernel.org>
    <86msw01e4m.wl-maz@kernel.org>

On Tue, 31 Oct 2023 19:21:16 +0000, Jan Henrik Weinstock wrote:
>
> On Mon, 30 Oct 2023 at 13:36, Marc Zyngier wrote:
> >
> > [please make an effort not to top-post]
> >
> > On Fri, 27 Oct 2023 18:41:44 +0100,
> > Jan Henrik Weinstock wrote:
> > >
> > > Hi Marc,
> > >
> > > the basic idea behind this is to have a (single-threaded) execution loop,
> > > something like this:
> > >
> > > vcpu-thread: vcpu-run | process-io-devices | vcpu-run | process-io ...
> > >                       ^
> > >                WFX or timeout
> > >
> > > We switch to simulating IO devices whenever the vcpu is idle (wfi) or exceeds
> > > a certain budget of instructions (counted via pmu). Our fallback currently is
> > > to kick the vcpu out of its execution using a signal (via a timeout/alarm). But
> > > of course, if the cpu is stuck at a wfi, we are wasting a lot of time.
> > >
> > > I understand that the proposed behavior is not desirable for most use cases,
> > > which is why I suggest locking it behind a flag, e.g.
> > > KVM_ARCH_FLAG_WFX_EXIT_TO_USER.
> >
> > But how do you reconcile the fact that exposing this to userspace
> > breaks fundamental expectations that the guest has, such as getting
> > its timer interrupts and directly injected LPIs? Implementing WFI in
> > userspace breaks it. What about the case where we don't trap WFx and
> > let the *guest* wait for an interrupt?
>
> Timer interrupts etc. will be injected into the vcpu during the
> io-phases. When there are no interrupts present and the guest performs
> a WFI, we can just skip forward to the next timer event.

Skip forward? What does that mean? Compress time and move along?
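The signal-based fallback Jan describes (kicking the vcpu out with a timeout/alarm) can be sketched in plain POSIX C. This is a minimal model under stated assumptions, not KVM code: `fake_vcpu_run()` is a hypothetical stand-in that blocks in `sigsuspend()` the way a vcpu thread parked on a WFI blocks inside the `KVM_RUN` ioctl, and `process_io_devices()` is a hypothetical placeholder for the device-simulation phase. With real KVM, a pending, unblocked signal makes `KVM_RUN` return -1 with `errno == EINTR`; the sketch only reproduces that control flow and never opens /dev/kvm.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Set by the SIGALRM handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t timeout_fired;

static void on_alarm(int sig)
{
    (void)sig;
    timeout_fired = 1;
}

/* Install the handler WITHOUT SA_RESTART so that a blocking wait is
 * interrupted (returns -1 with EINTR) rather than transparently restarted. */
static int install_kick_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGALRM, &sa, NULL);
}

/* Arm a one-shot wall-clock timer: the "budget" after which we kick.
 * A zero value would disarm the timer, so require a positive budget. */
static int arm_timeout(long usec)
{
    struct itimerval it;
    assert(usec > 0);
    memset(&it, 0, sizeof(it));
    it.it_value.tv_sec = usec / 1000000;
    it.it_value.tv_usec = usec % 1000000;
    return setitimer(ITIMER_REAL, &it, NULL);
}

/* Hypothetical stand-in for ioctl(vcpu_fd, KVM_RUN, 0) on a vcpu parked in
 * WFI: atomically installs run_mask and sleeps until a handled signal
 * arrives. sigsuspend() always returns -1 with errno == EINTR. */
static int fake_vcpu_run(const sigset_t *run_mask)
{
    return sigsuspend(run_mask);
}

/* Hypothetical placeholder for the device-simulation phase. */
static void process_io_devices(int *io_phases)
{
    (*io_phases)++;
}

/* Single-threaded loop: vcpu-run | process-io-devices | vcpu-run | ...
 * Returns the number of IO phases run, or -1 on an unexpected error. */
static int run_loop(int iterations, long budget_usec)
{
    sigset_t block, old, run_mask;
    int io_phases = 0;

    if (install_kick_handler() != 0)
        return -1;

    /* Keep SIGALRM blocked outside the "run" so it cannot be lost
     * between arming the timer and starting to wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;
    run_mask = old;
    sigdelset(&run_mask, SIGALRM);

    for (int i = 0; i < iterations; i++) {
        timeout_fired = 0;
        if (arm_timeout(budget_usec) != 0) {
            io_phases = -1;
            break;
        }
        if (fake_vcpu_run(&run_mask) != -1 || errno != EINTR || !timeout_fired) {
            io_phases = -1;
            break;
        }
        process_io_devices(&io_phases);
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return io_phases;
}
```

Under these assumptions, `run_loop(3, 10000)` performs three 10 ms "runs", each ended by the timer, and returns 3. SIGALRM stays blocked except while waiting, so the timer cannot fire in the gap before the wait begins; with real KVM, `KVM_SET_SIGNAL_MASK` sets the signal mask used during `KVM_RUN`, which allows the same pattern.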
>
> > Honestly, what you are describing seems to be a use model that doesn't
> > fit KVM, which is a general-purpose hypervisor, but more a simulation
> > environment. Yes, the primitives are the same, but the plumbing is
> > wildly different.
>
> Agreed.
>
> > *If* that's the stuff you're looking at, then I'm afraid you'll have
> > to do it in a different way, because what you are suggesting is
> > fundamentally incompatible with the guarantees that KVM gives to the
> > guest and userspace. Because your KVM_ARCH_FLAG_WFX_EXIT_TO_USER is really a
> > lie. It should really be named something more along the lines of
> > KVM_ARCH_FLAG_WFX_EXIT_TO_USER_SOMETIME_AND_I_DONT_EVEN_KNOW_WHEN
> > (probably with additional clauses related to breaking things).
>
> I have attached a reworked version of the patch as a reference (based
> on my 5.15 kernel). It puts the modified behavior behind a new
> capability so as not to interfere with the current expectations
> towards handling WFI/WFE.
> I think it should now trap all blocking calls to WFx on the vcpu and
> reliably return to userspace. If I have missed something that
> would cause the vcpu not to trap on a WFI, kindly let me know.

Oh FFS. Please read my previous emails, the architecture spec, and
understand that WFx is a *hint*. Given your line of work, I would hope
you understand the implications of this.

>
> > Overall, you are still asking for something that is not guaranteed at
> > the architecture level, even less in KVM, and I'm not going to add
> > support for something that can only work "sometime".
>
> I am not quite sure what you mean by "sometime". Are you referring
> to WFIs as NOPs? Or WFIs that do not yield because of pending
> interrupts?

NOP is a valid implementation of WFx. WFx doesn't have to trap. Its
only requirement is not to lose state. Nothing else. Trapping is a
'quality of implementation' feature, and doesn't affect correctness.
And yes, there are machines out there that will absolutely ignore any
request for trapping.

From the architecture spec (ARM DDI 0487J.a, D19.2.48, TWI):

  Since a WFI can complete at any time, even without a Wakeup event,
  the traps on WFI are not guaranteed to be taken, even if the WFI is
  executed when there is no Wakeup event. The only guarantee is that
  if the instruction does not complete in finite time in the absence
  of a Wakeup event, the trap will be taken.

Similar verbiage exists for WFE. Do you now see why your proposal
makes little sense?

>
> The point of my patch is not to accurately count every single WFI. The
> point is to prevent the host cpu from sleeping just because my vcpu
> executed a WFI somewhere in the guest software. If a WFI is executed
> by the guest and that does not result in my vcpu thread blocking (in
> other words: the vcpu continues executing instructions beyond the WFI)
> then it also should not exit to userspace. So instead of
> "KVM_ARCH_FLAG_WFX_EXIT_TO_USER_SOMETIME_AND_I_DONT_EVEN_KNOW_WHEN" it
> is really "KVM_ARCH_FLAG_WFX_EXIT_TO_USER_WHENEVER_YOU_WOULD_OTHERWISE_YIELD_AND_I_CANNOT_GET_MY_THREAD_BACK".

You already must be able to handle a guest spinning in a loop without
a WFI. So why would WFI be of more interest than anything else?

You can always make an interrupt pending at any point, without having
to wait for WFI to occur. Just make the interrupt pending (which, if
you emulate everything in userspace, is just giving the vcpu thread a
signal).

My hunch is that your SW is trying to do the interrupt injection from
the vcpu thread, which is a pretty broken model (it would badly model
the concept of an interrupt being an asynchronous event).

Honestly, if there was one thing I would add to the kernel, it would
be an option to *prevent* any trap of WFx, because that at least is
something we can universally enforce and guarantee to userspace.
Anything else is only wishful thinking.

M.
-- 
Without deviation from the norm, progress is not possible.