Date: Tue, 3 Oct 2023 15:29:42 +0100
From: Catalin Marinas
To: Marc Zyngier
Cc: Kristina Martsenko, kvmarm@lists.linux.dev,
    linux-arm-kernel@lists.infradead.org, James Morse, Suzuki K Poulose,
    Zenghui Yu, Will Deacon, Vladimir Murzin, Colton Lewis,
    linux-kernel@vger.kernel.org, Oliver Upton
Subject: Re: [PATCH v2 1/2] KVM: arm64: Add handler for MOPS exceptions

On Mon, Oct 02, 2023 at 03:55:33PM +0100, Marc Zyngier wrote:
> On Mon, 02 Oct 2023 15:06:33 +0100,
> Kristina Martsenko wrote:
> > On 29/09/2023 10:23, Marc Zyngier wrote:
> > > On Wed, 27 Sep 2023 09:28:20 +0100,
> > > Oliver Upton wrote:
> > >> On Mon, Sep 25, 2023 at 04:16:06PM +0100, Kristina Martsenko wrote:
> > >>>> What is the rationale for advancing the state machine? Shouldn't we
> > >>>> instead return to the guest and immediately get the SS exception,
> > >>>> which in turn gets reported to userspace? Is it because we rollback
> > >>>> the PC to a previous instruction?
> > >>>
> > >>> Yes, because we rollback the PC to the prologue instruction. We advance the
> > >>> state machine so that the SS exception is taken immediately upon returning to
> > >>> the guest at the prologue instruction. If we didn't advance it then we would
> > >>> return to the guest, execute the prologue instruction, and then take the SS
> > >>> exception on the middle instruction. Which would be surprising as userspace
> > >>> would see the middle and epilogue instructions executed multiple times but not
> > >>> the prologue.
> > >>
> > >> I agree with Kristina that taking the SS exception on the prologue is
> > >> likely the best course of action. Especially since it matches the
> > >> behavior of single-stepping an EL0 MOPS sequence with an intervening CPU
> > >> migration.
> > >>
> > >> This behavior might throw an EL1 that single-steps itself for a loop,
> > >> but I think it is impossible for a hypervisor to hide the consequences
> > >> of vCPU migration with MOPS in the first place.
> > >>
> > >> Marc, I'm guessing you were most concerned about the former case where
> > >> the VMM was debugging the guest. Is there something you're concerned
> > >> about I missed?
> > >
> > > My concern is not only the VMM, but any userspace that perform
> > > single-stepping. Imagine the debugger tracks PC by itself, and simply
> > > increments it by 4 on a non-branch, non-fault instruction.
> > >
> > > Move the vcpu or the userspace around, rewind PC, and now the debugger
> > > is out of whack with what is executing. While I agree that there is
> > > not much a hypervisor can do about that, I'm a bit worried that we are
> > > going to break existing SW with this.
> > >
> > > Now the obvious solution is "don't do that"...
> >
> > If the debugger can handle the PC changing on branching or faulting
> > instructions, then why can't it handle it on MOPS instructions? Wouldn't
> > such a debugger need to be updated any time the architecture adds new
> > branching or faulting instructions? What's different here?
>
> What is different is that we *go back* in the instruction stream,
> which is a first. I'm not saying that the debugger I describe above
> would be a very clever piece of SW, quite the opposite. But the way
> the architecture works results in some interesting side-effects, and
> I'm willing to bet that some SW will break (rr?).

The way the architecture works, either with or without Kristina's
single-step change, a debugger would get confused.
At least for EL0, I find the proposed (well, upstreamed) approach more
predictable: it always restarts from the prologue in case of migration
between CPUs with different MOPS implementations (which is not just
theoretical AFAIK). It's more like these three instructions are a
bigger CISC one ;) (though the CPU can step through its parts).

A more transparent approach would have been to fully emulate the
instructions in the kernel and advance the PC as expected, but I don't
think that's even possible. An implementation may decide to leave some
bytes to be copied by the epilogue, but we can't know that in software;
it's a microarchitecture thing.

There is the case of EL1 debugging itself (kgdb) and triggering a MOPS
exception to EL2. It would look weird for the guest, but I guess the
only other option is to disable MCE2 and let EL1 handle the mismatched
MOPS option itself (assuming it knows how to; it should be fine for
Linux).

I think I still prefer Kristina's proposal for KVM as more generic,
with the downside of breaking less usual cases like the kernel
single-stepping itself.

-- 
Catalin
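
For illustration only, here is a toy model of the single-step behaviour
discussed in this thread. It is a sketch under stated assumptions, not
KVM code: the addresses, the function names and the "migrate after step
N" trigger are all hypothetical. It lists the PC reported at each
single-step stop when a MOPS exception rewinds to the prologue, and
shows where a debugger that blindly predicts PC + 4 would lose track.

```python
# Toy model: names, addresses and the migration trigger are illustrative
# assumptions, not KVM or architecture behaviour in detail.
INSN_SIZE = 4
PROLOGUE, MAIN, EPILOGUE = 0x1000, 0x1004, 0x1008  # hypothetical addresses


def single_step_trace(migrate_after_step=None):
    """Return the PC seen at each single-step stop for one MOPS sequence.

    If migrate_after_step is given, the stop with that 0-based index is
    followed by a simulated MOPS exception that rewinds PC to the
    prologue (only meaningful past the prologue itself).
    """
    trace = []
    pc = PROLOGUE
    step = 0
    while pc <= EPILOGUE:
        trace.append(pc)
        if step == migrate_after_step and pc != PROLOGUE:
            pc = PROLOGUE          # restart the whole sequence
        else:
            pc += INSN_SIZE        # normal forward progress
        step += 1
    return trace


def first_desync(trace):
    """Index of the first stop where a 'PC + 4' predictor is wrong."""
    for i in range(1, len(trace)):
        if trace[i] != trace[i - 1] + INSN_SIZE:
            return i
    return None


if __name__ == "__main__":
    print([hex(pc) for pc in single_step_trace()])
    print([hex(pc) for pc in single_step_trace(migrate_after_step=1)])
    print(first_desync(single_step_trace(migrate_after_step=1)))
```

In this model the stops after a rewind always begin again at the
prologue, so the three instructions appear as one restartable unit; the
naive predictor is the only thing that breaks.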