Date: Tue, 11 Jan 2022 17:54:59 +0000
Message-ID: <87v8yqrwcs.wl-maz@kernel.org>
From: Marc Zyngier
To: Mark Rutland
Cc: linux-kernel@vger.kernel.org, aleksandar.qemu.devel@gmail.com,
 alexandru.elisei@arm.com, anup.patel@wdc.com, aou@eecs.berkeley.edu,
 atish.patra@wdc.com, benh@kernel.crashing.org, borntraeger@linux.ibm.com,
 bp@alien8.de, catalin.marinas@arm.com, chenhuacai@kernel.org,
 dave.hansen@linux.intel.com, david@redhat.com, frankja@linux.ibm.com,
 frederic@kernel.org, gor@linux.ibm.com, hca@linux.ibm.com,
 imbrenda@linux.ibm.com, james.morse@arm.com, jmattson@google.com,
 joro@8bytes.org, kvm@vger.kernel.org, mingo@redhat.com, mpe@ellerman.id.au,
 nsaenzju@redhat.com, palmer@dabbelt.com, paulmck@kernel.org,
 paulus@samba.org, paul.walmsley@sifive.com, pbonzini@redhat.com,
 seanjc@google.com, suzuki.poulose@arm.com, tglx@linutronix.de,
 tsbogend@alpha.franken.de, vkuznets@redhat.com, wanpengli@tencent.com,
 will@kernel.org
Subject: Re: [PATCH 1/5] kvm: add exit_to_guest_mode() and enter_from_guest_mode()
In-Reply-To: <20220111153539.2532246-2-mark.rutland@arm.com>
References: <20220111153539.2532246-1-mark.rutland@arm.com>
 <20220111153539.2532246-2-mark.rutland@arm.com>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue)
 FLIM-LB/1.14.9 (Gojō) APEL-LB/10.8 EasyPG/1.0.0 Emacs/27.1
 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
Content-Type: text/plain; charset=US-ASCII
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Mark,

On Tue, 11 Jan 2022 15:35:35 +0000,
Mark Rutland wrote:
>
> When transitioning to/from guest mode, it is necessary to inform
> lockdep, tracing, and RCU in a specific order, similar to the
> requirements for transitions to/from user mode. Additionally, it is
> necessary to perform vtime accounting for a window around running the
> guest, with RCU enabled, such that timer interrupts taken from the guest
> can be accounted as guest time.
>
> Most architectures don't handle all the necessary pieces, and have a
> number of common bugs, including unsafe usage of RCU during the window
> between guest_enter() and guest_exit().
>
> On x86, this was dealt with across commits:
>
>   87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs")
>   0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
>   9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
>   3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
>   135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
>   160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
>   bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")
>
> ...
> but those fixes are specific to x86, and as the resulting logic
> (while correct) is split across generic helper functions and
> x86-specific helper functions, it is difficult to see that the
> entry/exit accounting is balanced.
>
> This patch adds generic helpers which architectures can use to handle
> guest entry/exit consistently and correctly. The guest_{enter,exit}()
> helpers are split into guest_timing_{enter,exit}() to perform vtime
> accounting, and guest_context_{enter,exit}() to perform the necessary
> context tracking and RCU management. The existing guest_{enter,exit}()
> helpers are left as wrappers of these.
>
> Atop this, new exit_to_guest_mode() and enter_from_guest_mode() helpers
> are added to handle the ordering of lockdep, tracing, and RCU management.
> These are named to align with exit_to_user_mode() and
> enter_from_user_mode().
>
> Subsequent patches will migrate architectures over to the new helpers,
> following a sequence:
>
>   guest_timing_enter_irqoff();
>
>   exit_to_guest_mode();
>   < run the vcpu >
>   enter_from_guest_mode();
>
>   < take any pending IRQs >
>
>   guest_timing_exit_irqoff();
>
> This sequence handles all of the above correctly, and more clearly
> balances the entry and exit portions, making it easier to understand.
>
> The existing helpers are marked as deprecated, and will be removed once
> all architectures have been converted.
>
> There should be no functional change as a result of this patch.
>
> Signed-off-by: Mark Rutland

Thanks a lot for looking into this and writing this up. I have a
couple of comments below, but they are pretty much cosmetic and are
only there to ensure that I actually understand this stuff.
FWIW:

Reviewed-by: Marc Zyngier

> ---
>  include/linux/kvm_host.h | 108 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 105 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index c310648cc8f1..13fcf7979880 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -29,6 +29,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>  #include
>
>  #include
> @@ -362,8 +364,11 @@ struct kvm_vcpu {
>  	int last_used_slot;
>  };
>
> -/* must be called with irqs disabled */
> -static __always_inline void guest_enter_irqoff(void)
> +/*
> + * Start accounting time towards a guest.
> + * Must be called before entering guest context.
> + */
> +static __always_inline void guest_timing_enter_irqoff(void)
>  {
>  	/*
>  	 * This is running in ioctl context so its safe to assume that it's the
> @@ -372,7 +377,17 @@ static __always_inline void guest_enter_irqoff(void)
>  	instrumentation_begin();
>  	vtime_account_guest_enter();
>  	instrumentation_end();
> +}
>
> +/*
> + * Enter guest context and enter an RCU extended quiescent state.
> + *
> + * This should be the last thing called before entering the guest, and must be
> + * called after any potential use of RCU (including any potentially
> + * instrumented code).

nit: "the last thing called" is terribly ambiguous. Any architecture
obviously calls a ****load of stuff after this point. Should this be
'the last thing involving RCU' instead?

> + */
> +static __always_inline void guest_context_enter_irqoff(void)
> +{
>  	/*
>  	 * KVM does not hold any references to rcu protected data when it
>  	 * switches CPU into a guest mode. In fact switching to a guest mode
> @@ -388,16 +403,77 @@ static __always_inline void guest_enter_irqoff(void)
>  	}
>  }
>
> -static __always_inline void guest_exit_irqoff(void)
> +/*
> + * Deprecated. Architectures should move to guest_timing_enter_irqoff() and
> + * exit_to_guest_mode().
> + */
> +static __always_inline void guest_enter_irqoff(void)
> +{
> +	guest_timing_enter_irqoff();
> +	guest_context_enter_irqoff();
> +}
> +
> +/**
> + * exit_to_guest_mode - Fixup state when exiting to guest mode
> + *
> + * This is analogous to exit_to_user_mode(), and ensures we perform the
> + * following in order:
> + *
> + * 1) Trace interrupts on state
> + * 2) Invoke context tracking if enabled to adjust RCU state
> + * 3) Tell lockdep that interrupts are enabled

nit: or rather, are about to be enabled? Certainly on arm64, the
enable happens much later, right at the point where we enter the guest
for real.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
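
For anyone following the thread, the balanced sequence from the quoted
commit message can be sketched as a userspace mock. This is only an
illustration of the call ordering: the stubs below merely record their
names and are not the kernel helpers, and run_the_vcpu(),
take_pending_irqs(), arch_vcpu_run_once() and trace_is_balanced() are
hypothetical names invented for the example.

```c
/*
 * Userspace mock of the entry/exit ordering described in the commit
 * message. The stubs only record call order; they do NOT perform any
 * lockdep, tracing, RCU, or vtime work.
 */
#include <assert.h>
#include <string.h>

#define MAX_EVENTS 8

static const char *events[MAX_EVENTS];
static int nr_events;

/* Append one call name to the trace. */
static void record(const char *name)
{
	assert(nr_events < MAX_EVENTS);
	events[nr_events++] = name;
}

/* Stand-ins named after the helpers in the patch. */
static void guest_timing_enter_irqoff(void) { record("guest_timing_enter_irqoff"); }
static void exit_to_guest_mode(void)        { record("exit_to_guest_mode"); }
static void enter_from_guest_mode(void)     { record("enter_from_guest_mode"); }
static void guest_timing_exit_irqoff(void)  { record("guest_timing_exit_irqoff"); }

/* Hypothetical placeholders for the "< ... >" steps in the sequence. */
static void run_the_vcpu(void)      { record("run_the_vcpu"); }
static void take_pending_irqs(void) { record("take_pending_irqs"); }

/* Hypothetical arch run path following the sequence from the commit message. */
static void arch_vcpu_run_once(void)
{
	nr_events = 0;

	guest_timing_enter_irqoff();

	exit_to_guest_mode();
	run_the_vcpu();
	enter_from_guest_mode();

	take_pending_irqs();

	guest_timing_exit_irqoff();
}

/* Returns 1 if the recorded trace matches the expected ordering. */
static int trace_is_balanced(void)
{
	return nr_events == 6 &&
	       !strcmp(events[0], "guest_timing_enter_irqoff") &&
	       !strcmp(events[1], "exit_to_guest_mode") &&
	       !strcmp(events[2], "run_the_vcpu") &&
	       !strcmp(events[3], "enter_from_guest_mode") &&
	       !strcmp(events[4], "take_pending_irqs") &&
	       !strcmp(events[5], "guest_timing_exit_irqoff");
}
```

Calling arch_vcpu_run_once() and then trace_is_balanced() from a test
harness checks that the timing helpers bracket the whole window, with
the mode transitions nested immediately around the vcpu run and the
pending IRQs taken before timing stops.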