Received: by 2002:a05:6358:5282:b0:b5:90e7:25cb with SMTP id g2csp3946432rwa; Tue, 23 Aug 2022 13:06:18 -0700 (PDT) X-Google-Smtp-Source: AA6agR6aBdpIWxWt77+hzNBAJcVj+er7sJX28j3Xcfwy71q6x5byI6QvvMCv0xNn2DKyCO6xXDra X-Received: by 2002:a65:4bc6:0:b0:428:90d4:b410 with SMTP id p6-20020a654bc6000000b0042890d4b410mr21123255pgr.529.1661285178425; Tue, 23 Aug 2022 13:06:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1661285178; cv=none; d=google.com; s=arc-20160816; b=qpTzzSz86qZpyCT+E+56OaR+2aehIn/xBOkSa9i+jpnFU6ivtAmg8+DKIKhYuGhRPk xlo0Tx5SwQzEIdwFtNtyHHt0dAgp2l0PYejV+tLMoHqQpCMSmW9v5DoOJ8zIkv0i1/82 kcwm7tNBVLpoLTjsgKZThw02C+Zb5Myog1ZIJq0XtntRm8WVE/RIHHt9cK8X1KuvtdoX FeNmAdNQFqGV0tsdqHOnKswW/rpe6HUkhK+2bSLzhWSSm7LZ7jZnXmBKtZ8YKyllLILk ZqNwoqSj5zIICCubnYnc4DcwetvhGoDlOmJCcSECH5dVSnNxBoF2Jd5qqMEhie+wmXNE rGuA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent:references:in-reply-to :subject:cc:to:from:message-id:date:dkim-signature; bh=Hrqjy32x4+00U8VLP8IfluUc++WnnMwB45YKrLh+REg=; b=bz/7kcTrHi1IL1rpPkK6Jvbcss6cLTFvTMvKislL/lfiHgPw39MXjhIQLMo9ir3S8E jYbQ06J3PWMHlSGEsqfW+8XnH7q9e13rfXq6Xp53dRG9HlIf2ZgfcMvKa3NtMjU0RPW5 XavOvbX2gAVz7ZoIOdlHkFodu1B2IeMVZMPN9sIvj8lQ778Bj2KXn7zd5P6+32MAm5xM TT6WM4TzC4+TI1XM7RMn7duV/ki7fOJQ89GQu/WeH0Ym6NXvAXxiFfIc9FN3cW7emyoa 4495CP9O0lHdWXJIlOyVnVxt/mKnHDNKwcnd222mzno9ZyoL2F1I1o1J1lxRCNfqmKn4 Jdsw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=QnVz4fG5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q5-20020a170902eb8500b00172be445d0dsi13101395plg.414.2022.08.23.13.06.06; Tue, 23 Aug 2022 13:06:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=QnVz4fG5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233130AbiHWUED (ORCPT + 99 others); Tue, 23 Aug 2022 16:04:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33170 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233715AbiHWUDo (ORCPT ); Tue, 23 Aug 2022 16:03:44 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6CCAAA722B; Tue, 23 Aug 2022 12:17:09 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 203E5B820D5; Tue, 23 Aug 2022 19:17:08 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C9B64C433C1; Tue, 23 Aug 2022 19:17:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1661282226; bh=fq3Ha77vc5Ypa6cpCTtel7vNoAw8OmOKU6kZ246w/xA=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=QnVz4fG5gwuIFr0FHxxiw7OBZ520UriTeNfUA/74w7LcZdpRdu/hI5E4MHcHAxQiu 0zWksqekczGyB35pvjWZso76Uc3iAtXkEe91P307IsHQSCPSoYBNFgIJ4p39EtlVbN vPRIaAE969DC53EyT/bd3nylA9e2gd+EShA2CJPVwpPgNrbEtDIW9jQq8KhXKNok/S oJ5buZ1be8Hhdk+teR5+DN5L4H5szV2POnl3ozEPm5slAsDr2XmBLmiLv/32qIPUj7 hddaaIsiz6TTfy7/3L9txOsfY39vDqH6oHa62u9cMH/u//qhBPYR2gzTrPyc0avr0P RO0tr6rirDnMw== Received: from sofa.misterjones.org ([185.219.108.64] helo=why.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1oQZOa-005HGi-Ai; Tue, 23 Aug 2022 20:17:04 +0100 Date: Tue, 23 Aug 2022 20:17:03 +0100 Message-ID: <87bksawz0w.wl-maz@kernel.org> From: Marc Zyngier To: Peter Xu Cc: Gavin Shan , kvmarm@lists.cs.columbia.edu, linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, pbonzini@redhat.com, corbet@lwn.net, james.morse@arm.com, alexandru.elisei@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, catalin.marinas@arm.com, will@kernel.org, shuah@kernel.org, seanjc@google.com, drjones@redhat.com, dmatlack@google.com, bgardon@google.com, ricarkol@google.com, zhenyzha@redhat.com, shan.gavin@gmail.com Subject: Re: [PATCH v1 1/5] KVM: arm64: Enable ring-based dirty memory tracking In-Reply-To: References: <20220819005601.198436-1-gshan@redhat.com> <20220819005601.198436-2-gshan@redhat.com> <87lerkwtm5.wl-maz@kernel.org> <41fb5a1f-29a9-e6bb-9fab-4c83a2a8fce5@redhat.com> <87fshovtu0.wl-maz@kernel.org> <171d0159-4698-354b-8b2f-49d920d03b1b@redhat.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/27.1 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.219.108.64 X-SA-Exim-Rcpt-To: peterx@redhat.com, gshan@redhat.com, kvmarm@lists.cs.columbia.edu, linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, pbonzini@redhat.com, corbet@lwn.net, james.morse@arm.com, alexandru.elisei@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, catalin.marinas@arm.com, will@kernel.org, shuah@kernel.org, seanjc@google.com, drjones@redhat.com, dmatlack@google.com, bgardon@google.com, ricarkol@google.com, zhenyzha@redhat.com, shan.gavin@gmail.com X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 23 Aug 2022 14:58:19 +0100, Peter Xu wrote: > > On Tue, Aug 23, 2022 at 03:22:17PM +1000, Gavin Shan wrote: > > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > > > index 986cee6fbc7f..0b41feb6fb7d 100644 > > > --- a/arch/arm64/kvm/arm.c > > > +++ b/arch/arm64/kvm/arm.c > > > @@ -747,6 +747,12 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu) > > > if (kvm_check_request(KVM_REQ_SUSPEND, vcpu)) > > > return kvm_vcpu_suspend(vcpu); > > > + > > > + if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu)) { > > > + vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL; > > > + trace_kvm_dirty_ring_exit(vcpu); > > > + return 0; > > > + } > > > } > > > return 1; > > > diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c > > > index f4c2a6eb1666..08b2f01164fa 100644 > > > --- a/virt/kvm/dirty_ring.c > > > +++ b/virt/kvm/dirty_ring.c > > > @@ -149,6 +149,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring) > > > void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset) > > > { > > > + struct kvm_vcpu *vcpu = container_of(ring, struct kvm_vcpu, dirty_ring); > > > struct kvm_dirty_gfn *entry; > > > /* It should never get full */ > > > @@ -166,6 +167,9 @@ void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset) > > > kvm_dirty_gfn_set_dirtied(entry); > > > ring->dirty_index++; > > > trace_kvm_dirty_ring_push(ring, slot, offset); > > > + > > > + if (kvm_dirty_ring_soft_full(vcpu)) > > > + kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu); > > > } > > > struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset) > > > > > > > Ok, thanks for the details, Marc. I will adopt your code in next revision :) > > Note that there can be a slight difference with the old/new code, in that > an (especially malicious) userapp can logically ignore the DIRTY_RING_FULL > vmexit and keep kicking VCPU_RUN with the new code. > > Unlike the old code, the 2nd/3rd/... KVM_RUN will still run in the new code > until the next dirty pfn being pushed to the ring, then it'll request ring > full exit again. > > Each time it exits the ring grows 1. > > At last iiuc it can easily hit the ring full and trigger the warning at the > entry of kvm_dirty_ring_push(): > > /* It should never get full */ > WARN_ON_ONCE(kvm_dirty_ring_full(ring)); Hmmm, yes. Well spotted. > We did that because kvm_dirty_ring_push() was previously designed to not be > able to fail at all (e.g., in the old bitmap world we never will fail too). > We can't because we can't lose any dirty page or migration could silently > fail too (consider when we do user exit due to ring full and migration just > completed; there could be unsynced pages on src/dst). > > So even though the old approach will need to read kvm->dirty_ring_size for > every entrance which is a pity, it will avoid issue above. I don't think we really need this check on the hot path. All we need is to make the request sticky until userspace gets their act together and consumes elements in the ring. Something like: diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 986cee6fbc7f..e8ed5e1af159 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -747,6 +747,14 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu) if (kvm_check_request(KVM_REQ_SUSPEND, vcpu)) return kvm_vcpu_suspend(vcpu); + + if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) && + kvm_dirty_ring_soft_full(vcpu)) { + kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu); + vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL; + trace_kvm_dirty_ring_exit(vcpu); + return 0; + } } return 1; However, I'm a bit concerned by the reset side of things. It iterates over the vcpus and expects the view of each ring to be consistent, even if userspace is hacking at it from another CPU. For example, I can't see what guarantees that the kernel observes the writes from userspace in the order they are being performed (the documentation provides no requirements other than "it must collect the dirty GFNs in sequence", which doesn't mean much from an ordering perspective). I can see that working on a strongly ordered architecture, but on something as relaxed as ARM, the CPUs may^Wwill aggressively reorder stuff that isn't explicitly ordered. I have the feeling that a CAS operation on both sides would be enough, but someone who actually understands how this works should have a look... Thanks, M. -- Without deviation from the norm, progress is not possible.