Message-ID: <7c4cf32dca42ab84bdb427a9e4862dbf5509f961.camel@redhat.com>
Subject: Re: [RFC PATCH v3 04/19] KVM: x86: mmu: allow to enable write tracking externally
From: Maxim Levitsky
To: Sean Christopherson
Cc: kvm@vger.kernel.org, Wanpeng Li, Vitaly Kuznetsov, Jani Nikula,
    Paolo Bonzini, Tvrtko Ursulin, Rodrigo Vivi, Zhenyu Wang,
    Joonas Lahtinen, Tom Lendacky, Ingo Molnar, David Airlie,
    Thomas Gleixner, Dave Hansen, x86@kernel.org,
    intel-gfx@lists.freedesktop.org, Daniel Vetter, Borislav Petkov,
    Joerg Roedel, linux-kernel@vger.kernel.org, Jim Mattson, Zhi Wang,
    Brijesh Singh, "H. Peter Anvin", intel-gvt-dev@lists.freedesktop.org,
    dri-devel@lists.freedesktop.org
Date: Thu, 28 Jul 2022 10:46:01 +0300
References: <20220427200314.276673-1-mlevitsk@redhat.com>
    <20220427200314.276673-5-mlevitsk@redhat.com>
    <5ed0d0e5a88bbee2f95d794dbbeb1ad16789f319.camel@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2022-07-25 at 16:08 +0000, Sean Christopherson wrote:
> On Wed, Jul 20, 2022, Maxim Levitsky wrote:
> > On Sun, 2022-05-22 at 13:22 +0300, Maxim Levitsky wrote:
> > > On Thu, 2022-05-19 at 16:37 +0000, Sean Christopherson wrote:
> > > > On Wed, Apr 27, 2022, Maxim Levitsky wrote:
> > > > > @@ -5753,6 +5752,10 @@ int kvm_mmu_init_vm(struct kvm *kvm)
> > Now for nested AVIC, this is what I would like to do:
> >
> > - just like mmu, I prefer to register the write tracking notifier, when the
> >   VM is created.
> >
> > - just like mmu, write tracking should only be enabled when nested AVIC is
> >   actually used first time, so that write tracking is not always enabled when
> >   you just boot a VM with nested avic supported, since the VM might not use
> >   nested at all.
> >
> > Thus I either need to use the __kvm_page_track_register_notifier too for AVIC
> > (and thus need to export it) or I need to have a boolean
> > (nested_avic_was_used_once) and register the write tracking notifier only
> > when false and do it not on VM creation but on first attempt to use nested
> > AVIC.
> >
> > Do you think this is worth it?
> > I mean there is some value of registering the
> > notifier only when needed (this way it is not called for nothing) but it does
> > complicate things a bit.
>
> Compared to everything else that you're doing in the nested AVIC code, refcounting
> the shared kvm_page_track_notifier_node object is a trivial amount of complexity.

Makes sense.

>
> And on that topic, do you have performance numbers to justify using a single
> shared node? E.g. if every table instance has its own notifier, then no additional
> refcounting is needed.

The thing is that KVM walks the list of notifiers and calls each of them
for every write from the emulator, in fact even for a plain mmio write.
When you enable write tracking on a page, you just write-protect the page
and add a mark in the page track array, which roughly means:

  'don't install spte, don't install mmio spte, but just emulate the
   page fault if it hits this page'

So adding more than the bare minimum to this list seems a bit wrong.

> It's not obvious that a shared node will provide better
> performance, e.g. if there are only a handful of AVIC tables being shadowed, then
> a linear walk of all nodes is likely fast enough, and doesn't bring the risk of
> a write potentially being stalled due to having to acquire a VM-scoped mutex.

The thing is that if I register multiple notifiers, they will all be
called anyway. But yes, I can use container_of to discover which table a
notifier belongs to, instead of having a hash table in which I look up
the GFN of the fault.

In practice this means that all the shadow physid tables would sit in a
linear list of notifiers, so I could indeed avoid the per-VM mutex on the
write tracking path. However, for simplicity I will probably still need
it, because I do modify the page, and having a per-physid-table mutex
complicates things.
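
To make the per-table alternative concrete, here is a minimal userspace
sketch of one notifier node embedded in each shadowed table, with
container_of used to recover the owning table. All names in it
(track_node, physid_table, dispatch_write, table_init) are hypothetical
and only model the idea; this is not the KVM page-track API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified notifier node (not the real KVM structure). */
struct track_node {
	struct track_node *next;
	void (*track_write)(struct track_node *node, uint64_t gfn);
};

/* Hypothetical shadowed physid table that embeds its own node. */
struct physid_table {
	uint64_t gfn;			/* guest frame backing the table */
	unsigned int writes_seen;	/* tracked writes that hit this table */
	struct track_node node;
};

/* Callback: recover the owning table from the node, then filter by GFN. */
static void physid_track_write(struct track_node *node, uint64_t gfn)
{
	struct physid_table *t = container_of(node, struct physid_table, node);

	if (t->gfn == gfn)
		t->writes_seen++;
}

/*
 * Every registered node is called for every emulated write, whether or
 * not the write targets its table. Returns the number of callbacks run.
 */
static unsigned int dispatch_write(struct track_node *head, uint64_t gfn)
{
	unsigned int calls = 0;

	for (struct track_node *n = head; n; n = n->next) {
		n->track_write(n, gfn);
		calls++;
	}
	return calls;
}

/* Initialize a table and push its node onto the front of the list. */
static void table_init(struct physid_table *t, uint64_t gfn,
		       struct track_node **head)
{
	assert(t && head);
	t->gfn = gfn;
	t->writes_seen = 0;
	t->node.track_write = physid_track_write;
	t->node.next = *head;
	*head = &t->node;
}
```

With two tables on the list, a write to either GFN, or to an unrelated
GFN, still runs both callbacks. That is the per-write cost of the linear
list; in exchange, no GFN hash lookup is needed on this path.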

Currently the locking in my code is very simple and somewhat dumb, but
the performance is very good because the code isn't executed often; most
of the time the AVIC hardware works alone without any VM exits.

Once the code is accepted upstream, this is one of the things that can be
improved.

Note though that I still need a hash table and a mutex, because on each
VM entry the guest can use a different physid table, so I need to look it
up, and create it if it is not found, which requires reading and writing
the hash table and thus a mutex.

>
> > I can also stash this boolean (like 'bool registered;') into the 'struct
> > kvm_page_track_notifier_node', and thus allow the
> > kvm_page_track_register_notifier to be called more that once - then I can
> > also get rid of __kvm_page_track_register_notifier.
>
> No, allowing redundant registration without proper refcounting leads to pain,
> e.g. X registers, Y registers, X unregisters, kaboom.
>

True, but then what about adding a refcount to
'struct kvm_page_track_notifier_node' instead of a boolean, and allowing
redundant registration? Probably not worth it, in which case I am OK with
adding a refcount to my AVIC code.

Or maybe just scrap the whole thing and leave registration and activation
of the write tracking as two separate things? Honestly, that now looks
like the cleanest solution.

Best regards,
	Maxim Levitsky
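
For reference, the "X registers, Y registers, X unregisters" problem and
the refcounted alternative can be modelled with a small userspace sketch.
The types and helpers below (bool_node, refcounted_node and their
register/unregister functions) are hypothetical and are not the KVM
structures; locking is left out on purpose and would be needed in real
code.

```c
#include <assert.h>
#include <stdbool.h>

/* Variant 1: a plain flag, as in the 'bool registered;' idea. */
struct bool_node {
	bool registered;	/* stand-in for "linked into the notifier list" */
};

static void bool_register(struct bool_node *n)
{
	n->registered = true;	/* redundant calls are silently absorbed */
}

static void bool_unregister(struct bool_node *n)
{
	n->registered = false;	/* drops the node even if others still use it */
}

/* Variant 2: a refcount; the node stays linked until the last user leaves. */
struct refcounted_node {
	unsigned int refcount;	/* number of outstanding registrations */
	bool on_list;		/* stand-in for "linked into the notifier list" */
};

static void refcounted_register(struct refcounted_node *n)
{
	if (n->refcount++ == 0)
		n->on_list = true;	/* first user links the node */
}

static void refcounted_unregister(struct refcounted_node *n)
{
	assert(n->refcount > 0);	/* unbalanced unregister is a bug */
	if (--n->refcount == 0)
		n->on_list = false;	/* last user unlinks the node */
}
```

With the flag, after X and Y both register and X unregisters, the node is
gone although Y still depends on it. With the refcount, the same sequence
leaves the node linked until Y unregisters as well.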