Date: Wed, 27 Oct 2021 23:54:22 +0000
From: Sean Christopherson
Szmigiero" Cc: Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Igor Mammedov , Marc Zyngier , James Morse , Julien Thierry , Suzuki K Poulose , Huacai Chen , Aleksandar Markovic , Paul Mackerras , Christian Borntraeger , Janosch Frank , David Hildenbrand , Cornelia Huck , Claudio Imbrenda , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v5 11/13] KVM: Keep memslots in tree-based structures instead of array-based ones Message-ID: References: <0c35c60b524dd264cc6abb6a48bc253958f99673.1632171479.git.maciej.szmigiero@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Oct 27, 2021, Sean Christopherson wrote: > I'll get a series for the fix posted tomorrow, and hopefully reply with my thoughts > for this patch too. Knew I should have hedged. I got a series compiling, but it's completed untested otheriwse. I do have input for this patch though. The cleanups I have prepped allow for a few things: 1. Dropping the copy of the old memslot, thus freeing up "old" to be a pointer to the actual old memslot. 2. Folding the existing kvm_delete_memslot() into __kvm_set_memory_region() to free up the name for a helper to kvm_set_memslot(). 3. Handling the dirty bitmap and memslot freeing in generic versions of prepare and commit. This is related to #1. And then for this patch, there is a hard invariant that we can rely on to make the code easier to follow, and that is: KVM can never directly modify a slot that is in the active memslots tree(s). That invariant means there is no need to track the active vs. inactive memslots, because all helpers simply retrieve the inactive memslot on-demand. And doling out the work to helpers also mostly eliminates the need to track the inactive vs. active slot. There's still some slightly twisty logic, especially for the MOVE case (which nobody uses, *sigh*). If we really want, even that mess can be cleaned up by pre-allocating an extra memslot, but I don't think it's worth the effort so long as things are well documented. Anyway, the twisty logic is mostly a matter of documentation; IMO it's actually advantageous to not swap pointers as the different naming and behavior highlights the differences between each type of change, especially with respect to side effects. I'll post the full series tomorrow (this part is unlikely to be tested, but the prep work will be tested). Below is the meat of kvm_set_memslot(). Again, compile tested only, there's a 99.999999% chance it's buggy. As for naming, "working" was the least awful name I could come up with. I also considered "scratch" and "tmp". I'm definitely open to other names. And finally, ignore my comments on the other patch about using memcpy(). KVM copies memslot structs all over the place without memcpy(), i.e. I was being paranoid and/or delusional :-) static void kvm_invalidate_memslot(struct kvm *kvm, struct kvm_memory_slot *old, struct kvm_memory_slot *working_slot) { /* * Mark the current slot INVALID. As with all memslot modifications, * this must be done on an unreachable slot to avoid modifying the * current slot in the active tree. */ kvm_copy_memslot(working_slot, old); working_slot->flags |= KVM_MEMSLOT_INVALID; kvm_replace_memslot(kvm, old, working_slot); /* * Activate the slot that is now marked INVALID, but don't propagate * the slot to the now inactive slots. 
static void kvm_invalidate_memslot(struct kvm *kvm,
                                   struct kvm_memory_slot *old,
                                   struct kvm_memory_slot *working_slot)
{
        /*
         * Mark the current slot INVALID.  As with all memslot modifications,
         * this must be done on an unreachable slot to avoid modifying the
         * current slot in the active tree.
         */
        kvm_copy_memslot(working_slot, old);
        working_slot->flags |= KVM_MEMSLOT_INVALID;
        kvm_replace_memslot(kvm, old, working_slot);

        /*
         * Activate the slot that is now marked INVALID, but don't propagate
         * the slot to the now inactive slots.  The slot is either going to
         * be deleted or recreated as a new slot.
         */
        kvm_swap_active_memslots(kvm, old->as_id);

        /*
         * From this point no new shadow pages pointing to a deleted, or
         * moved, memslot will be created.  Validation of sp->gfn happens in:
         *      - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
         *      - kvm_is_visible_gfn (mmu_check_root)
         */
        kvm_arch_flush_shadow_memslot(kvm, old);

        /* Was released by kvm_swap_active_memslots, reacquire. */
        mutex_lock(&kvm->slots_arch_lock);

        /*
         * Copy the arch-specific field of the newly-installed slot back to
         * the old slot as the arch data could have changed between releasing
         * slots_arch_lock in install_new_memslots() and re-acquiring the
         * lock above.  Writers are required to retrieve memslots *after*
         * acquiring slots_arch_lock, thus the active slot's data is
         * guaranteed to be fresh.
         */
        old->arch = working_slot->arch;
}

static void kvm_create_memslot(struct kvm *kvm,
                               const struct kvm_memory_slot *new,
                               struct kvm_memory_slot *working)
{
        /*
         * Add the new memslot to the inactive set as a copy of the
         * new memslot data provided by userspace.
         */
        kvm_copy_memslot(working, new);
        kvm_replace_memslot(kvm, NULL, working);
        kvm_activate_memslot(kvm, NULL, working);
}

static void kvm_delete_memslot(struct kvm *kvm,
                               struct kvm_memory_slot *old,
                               struct kvm_memory_slot *invalid_slot)
{
        /*
         * Remove the old memslot (in the inactive memslots) by passing NULL
         * as the "new" slot.
         */
        kvm_replace_memslot(kvm, old, NULL);

        /* And do the same for the invalid version in the active slot. */
        kvm_activate_memslot(kvm, invalid_slot, NULL);

        /* Free the invalid slot, the caller will clean up the old slot. */
        kfree(invalid_slot);
}

static struct kvm_memory_slot *kvm_move_memslot(struct kvm *kvm,
                                                struct kvm_memory_slot *old,
                                                const struct kvm_memory_slot *new,
                                                struct kvm_memory_slot *invalid_slot)
{
        struct kvm_memslots *slots = kvm_get_inactive_memslots(kvm, old->as_id);

        /*
         * The memslot's gfn is changing, remove it from the inactive tree,
         * it will be re-added with its updated gfn.  Because its range is
         * changing, an in-place replace is not possible.
         */
        kvm_memslot_gfn_erase(slots, old);

        /*
         * The old slot is now fully disconnected, reuse its memory for the
         * persistent copy of "new".
         */
        kvm_copy_memslot(old, new);

        /* Re-add to the gfn tree with the updated gfn. */
        kvm_memslot_gfn_insert(slots, old);

        /* Replace the current INVALID slot with the updated memslot. */
        kvm_activate_memslot(kvm, invalid_slot, old);

        /*
         * Clear the INVALID flag so that the invalid_slot is now a perfect
         * copy of the old slot.  Return it for cleanup in the caller.
         */
        WARN_ON_ONCE(!(invalid_slot->flags & KVM_MEMSLOT_INVALID));
        invalid_slot->flags &= ~KVM_MEMSLOT_INVALID;
        return invalid_slot;
}
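kvm_memslot_gfn_erase() and kvm_memslot_gfn_insert() aren't shown here
either.  Assuming the gfn tree is a plain kernel rbtree keyed on base_gfn,
with a single rb_node (gfn_node) embedded in the slot and an rb_root
(gfn_tree) in struct kvm_memslots, they'd look something like the below;
those field names are guesses, and the real series may well keep one node
per memslot set.

static void kvm_memslot_gfn_insert(struct kvm_memslots *slots,
                                   struct kvm_memory_slot *slot)
{
        struct rb_node **node = &slots->gfn_tree.rb_node;
        struct rb_node *parent = NULL;

        /* Standard rbtree descent, ordered by base_gfn. */
        while (*node) {
                struct kvm_memory_slot *tmp =
                        container_of(*node, struct kvm_memory_slot, gfn_node);

                parent = *node;
                if (slot->base_gfn < tmp->base_gfn)
                        node = &(*node)->rb_left;
                else
                        node = &(*node)->rb_right;
        }

        rb_link_node(&slot->gfn_node, parent, node);
        rb_insert_color(&slot->gfn_node, &slots->gfn_tree);
}

static void kvm_memslot_gfn_erase(struct kvm_memslots *slots,
                                  struct kvm_memory_slot *slot)
{
        rb_erase(&slot->gfn_node, &slots->gfn_tree);
}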
static void kvm_update_flags_memslot(struct kvm *kvm,
                                     struct kvm_memory_slot *old,
                                     const struct kvm_memory_slot *new,
                                     struct kvm_memory_slot *working_slot)
{
        /*
         * Similar to the MOVE case, but the slot doesn't need to be zapped
         * as an intermediate step.  Instead, the old memslot is simply
         * replaced with a new, updated copy in both memslot sets.
         *
         * Since the memslot gfn is unchanged, kvm_copy_replace_memslot()
         * and kvm_memslot_gfn_replace() can be used to switch the node in
         * the gfn tree instead of removing the old and inserting the new as
         * two separate operations.  Replacement is a single O(1) operation
         * versus two O(log(n)) operations for remove+insert.
         */
        kvm_copy_memslot(working_slot, new);
        kvm_replace_memslot(kvm, old, working_slot);
        kvm_activate_memslot(kvm, old, working_slot);
}

static int kvm_set_memslot(struct kvm *kvm,
                           struct kvm_memory_slot *old,
                           struct kvm_memory_slot *new,
                           enum kvm_mr_change change)
{
        struct kvm_memory_slot *working;
        int r;

        /*
         * Modifications are done on an unreachable slot.  Any changes are
         * then (eventually) propagated to both the active and inactive
         * slots.  This allocation would ideally be on-demand (in helpers),
         * but is done here to avoid having to handle failure after
         * kvm_prepare_memory_region().
         */
        working = kzalloc(sizeof(*working), GFP_KERNEL_ACCOUNT);
        if (!working)
                return -ENOMEM;

        /*
         * Released in kvm_swap_active_memslots.
         *
         * Must be held from before the current memslots are copied until
         * after the new memslots are installed with rcu_assign_pointer,
         * then released before the synchronize srcu in
         * kvm_swap_active_memslots.
         *
         * When modifying memslots outside of the slots_lock, must be held
         * before reading the pointer to the current memslots until after
         * all changes to those memslots are complete.
         *
         * These rules ensure that installing new memslots does not lose
         * changes made to the previous memslots.
         */
        mutex_lock(&kvm->slots_arch_lock);

        /*
         * Invalidate the old slot if it's being deleted or moved.  This is
         * done prior to actually deleting/moving the memslot to allow vCPUs
         * to continue running by ensuring there are no mappings or shadow
         * pages for the memslot when it is deleted/moved.  Without
         * pre-invalidation (and without a lock), a window would exist
         * between effecting the delete/move and committing the changes in
         * arch code where KVM or a guest could access a non-existent
         * memslot.
         */
        if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
                kvm_invalidate_memslot(kvm, old, working);

        r = kvm_prepare_memory_region(kvm, old, new, change);
        if (r) {
                /*
                 * For DELETE/MOVE, revert the above INVALID change.  No
                 * modifications required since the original slot was
                 * preserved in the inactive slots.  Changing the active
                 * memslots also releases slots_arch_lock.
                 */
                if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
                        kvm_activate_memslot(kvm, working, old);
                else
                        mutex_unlock(&kvm->slots_arch_lock);
                kfree(working);
                return r;
        }

        /*
         * For DELETE and MOVE, the working slot is now active as the INVALID
         * version of the old slot.  MOVE is particularly special as it
         * reuses the old slot and returns a copy of the old slot (in
         * working_slot).  For CREATE, there is no old slot.  For DELETE and
         * FLAGS_ONLY, the old slot is detached but otherwise preserved.
         */
        if (change == KVM_MR_CREATE)
                kvm_create_memslot(kvm, new, working);
        else if (change == KVM_MR_DELETE)
                kvm_delete_memslot(kvm, old, working);
        else if (change == KVM_MR_MOVE)
                old = kvm_move_memslot(kvm, old, new, working);
        else if (change == KVM_MR_FLAGS_ONLY)
                kvm_update_flags_memslot(kvm, old, new, working);
        else
                BUG();

        /*
         * No need to refresh new->arch, changes after dropping
         * slots_arch_lock will directly hit the final, active memslot.
         * Architectures are responsible for knowing that new->arch may be
         * stale.
         */
        kvm_commit_memory_region(kvm, old, new, change);

        return 0;
}
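For completeness, the O(1) in-place replace that the FLAGS_ONLY comment
refers to maps directly onto the kernel's rb_replace_node(), which swaps a
node without rebalancing.  Under the same hypothetical tree layout as the
erase/insert sketch above, kvm_memslot_gfn_replace() could be as small as:

static void kvm_memslot_gfn_replace(struct kvm_memslots *slots,
                                    struct kvm_memory_slot *old,
                                    struct kvm_memory_slot *new)
{
        /* In-place replace is only valid if the gfn range is unchanged. */
        WARN_ON_ONCE(old->base_gfn != new->base_gfn ||
                     old->npages != new->npages);

        rb_replace_node(&old->gfn_node, &new->gfn_node, &slots->gfn_tree);
}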