From: "Maciej S. Szmigiero"
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Igor Mammedov, Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose, Huacai Chen, Aleksandar Markovic, Paul Mackerras, Christian Borntraeger, Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda, Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/8] KVM: x86: Cache total page count to avoid traversing the memslot array
Date: Fri, 21 May 2021 09:03:23 +0200
Szmigiero" Subject: Re: [PATCH v3 1/8] KVM: x86: Cache total page count to avoid traversing the memslot array Message-ID: Date: Fri, 21 May 2021 09:03:23 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 19.05.2021 23:00, Sean Christopherson wrote: > On Sun, May 16, 2021, Maciej S. Szmigiero wrote: >> From: "Maciej S. Szmigiero" >> >> There is no point in recalculating from scratch the total number of pages >> in all memslots each time a memslot is created or deleted. >> >> Just cache the value and update it accordingly on each such operation so >> the code doesn't need to traverse the whole memslot array each time. >> >> Signed-off-by: Maciej S. Szmigiero >> --- >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> index 5bd550eaf683..8c7738b75393 100644 >> --- a/arch/x86/kvm/x86.c >> +++ b/arch/x86/kvm/x86.c >> @@ -11112,9 +11112,21 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, >> const struct kvm_memory_slot *new, >> enum kvm_mr_change change) >> { >> - if (!kvm->arch.n_requested_mmu_pages) >> - kvm_mmu_change_mmu_pages(kvm, >> - kvm_mmu_calculate_default_mmu_pages(kvm)); >> + if (change == KVM_MR_CREATE) >> + kvm->arch.n_memslots_pages += new->npages; >> + else if (change == KVM_MR_DELETE) { >> + WARN_ON(kvm->arch.n_memslots_pages < old->npages); > > Heh, so I think this WARN can be triggered at will by userspace on 32-bit KVM by > causing the running count to wrap. KVM artificially caps the size of a single > memslot at ((1UL << 31) - 1), but userspace could create multiple gigantic slots > to overflow arch.n_memslots_pages. > > I _think_ changing it to a u64 would fix the problem since KVM forbids overlapping > memslots in the GPA space. You are right, n_memslots_pages needs to be u64 so it does not overflow on 32-bit KVM. The memslot count is limited to 32k in each of 2 address spaces, so in the worst case the variable should hold 15-bits + 1 bit + 31-bits = 47 bit number. > Also, what about moving the check-and-WARN to prepare_memory_region() so that > KVM can error out if the check fails? Doesn't really matter, but an explicit > error for userspace is preferable to underflowing the number of pages and getting > weird MMU errors/behavior down the line. In principle this seems like a possibility, however, it is a more regression-risky option, in case something has (perhaps unintentionally) relied on the fact that kvm_mmu_zap_oldest_mmu_pages() call from kvm_mmu_change_mmu_pages() was being done only in the memslot commit function. >> + kvm->arch.n_memslots_pages -= old->npages; >> + } >> + >> + if (!kvm->arch.n_requested_mmu_pages) { > > If we're going to bother caching the number of pages then we should also skip > the update when the number pages isn't changing, e.g. 
> Also, what about moving the check-and-WARN to prepare_memory_region() so that
> KVM can error out if the check fails?  Doesn't really matter, but an explicit
> error for userspace is preferable to underflowing the number of pages and getting
> weird MMU errors/behavior down the line.

In principle this seems like a possibility; however, it is the more
regression-risky option, in case something has (perhaps unintentionally)
relied on the fact that the kvm_mmu_zap_oldest_mmu_pages() call from
kvm_mmu_change_mmu_pages() was being done only in the memslot commit
function.

>> +		kvm->arch.n_memslots_pages -= old->npages;
>> +	}
>> +
>> +	if (!kvm->arch.n_requested_mmu_pages) {
>
> If we're going to bother caching the number of pages then we should also skip
> the update when the number of pages isn't changing, e.g.
>
> 	if (change == KVM_MR_CREATE || change == KVM_MR_DELETE) {
> 		if (change == KVM_MR_CREATE)
> 			kvm->arch.n_memslots_pages += new->npages;
> 		else
> 			kvm->arch.n_memslots_pages -= old->npages;
>
> 		if (!kvm->arch.n_requested_mmu_pages) {
> 			unsigned long nr_mmu_pages;
>
> 			nr_mmu_pages = kvm->arch.n_memslots_pages *
> 				       KVM_PERMILLE_MMU_PAGES / 1000;
> 			nr_mmu_pages = max(nr_mmu_pages, KVM_MIN_ALLOC_MMU_PAGES);
> 			kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
> 		}
> 	}

The old code did it that way (unconditionally) and, as in the case above,
I didn't want to risk a regression.

If we are going to change this behavior then I think it should happen in a
separate patch.

Thanks,
Maciej
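P.S. If the check did move to the prepare stage, a minimal sketch of the
error-out variant could look roughly like the helper below; the helper
name and the assumption that the old slot's page count is reachable from
there are mine, purely for illustration:

	/* Hypothetical helper for the prepare stage (KVM_MR_DELETE path):
	 * fail the operation instead of letting the cached count underflow
	 * and only WARNing about it at commit time.
	 */
	static int kvm_check_memslots_pages(struct kvm *kvm,
					    const struct kvm_memory_slot *old)
	{
		if (WARN_ON(kvm->arch.n_memslots_pages < old->npages))
			return -EINVAL;	/* explicit error to userspace */

		return 0;
	}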