Date: Fri, 11 Dec 2020 09:53:37 +0000
From: Will Deacon
To: Yanan Wang
Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Marc Zyngier, Catalin Marinas, James Morse, Julien Thierry,
	Suzuki K Poulose, Gavin Shan, Quentin Perret,
	wanghaibin.wang@huawei.com, yezengruan@huawei.com,
	zhukeqian1@huawei.com, yuzenghui@huawei.com, jiangkunkun@huawei.com,
	wangjingyi11@huawei.com, lushenming@huawei.com
Subject: Re: [RFC PATCH] KVM: arm64: Add prejudgement for relaxing permissions only case in
	stage2 translation fault handler
Message-ID: <20201211095337.GA11280@willie-the-truck>
References: <20201211080115.21460-1-wangyanan55@huawei.com>
	<20201211080115.21460-2-wangyanan55@huawei.com>
In-Reply-To: <20201211080115.21460-2-wangyanan55@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Yanan,

On Fri, Dec 11, 2020 at 04:01:15PM +0800, Yanan Wang wrote:
> In dirty-logging, or dirty-logging-stopped time, even normal running
> time of a guest configed with huge mappings and numbers of vCPUs,
> translation faults by different vCPUs on the same GPA could occur
> successively almost at the same time. There are two reasons for it.
>
> (1) If there are some vCPUs accessing the same GPA at the same time
> and the leaf PTE is not set yet, then they will all cause translation
> faults and the first vCPU holding mmu_lock will set valid leaf PTE,
> and the others will later choose to update the leaf PTE or not.
>
> (2) When changing a leaf entry or a table entry with break-before-make,
> if there are some vCPUs accessing the same GPA just catch the moment
> when the target PTE is set invalid in a BBM procedure coincidentally,
> they will all cause translation faults and will later choose to update
> the leaf PTE or not.
>
> The worst case can be like this: some vCPUs cause translation faults
> on the same GPA with different prots, they will fight each other by
> changing back access permissions of the PTE with break-before-make.
> And the BBM-invalid moment might trigger more unnecessary translation
> faults. As a result, some useless small loops will occur, which could
> lead to vCPU stuck.
>
> To avoid unnecessary update and small loops, add prejudgement in the
> translation fault handler: Skip updating the valid leaf PTE if we are
> trying to recreate exactly the same mapping or to reduce access
> permissions only(such as RW-->RO). And update the valid leaf PTE without
> break-before-make if we are trying to add more permissions only.
>
> Signed-off-by: Yanan Wang
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 73 +++++++++++++++++++++++++-----------
>  1 file changed, 52 insertions(+), 21 deletions(-)

Cheers for this. Given that this patch is solving a few different problems,
do you think you could split it up please? That would certainly make it much
easier to review, as there's quite a lot going on here. A chunk of the
changes seems to be the diff I posted previously:

https://lore.kernel.org/r/20201201141632.GC26973@willie-the-truck

so maybe that could be its own patch?

> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 23a01dfcb27a..f8b3248cef1c 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -45,6 +45,8 @@
>
>  #define KVM_PTE_LEAF_ATTR_HI_S2_XN	BIT(54)
>
> +#define KVM_PTE_LEAF_ATTR_PERMS	(GENMASK(7, 6) | BIT(54))

You only use this on the S2 path, so how about:

#define KVM_PTE_LEAF_ATTR_S2_PERMS	(KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | \
					 KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
					 KVM_PTE_LEAF_ATTR_HI_S2_XN)

or something like that?

>  struct kvm_pgtable_walk_data {
>  	struct kvm_pgtable		*pgt;
>  	struct kvm_pgtable_walker	*walker;
> @@ -170,10 +172,9 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
>  	smp_store_release(ptep, pte);
>  }
>
> -static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
> -				   u32 level)
> +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
>  {
> -	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
> +	kvm_pte_t pte = kvm_phys_to_pte(pa);
>  	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
>  							   KVM_PTE_TYPE_BLOCK;
>
> @@ -181,12 +182,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
>  	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
>  	pte |= KVM_PTE_VALID;
>
> -	/* Tolerate KVM recreating the exact same mapping. */
> -	if (kvm_pte_valid(old))
> -		return old == pte;
> -
> -	smp_store_release(ptep, pte);
> -	return true;
> +	return pte;
>  }
>
>  static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> @@ -341,12 +337,17 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
>  static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>  				    kvm_pte_t *ptep, struct hyp_map_data *data)
>  {
> +	kvm_pte_t new, old = *ptep;
>  	u64 granule = kvm_granule_size(level), phys = data->phys;
>
>  	if (!kvm_block_mapping_supported(addr, end, phys, level))
>  		return false;
>
> -	WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
> +	/* Tolerate KVM recreating the exact same mapping. */
> +	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> +	if (old != new && !WARN_ON(kvm_pte_valid(old)))
> +		smp_store_release(ptep, new);
> +
>  	data->phys += granule;
>  	return true;
>  }
> @@ -461,25 +462,56 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
>  	return 0;
>  }
>
> +static bool stage2_set_valid_leaf_pte_pre(u64 addr, u32 level,
> +					  kvm_pte_t *ptep, kvm_pte_t new,
> +					  struct stage2_map_data *data)
> +{
> +	kvm_pte_t old = *ptep, old_attr, new_attr;
> +
> +	if ((old ^ new) & (~KVM_PTE_LEAF_ATTR_PERMS))
> +		return false;
> +
> +	/*
> +	 * Skip updating if we are trying to recreate exactly the same mapping
> +	 * or to reduce the access permissions only. And update the valid leaf
> +	 * PTE without break-before-make if we are trying to add more access
> +	 * permissions only.
> +	 */
> +	old_attr = (old & KVM_PTE_LEAF_ATTR_PERMS) ^ KVM_PTE_LEAF_ATTR_HI_S2_XN;
> +	new_attr = (new & KVM_PTE_LEAF_ATTR_PERMS) ^ KVM_PTE_LEAF_ATTR_HI_S2_XN;
> +	if (new_attr <= old_attr)
> +		return true;

I think this is a significant change in behaviour for
kvm_pgtable_stage2_map() and I worry that it could catch somebody out in
the future. Please can you update the kerneldoc in kvm_pgtable.h with a note
about this?

Will
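
For anyone wanting to try the quoted check outside the kernel, the following
is a minimal user-space sketch of it, not the kernel code itself. It assumes
that the stage-2 read and write permission bits are bits 6 and 7 and that XN
is bit 54, as the GENMASK(7, 6) | BIT(54) in the patch implies. The SKETCH_*
macros, sketch_pte_t and both helper functions are hypothetical names made up
for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for kvm_pte_t. */
typedef uint64_t sketch_pte_t;

/* Assumed bit layout, taken from GENMASK(7, 6) | BIT(54) in the patch. */
#define SKETCH_S2AP_R	(UINT64_C(1) << 6)
#define SKETCH_S2AP_W	(UINT64_C(1) << 7)
#define SKETCH_S2_XN	(UINT64_C(1) << 54)

/* Parenthesised so that '~' and '&' apply to the whole mask. */
#define SKETCH_S2_PERMS	(SKETCH_S2AP_R | SKETCH_S2AP_W | SKETCH_S2_XN)

/* Flip XN so that a set bit always means "this access is allowed". */
static sketch_pte_t sketch_allowed(sketch_pte_t pte)
{
	return (pte & SKETCH_S2_PERMS) ^ SKETCH_S2_XN;
}

/*
 * Mirrors the quoted check: returns true when the update would be
 * skipped, i.e. only permission bits differ and the new permission
 * field does not compare greater than the old one.
 */
static bool sketch_skip_update(sketch_pte_t old, sketch_pte_t new)
{
	/* Anything other than permissions differs: not this fast path. */
	if ((old ^ new) & ~SKETCH_S2_PERMS)
		return false;

	/* Numeric comparison, exactly as in the quoted hunk. */
	return sketch_allowed(new) <= sketch_allowed(old);
}
```

With old = R|W|XN, passing the identical entry or R|XN returns true (skip),
while going from R|XN to R|W|XN returns false. Because the comparison is
numeric rather than bit-by-bit, an executable read-only entry also compares
as "at least" a non-executable R|W request, which may be worth keeping in
mind when documenting the behaviour.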