Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp2568015pxu; Mon, 14 Dec 2020 05:56:14 -0800 (PST) X-Google-Smtp-Source: ABdhPJwyTfdCB33G7w8qXzB8PdA8juWFhH3FqHegPperhWU/b8xGgiR216gKTdTXi2B1xmDHIyHu X-Received: by 2002:aa7:c698:: with SMTP id n24mr24657190edq.277.1607954174531; Mon, 14 Dec 2020 05:56:14 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1607954174; cv=none; d=google.com; s=arc-20160816; b=NdUSNw0q2sjhkXz7+RcnGFgbvwV8VBEZkgJsSFlQbi4YIlxjac3gLHE5vhDKlgpwby x4fH7J6/tg9+IIRJtzOvbreZvlb58S9II4t0tG2VmeWF8NK0UPjxI72TyaQRnWb/FpEG rItk2/G0vFKnhh0j/jz2/g3QQaNZuI3lFe7OD/40nzTtuLzz+ZNOonuf59FEcxpVNkAy KmqNvmCeWk1MubREKtNofmxXMu2Z39LMgs6hcnhAriZAMhSc1ub/LjfXYWmsc6sGcCuO 7TaT8Lg/Jw2dho7cJkbfz6OiLUXRe+OecyD/l/+J+hFe5LRq4JZEEOneniZaxrUdLNoK 6cig== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-language:content-transfer-encoding :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=f91uCs/tqllyh8EI8E9x9L66gJ6O5hPOGq+gonHpeBA=; b=icxlbZxjPqZ5CsS3lcKvyRgiI9UTr8sFgPbjFk7yP3TMQUIGkFm54CfMDosbX74PN1 hYICRkYzwhb3N6/M4nELxXCaOrit63Pz++PFEuJ6Qkvaeh85wp7vgSUpKqmHtqBL2nZa 3m9uT4eHLWg8eeoPyGHsedw1y6IL/T9Qhp9T9IYFOkvsmNw5+g3qrs/sy+Bg6oN44fhl d4/8tWydh2vyGwbTNf8Fk/xDFdhKsc9PXGxhMsbB2O4b7vLtcEHvEAxf4hVmfLDi1IPw 4MIq3bZuX5hKUfHUBHGrhmRwBWgoJB9DdKF1RQhFeE1cSMokNUUOHeoSELjUPTqe2t3g wUag== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id nx22si9819461ejb.503.2020.12.14.05.55.50; Mon, 14 Dec 2020 05:56:14 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2438942AbgLNHVr (ORCPT + 99 others); Mon, 14 Dec 2020 02:21:47 -0500 Received: from szxga03-in.huawei.com ([45.249.212.189]:2399 "EHLO szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726305AbgLNHVc (ORCPT ); Mon, 14 Dec 2020 02:21:32 -0500 Received: from dggeme708-chm.china.huawei.com (unknown [172.30.72.54]) by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4CvXpX0ysjz54jT; Mon, 14 Dec 2020 15:20:04 +0800 (CST) Received: from [10.174.187.128] (10.174.187.128) by dggeme708-chm.china.huawei.com (10.1.199.104) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1913.5; Mon, 14 Dec 2020 15:20:49 +0800 Subject: Re: [RFC PATCH] KVM: arm64: Add prejudgement for relaxing permissions only case in stage2 translation fault handler To: Marc Zyngier CC: , , Catalin Marinas , Will Deacon , James Morse , Julien Thierry , Suzuki K Poulose , Gavin Shan , Quentin Perret , , , , , , , References: <20201211080115.21460-1-wangyanan55@huawei.com> <20201211080115.21460-2-wangyanan55@huawei.com> <8d006755e5afce7e49b03993316c4fcc@kernel.org> From: "wangyanan (Y)" Message-ID: <8adf8bda-58da-bddd-0522-f750a2d030c2@huawei.com> Date: Mon, 14 Dec 2020 15:20:49 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0 MIME-Version: 1.0 In-Reply-To: <8d006755e5afce7e49b03993316c4fcc@kernel.org> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-Originating-IP: [10.174.187.128] X-ClientProxiedBy: dggeme719-chm.china.huawei.com (10.1.199.115) To dggeme708-chm.china.huawei.com (10.1.199.104) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2020/12/11 17:49, Marc Zyngier wrote: > Hi Yanan, > > On 2020-12-11 08:01, Yanan Wang wrote: >> In dirty-logging, or dirty-logging-stopped time, even normal running >> time of a guest configed with huge mappings and numbers of vCPUs, >> translation faults by different vCPUs on the same GPA could occur >> successively almost at the same time. There are two reasons for it. >> >> (1) If there are some vCPUs accessing the same GPA at the same time >> and the leaf PTE is not set yet, then they will all cause translation >> faults and the first vCPU holding mmu_lock will set valid leaf PTE, >> and the others will later choose to update the leaf PTE or not. >> >> (2) When changing a leaf entry or a table entry with break-before-make, >> if there are some vCPUs accessing the same GPA just catch the moment >> when the target PTE is set invalid in a BBM procedure coincidentally, >> they will all cause translation faults and will later choose to update >> the leaf PTE or not. >> >> The worst case can be like this: some vCPUs cause translation faults >> on the same GPA with different prots, they will fight each other by >> changing back access permissions of the PTE with break-before-make. >> And the BBM-invalid moment might trigger more unnecessary translation >> faults. As a result, some useless small loops will occur, which could >> lead to vCPU stuck. >> >> To avoid unnecessary update and small loops, add prejudgement in the >> translation fault handler: Skip updating the valid leaf PTE if we are >> trying to recreate exactly the same mapping or to reduce access >> permissions only(such as RW-->RO). And update the valid leaf PTE without >> break-before-make if we are trying to add more permissions only. > > I'm a bit perplexed with this: why are you skipping the update if the > permissions need to be reduced? Even more, how can we reduce the > permissions from a vCPU fault? I can't really think of a scenario where > that happens. > > Or are you describing a case where two vcpus fault simultaneously with > conflicting permissions: > > - Both vcpus fault on translation fault > - vcpu A wants W access > - vpcu B wants R access > > and 'A' gets in first, set the permissions to RW (because R is > implicitly added to W), followed by 'B' which downgrades it to RO? > > If that's what you are describing, then I agree we could do better. Yes, this is exactly what I want to describe. > >> >> Signed-off-by: Yanan Wang >> --- >>  arch/arm64/kvm/hyp/pgtable.c | 73 +++++++++++++++++++++++++----------- >>  1 file changed, 52 insertions(+), 21 deletions(-) >> >> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c >> index 23a01dfcb27a..f8b3248cef1c 100644 >> --- a/arch/arm64/kvm/hyp/pgtable.c >> +++ b/arch/arm64/kvm/hyp/pgtable.c >> @@ -45,6 +45,8 @@ >> >>  #define KVM_PTE_LEAF_ATTR_HI_S2_XN    BIT(54) >> >> +#define KVM_PTE_LEAF_ATTR_PERMS    (GENMASK(7, 6) | BIT(54)) >> + >>  struct kvm_pgtable_walk_data { >>      struct kvm_pgtable        *pgt; >>      struct kvm_pgtable_walker    *walker; >> @@ -170,10 +172,9 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, >> kvm_pte_t *childp) >>      smp_store_release(ptep, pte); >>  } >> >> -static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, >> kvm_pte_t attr, >> -                   u32 level) >> +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 >> level) >>  { >> -    kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa); >> +    kvm_pte_t pte = kvm_phys_to_pte(pa); >>      u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? >> KVM_PTE_TYPE_PAGE : >>                                 KVM_PTE_TYPE_BLOCK; >> >> @@ -181,12 +182,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t >> *ptep, u64 pa, kvm_pte_t attr, >>      pte |= FIELD_PREP(KVM_PTE_TYPE, type); >>      pte |= KVM_PTE_VALID; >> >> -    /* Tolerate KVM recreating the exact same mapping. */ >> -    if (kvm_pte_valid(old)) >> -        return old == pte; >> - >> -    smp_store_release(ptep, pte); >> -    return true; >> +    return pte; >>  } >> >>  static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data >> *data, u64 addr, >> @@ -341,12 +337,17 @@ static int hyp_map_set_prot_attr(enum >> kvm_pgtable_prot prot, >>  static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level, >>                      kvm_pte_t *ptep, struct hyp_map_data *data) >>  { >> +    kvm_pte_t new, old = *ptep; >>      u64 granule = kvm_granule_size(level), phys = data->phys; >> >>      if (!kvm_block_mapping_supported(addr, end, phys, level)) >>          return false; >> >> -    WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level)); >> +    /* Tolerate KVM recreating the exact same mapping. */ >> +    new = kvm_init_valid_leaf_pte(phys, data->attr, level); >> +    if (old != new && !WARN_ON(kvm_pte_valid(old))) >> +        smp_store_release(ptep, new); >> + >>      data->phys += granule; >>      return true; >>  } >> @@ -461,25 +462,56 @@ static int stage2_map_set_prot_attr(enum >> kvm_pgtable_prot prot, >>      return 0; >>  } >> >> +static bool stage2_set_valid_leaf_pte_pre(u64 addr, u32 level, >> +                      kvm_pte_t *ptep, kvm_pte_t new, >> +                      struct stage2_map_data *data) >> +{ >> +    kvm_pte_t old = *ptep, old_attr, new_attr; >> + >> +    if ((old ^ new) & (~KVM_PTE_LEAF_ATTR_PERMS)) >> +        return false; >> + >> +    /* >> +     * Skip updating if we are trying to recreate exactly the same >> mapping >> +     * or to reduce the access permissions only. And update the >> valid leaf >> +     * PTE without break-before-make if we are trying to add more >> access >> +     * permissions only. >> +     */ >> +    old_attr = (old & KVM_PTE_LEAF_ATTR_PERMS) ^ >> KVM_PTE_LEAF_ATTR_HI_S2_XN; >> +    new_attr = (new & KVM_PTE_LEAF_ATTR_PERMS) ^ >> KVM_PTE_LEAF_ATTR_HI_S2_XN; >> +    if (new_attr <= old_attr) >> +        return true; >> + >> +    WRITE_ONCE(*ptep, new); >> +    kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level); > > I think what bothers me the most here is that we are turning a mapping > into > a permission update, which makes the code really hard to read, and mixes > two things that were so far separate. > > I wonder whether we should instead abort the update and simply take > the fault > again, if we ever need to do it. > > Thanks, > >         M.