From: Steven Price
Date: Wed, 24 Apr 2024 11:59:34 +0100
Subject: Re: [PATCH v2 13/43] arm64: RME: RTT handling
To: Suzuki K Poulose, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
 Oliver Upton, Zenghui Yu, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, Joey Gouly, Alexandru Elisei,
 Christoffer Dall, Fuad Tabba, linux-coco@lists.linux.dev,
 Ganapatrao Kulkarni
Message-ID: <40413863-3158-460b-ad7e-e5afdb45b701@arm.com>
References: <20240412084056.1733704-1-steven.price@arm.com>
 <20240412084309.1733783-1-steven.price@arm.com>
 <20240412084309.1733783-14-steven.price@arm.com>

On 17/04/2024 14:37, Suzuki K Poulose wrote:
> Hi Steven
>
> minor nit, Subject: arm64: RME: RTT tear down
>
> This patch is all about tearing the RTTs, so maybe the subject could
> be adjusted accordingly.

Good point - this patch has evolved and is now all about tearing down.

> On 12/04/2024 09:42, Steven Price wrote:
>> The RMM owns the stage 2 page tables for a realm, and KVM must request
>> that the RMM creates/destroys entries as necessary. The physical pages
>> to store the page tables are delegated to the realm as required, and can
>> be undelegated when no longer used.
>>
>> Creating new RTTs is the easy part, tearing down is a little more
>> tricky. The result of realm_rtt_destroy() can be used to effectively
>> walk the tree and destroy the entries (undelegating pages that were
>> given to the realm).
>
> The patch looks functionally correct to me. Some minor style related
> comments below.
>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_rme.h |  19 ++++
>>  arch/arm64/kvm/mmu.c             |   6 +-
>>  arch/arm64/kvm/rme.c             | 171 +++++++++++++++++++++++++++++++
>>  3 files changed, 193 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
>> index fba85e9ce3ae..4ab5cb5e91b3 100644
>> --- a/arch/arm64/include/asm/kvm_rme.h
>> +++ b/arch/arm64/include/asm/kvm_rme.h
>> @@ -76,5 +76,24 @@ u32 kvm_realm_ipa_limit(void);
>>  int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
>>  int kvm_init_realm_vm(struct kvm *kvm);
>>  void kvm_destroy_realm(struct kvm *kvm);
>> +void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits);
>> +
>> +#define RME_RTT_BLOCK_LEVEL	2
>> +#define RME_RTT_MAX_LEVEL	3
>> +
>> +#define RME_PAGE_SHIFT		12
>> +#define RME_PAGE_SIZE		BIT(RME_PAGE_SHIFT)
>> +/* See ARM64_HW_PGTABLE_LEVEL_SHIFT() */
>> +#define RME_RTT_LEVEL_SHIFT(l)	\
>> +	((RME_PAGE_SHIFT - 3) * (4 - (l)) + 3)
>> +#define RME_L2_BLOCK_SIZE	BIT(RME_RTT_LEVEL_SHIFT(2))
>> +
>> +static inline unsigned long rme_rtt_level_mapsize(int level)
>> +{
>> +	if (WARN_ON(level > RME_RTT_MAX_LEVEL))
>> +		return RME_PAGE_SIZE;
>> +
>> +	return (1UL << RME_RTT_LEVEL_SHIFT(level));
>> +}
>>
>
> super minor nit: We only support 4K for now, so maybe we could reuse
> the ARM64 generic macro helpers. I am fine either way.

Given we'll likely want to support host granules other than 4k in the
future I'd like to avoid using the generic ones. It's also a clear
signal that the code is referring to the RTTs rather than the host's
page tables.
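
As a quick sanity check of the arithmetic (4k granule, so
RME_PAGE_SHIFT == 12), the size mapped by a single entry at each level
works out as:

	level 0: shift 39 -> 512GB
	level 1: shift 30 -> 1GB
	level 2: shift 21 -> 2MB (RME_L2_BLOCK_SIZE)
	level 3: shift 12 -> 4KB

which matches the usual stage 2 layout for a 4k granule.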

>
>>  #endif
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index af4564f3add5..46f0c4e80ace 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1012,17 +1012,17 @@ void stage2_unmap_vm(struct kvm *kvm)
>>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>>  {
>>  	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
>> -	struct kvm_pgtable *pgt = NULL;
>> +	struct kvm_pgtable *pgt;
>>
>>  	write_lock(&kvm->mmu_lock);
>> +	pgt = mmu->pgt;
>>  	if (kvm_is_realm(kvm) &&
>>  	    (kvm_realm_state(kvm) != REALM_STATE_DEAD &&
>>  	     kvm_realm_state(kvm) != REALM_STATE_NONE)) {
>> -		/* TODO: teardown rtts */
>>  		write_unlock(&kvm->mmu_lock);
>> +		kvm_realm_destroy_rtts(kvm, pgt->ia_bits);
>>  		return;
>>  	}
>> -	pgt = mmu->pgt;
>>  	if (pgt) {
>>  		mmu->pgd_phys = 0;
>>  		mmu->pgt = NULL;
>> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
>> index 9652ec6ab2fd..09b59bcad8b6 100644
>> --- a/arch/arm64/kvm/rme.c
>> +++ b/arch/arm64/kvm/rme.c
>> @@ -47,6 +47,53 @@ static int rmi_check_version(void)
>>  	return 0;
>>  }
>>
>> +static phys_addr_t __alloc_delegated_page(struct realm *realm,
>> +					  struct kvm_mmu_memory_cache *mc,
>> +					  gfp_t flags)
>
> minor nit: Do we need "__" here? The counterpart is plain
> free_delegated_page() without the "__". We could drop the prefix.
>
> Or we could split the function as:
>
> alloc_delegated_page()
> {
>   if (spare_page_available)
>     return spare_page;
>   return __alloc_delegated_page(); /* Alloc and delegate a page */
> }

I'm not really sure I follow. The reason for the 'wrapper' function
alloc_delegated_page() is that most call sites don't care about the
GFP flags (it defaults to GFP_KERNEL), but ensure_spare_page() needs to
pass GFP_ATOMIC. Admittedly there are only 3 call sites in total and
the wrapper isn't added yet. I'll tidy this up by simply adding the
GFP_KERNEL flag onto the two call sites.
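
For reference, a sketch of what I mean (the variable names here are
illustrative, not final code):

	/* Normal callers can sleep, so just pass GFP_KERNEL directly */
	rtt = __alloc_delegated_page(realm, memcache, GFP_KERNEL);

	/* ensure_spare_page() can be called with kvm->mmu_lock held */
	rtt = __alloc_delegated_page(realm, NULL, GFP_ATOMIC);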

>
>> +{
>> +	phys_addr_t phys = PHYS_ADDR_MAX;
>> +	void *virt;
>> +
>> +	if (realm->spare_page != PHYS_ADDR_MAX) {
>> +		swap(realm->spare_page, phys);
>> +		goto out;
>> +	}
>> +
>> +	if (mc)
>> +		virt = kvm_mmu_memory_cache_alloc(mc);
>> +	else
>> +		virt = (void *)__get_free_page(flags);
>> +
>> +	if (!virt)
>> +		goto out;
>> +
>> +	phys = virt_to_phys(virt);
>> +
>> +	if (rmi_granule_delegate(phys)) {
>> +		free_page((unsigned long)virt);
>> +
>> +		phys = PHYS_ADDR_MAX;
>> +	}
>> +
>> +out:
>> +	return phys;
>> +}
>> +
>> +static void free_delegated_page(struct realm *realm, phys_addr_t phys)
>> +{
>> +	if (realm->spare_page == PHYS_ADDR_MAX) {
>> +		realm->spare_page = phys;
>> +		return;
>> +	}
>> +
>> +	if (WARN_ON(rmi_granule_undelegate(phys))) {
>> +		/* Undelegate failed: leak the page */
>> +		return;
>> +	}
>> +
>> +	free_page((unsigned long)phys_to_virt(phys));
>> +}
>> +
>>  u32 kvm_realm_ipa_limit(void)
>>  {
>>  	return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
>> @@ -124,6 +171,130 @@ static int realm_create_rd(struct kvm *kvm)
>>  	return r;
>>  }
>>
>> +static int realm_rtt_destroy(struct realm *realm, unsigned long addr,
>> +			     int level, phys_addr_t *rtt_granule,
>> +			     unsigned long *next_addr)
>> +{
>> +	unsigned long out_rtt;
>> +	unsigned long out_top;
>> +	int ret;
>> +
>> +	ret = rmi_rtt_destroy(virt_to_phys(realm->rd), addr, level,
>> +			      &out_rtt, &out_top);
>> +
>> +	if (rtt_granule)
>> +		*rtt_granule = out_rtt;
>> +	if (next_addr)
>> +		*next_addr = out_top;
>
> minor nit: As mentioned in the previous patch, we could move this check
> to rmi_rtt_destroy().

Done. I've also dropped the check for rtt_granule - it's a bug to be
passing that as NULL.
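
So realm_rtt_destroy() ends up as a thin pass-through - roughly this
(a sketch, assuming rmi_rtt_destroy() is reworked to accept a NULL
next_addr):

	static int realm_rtt_destroy(struct realm *realm, unsigned long addr,
				     int level, phys_addr_t *rtt_granule,
				     unsigned long *next_addr)
	{
		unsigned long out_rtt;
		int ret;

		ret = rmi_rtt_destroy(virt_to_phys(realm->rd), addr, level,
				      &out_rtt, next_addr);

		/* rtt_granule must now always be a valid pointer */
		*rtt_granule = out_rtt;

		return ret;
	}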

>> +
>> +	return ret;
>> +}
>> +
>> +static int realm_tear_down_rtt_level(struct realm *realm, int level,
>> +				     unsigned long start, unsigned long end)
>> +{
>> +	ssize_t map_size;
>> +	unsigned long addr, next_addr;
>> +
>> +	if (WARN_ON(level > RME_RTT_MAX_LEVEL))
>> +		return -EINVAL;
>> +
>> +	map_size = rme_rtt_level_mapsize(level - 1);
>> +
>> +	for (addr = start; addr < end; addr = next_addr) {
>> +		phys_addr_t rtt_granule;
>> +		int ret;
>> +		unsigned long align_addr = ALIGN(addr, map_size);
>> +
>> +		next_addr = ALIGN(addr + 1, map_size);
>> +
>> +		if (next_addr <= end && align_addr == addr) {
>> +			ret = realm_rtt_destroy(realm, addr, level,
>> +						&rtt_granule, &next_addr);
>> +		} else {
>> +			/* Recurse a level deeper */
>> +			ret = realm_tear_down_rtt_level(realm,
>> +							level + 1,
>> +							addr,
>> +							min(next_addr, end));
>> +			if (ret)
>> +				return ret;
>> +			continue;
>> +		}
>
> I think it would be more readable if we did something like:
>
>		/*
>		 * The target range is smaller than what this level
>		 * covers. Go deeper.
>		 */
>		if (next_addr > end || align_addr != addr) {
>			ret = realm_tear_down_rtt_level(realm,
>							level + 1, addr,
>							min(next_addr, end));
>			if (ret)
>				return ret;
>			continue;
>		}
>
>		ret = realm_rtt_destroy(realm, addr, level,
>					&rtt_granule, &next_addr);

Yes, that seems clearer - thanks for the suggestion.

>> +
>> +		switch (RMI_RETURN_STATUS(ret)) {
>> +		case RMI_SUCCESS:
>> +			if (!WARN_ON(rmi_granule_undelegate(rtt_granule)))
>> +				free_page((unsigned long)phys_to_virt(rtt_granule));
>> +			break;
>> +		case RMI_ERROR_RTT:
>> +			if (next_addr > addr) {
>> +				/* unassigned or destroyed */
>
> minor nit:
>				/* RTT doesn't exist, skip */

Indeed, this comment is out of date - the spec now calls this condition
"Missing RTT" so I'll use that wording.

>
>> +				break;
>> +			}
>
>> +			if (WARN_ON(RMI_RETURN_INDEX(ret) != level))
>> +				return -EBUSY;
>
> In practice, we only call this for the full IPA range and we wouldn't
> go deeper if the top level entry was missing. So there is no reason why
> the RMM didn't walk to the requested level. Maybe we could add a
> comment here:
>			/*
>			 * We tear down the RTT range for the full IPA
>			 * space, after everything is unmapped. Also we
>			 * descend down only if we cannot tear down a
>			 * top level RTT. Thus RMM must be able to walk
>			 * to the requested level. e.g., a block mapping
>			 * exists at L1 or L2.
>			 */

Sure, will add.

>> +			if (WARN_ON(level == RME_RTT_MAX_LEVEL)) {
>> +				// Live entry
>> +				return -EBUSY;
>
> The first part of the comment above applies to this too, so maybe it
> is good to have it here as well.
>
>> +			}
>
>> +			/* Recurse a level deeper */
>
> minor nit:
>			/*
>			 * The table has active entries in it, recurse
>			 * deeper and tear down the RTTs.
>			 */

Sure.

>> +			next_addr = ALIGN(addr + 1, map_size);
>> +			ret = realm_tear_down_rtt_level(realm,
>> +							level + 1,
>> +							addr,
>> +							next_addr);
>> +			if (ret)
>> +				return ret;
>> +			/* Try again at this level */
>
>			/*
>			 * Now that the children RTTs are destroyed,
>			 * retry at this level.
>			 */

Sure.

>> +			next_addr = addr;
>> +			break;
>> +		default:
>> +			WARN_ON(1);
>> +			return -ENXIO;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int realm_tear_down_rtt_range(struct realm *realm,
>> +				     unsigned long start, unsigned long end)
>> +{
>> +	return realm_tear_down_rtt_level(realm, get_start_level(realm) + 1,
>> +					 start, end);
>> +}
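
(To spell out how the retry path plays out, at least as I read it -
sizes assume the 4k granule: tearing down [0, 512GB) at level 1, the
destroy fails with RMI_ERROR_RTT, index == 1 and next_addr == addr,
because the level 1 RTT still holds live table entries; we recurse to
level 2 over the same range, destroying each 1GB level 2 RTT and
recursing to level 3 where needed; then, with next_addr reset to addr,
the retry at level 1 succeeds and the RTT granule is undelegated and
freed in the RMI_SUCCESS case.)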

>> +
>> +static void ensure_spare_page(struct realm *realm)
>> +{
>> +	phys_addr_t tmp_rtt;
>> +
>> +	/*
>> +	 * Make sure we have a spare delegated page for tearing down the
>> +	 * block mappings. We do this by allocating then freeing a page.
>> +	 * We must use Atomic allocations as we are called with
>> +	 * kvm->mmu_lock held.
>> +	 */
>> +	tmp_rtt = __alloc_delegated_page(realm, NULL, GFP_ATOMIC);
>> +
>> +	/*
>> +	 * If the allocation failed, continue as we may not have a block
>> +	 * level mapping so it may not be fatal, otherwise free it to
>> +	 * assign it to the spare page.
>> +	 */
>> +	if (tmp_rtt != PHYS_ADDR_MAX)
>> +		free_delegated_page(realm, tmp_rtt);
>> +}
>> +
>> +void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits)
>> +{
>> +	struct realm *realm = &kvm->arch.realm;
>> +
>> +	ensure_spare_page(realm);
>> +
>> +	WARN_ON(realm_tear_down_rtt_range(realm, 0, (1UL << ia_bits)));
>> +}
>
> minor nit: We don't seem to be using the "spare_page" yet in this
> patch. Maybe it would be a good idea to move all the related changes
> (alloc_delegated_page() / free_delegated_page(), ensure_spare_page()
> etc.) to the patch where they are better suited?

Good point - I think the calls get added in "arm64: RME: Allow VMM to
set RIPAS". I'll try to move them there.

Thanks,
Steve

> Suzuki
>
>> +
>>  /* Protects access to rme_vmid_bitmap */
>>  static DEFINE_SPINLOCK(rme_vmid_lock);
>>  static unsigned long *rme_vmid_bitmap;
>