Subject: Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
From: "wangyanan (Y)"
To: Will Deacon
Cc: Marc Zyngier, Catalin Marinas, James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan, Quentin Perret
Date: Tue, 1 Dec 2020 22:11:37 +0800
Message-ID: <2e92a511-496c-d446-95f4-6211ec8b4bb6@huawei.com>
In-Reply-To: <20201201134606.GB26973@willie-the-truck>
References: <20201130121847.91808-1-wangyanan55@huawei.com>
 <20201130121847.91808-3-wangyanan55@huawei.com>
 <20201130133421.GB24837@willie-the-truck>
 <67e9e393-1836-eca7-4235-6f4a19fed652@huawei.com>
 <20201130160119.GA25051@willie-the-truck>
 <868a4403-10d3-80f3-4ae1-a490813c55e2@huawei.com>
 <20201201134606.GB26973@willie-the-truck>
X-Mailing-List: linux-kernel@vger.kernel.org
On 2020/12/1 21:46, Will Deacon wrote:
> On Tue, Dec 01, 2020 at 10:30:41AM +0800, wangyanan (Y) wrote:
>> On 2020/12/1 0:01, Will Deacon wrote:
>>> On Mon, Nov 30, 2020 at 11:24:19PM +0800, wangyanan (Y) wrote:
>>>> On 2020/11/30 21:34, Will Deacon wrote:
>>>>> On Mon, Nov 30, 2020 at 08:18:46PM +0800, Yanan Wang wrote:
>>>>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>>>>> index 696b6aa83faf..fec8dc9f2baa 100644
>>>>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>>>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>>>>> @@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>>>>>>  	return 0;
>>>>>>  }
>>>>>>  
>>>>>> +static void stage2_flush_dcache(void *addr, u64 size);
>>>>>> +static bool stage2_pte_cacheable(kvm_pte_t pte);
>>>>>> +
>>>>>>  static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>>>>>  			       struct stage2_map_data *data)
>>>>>>  {
>>>>>> @@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>>>>>  	struct page *page = virt_to_page(ptep);
>>>>>>  
>>>>>>  	if (data->anchor) {
>>>>>> -		if (kvm_pte_valid(pte))
>>>>>> +		if (kvm_pte_valid(pte)) {
>>>>>> +			kvm_set_invalid_pte(ptep);
>>>>>> +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
>>>>>> +				     addr, level);
>>>>>>  			put_page(page);
>>>>> This doesn't make sense to me: the page-table pages we're walking when the
>>>>> anchor is set are not accessible to the hardware walker because we unhooked
>>>>> the entire sub-table in stage2_map_walk_table_pre(), which has the necessary
>>>>> TLB invalidation.
>>>>>
>>>>> Are you seeing a problem in practice here?
>>>> Yes, I did indeed find a problem in practice.
>>>>
>>>> When the migration was cancelled, a TLB conflict abort was seen in the guest.
>>>>
>>>> This problem was fixed before the rework of the page table code; you can have
>>>> a look at the following two links:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c3736cd32bf5197aed1410ae826d2d254a5b277
>>>>
>>>> https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035031.html
>>> Ok, let's go through this, because I still don't see the bug. Please correct
>>> me if you spot any mistakes:
>>>
>>>   1. We have a block mapping for X => Y
>>>   2. Dirty logging is enabled, so the block mapping is write-protected and
>>>      ends up being split into page mappings
>>>   3. Dirty logging is disabled due to a failed migration.
>>>
>>> --- At this point, I think we agree that the state of the MMU is alright ---
>>>
>>>   4. We take a stage-2 fault and want to reinstall the block mapping:
>>>
>>>      a. kvm_pgtable_stage2_map() is invoked to install the block mapping
>>>      b. stage2_map_walk_table_pre() finds a table where we would like to
>>>         install the block:
>>>
>>>         i.   The anchor is set to point at this entry
>>>         ii.  The entry is made invalid
>>>         iii. We invalidate the TLB for the input address. This is
>>>              TLBI IPAS2E1IS without level hint and then TLBI VMALLE1IS.
>>>
>>>     *** At this point, the page-table pointed to by the old table entry
>>>         is not reachable to the hardware walker ***
>>>
>>>      c. stage2_map_walk_leaf() is called for each leaf entry in the
>>>         now-unreachable subtree, dropping page-references for each valid
>>>         entry it finds.
>>>      d. stage2_map_walk_table_post() is eventually called for the entry
>>>         which we cleared back in b.ii, so we install the new block mapping.
>>>
>>> You are proposing to add additional TLB invalidation to (c), but I don't
>>> think that is necessary, thanks to the invalidation already performed in
>>> b.iii. What am I missing here?
>> The point is that the TLBI at b.iii is not enough. There are many page
>> mappings that we need to merge into a block mapping.
>>
>> We invalidate the TLB for the input address without level hint at b.iii, but
>> this operation only flushes the TLB entry for one page mapping. There are
>> still TLB entries for the other page mappings in the cache, and the MMU
>> hardware walker can still hit these entries next time.
> Ah, yes, I see. Thanks. I hadn't considered the case where there are table
> entries beneath the anchor. So how about the diff below?
>
> Will
>
> --->8

Hi, I think it is inappropriate to put the TLBI of all the leaf entries in
stage2_map_walk_table_post(), because *ptep must be an upper-level table entry
when we enter stage2_map_walk_table_post().

We should do the TLBI for every leaf entry (not the table entry) at the last
lookup level, just as I am proposing with the additional TLBI in
stage2_map_walk_leaf().

Thanks.
Yanan
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 0271b4a3b9fe..12526d8c7ae4 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -493,7 +493,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>  		return 0;
>  
>  	kvm_set_invalid_pte(ptep);
> -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
> +	/* TLB invalidation is deferred until the _post handler */
>  	data->anchor = ptep;
>  	return 0;
>  }
> @@ -547,11 +547,21 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>  				      struct stage2_map_data *data)
>  {
>  	int ret = 0;
> +	kvm_pte_t pte = *ptep;
>  
>  	if (!data->anchor)
>  		return 0;
>  
> -	free_page((unsigned long)kvm_pte_follow(*ptep));
> +	kvm_set_invalid_pte(ptep);
> +
> +	/*
> +	 * Invalidate the whole stage-2, as we may have numerous leaf
> +	 * entries below us which would otherwise need invalidating
> +	 * individually.
> +	 */
> +	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> +
> +	free_page((unsigned long)kvm_pte_follow(pte));
>  	put_page(virt_to_page(ptep));
>  
>  	if (data->anchor == ptep) {
> .