Subject: Re: [RFC] Question about TLB flush while set Stage-2 huge pages
To: zhengxiang9@huawei.com, yuzenghui@huawei.com
Cc: marc.zyngier@arm.com, christoffer.dall@arm.com, catalin.marinas@arm.com, will.deacon@arm.com, james.morse@arm.com, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org, wanghaibin.wang@huawei.com, lious.lilei@hisilicon.com, lishuo1@hisilicon.com
From: Suzuki K Poulose
Message-ID: <6aea4049-7860-7144-a7be-14f856cdc789@arm.com>
Date: Fri, 15 Mar 2019 14:56:45 +0000

Hi Zhengui,

On 15/03/2019 08:21, Zheng Xiang wrote:
> Hi Suzuki,
>
> I have tested this patch, VM doesn't hang and we get expected WARNING log:

Thanks for the quick testing!

> However, we also get the following unexpected log:
>
> [ 908.329900] BUG: Bad page state in process qemu-kvm pfn:a2fb41cf
> [ 908.339415] page:ffff7e28bed073c0 count:-4 mapcount:0 mapping:0000000000000000 index:0x0
> [ 908.339416] flags: 0x4ffffe0000000000()
> [ 908.339418] raw: 4ffffe0000000000 dead000000000100 dead000000000200 0000000000000000
> [ 908.339419] raw: 0000000000000000 0000000000000000 fffffffcffffffff 0000000000000000
> [ 908.339420] page dumped because: nonzero _refcount
> [ 908.339437] CPU: 32 PID: 72599 Comm: qemu-kvm Kdump: loaded Tainted: G B W 5.0.0+ #1
> [ 908.339438] Call trace:
> [ 908.339439] dump_backtrace+0x0/0x188
> [ 908.339441] show_stack+0x24/0x30
> [ 908.339442] dump_stack+0xa8/0xcc
> [ 908.339443] bad_page+0xf0/0x150
> [ 908.339445] free_pages_check_bad+0x84/0xa0
> [ 908.339446] free_pcppages_bulk+0x4b8/0x750
> [ 908.339448] free_unref_page_commit+0x13c/0x198
> [ 908.339449] free_unref_page+0x84/0xa0
> [ 908.339451] __free_pages+0x58/0x68
> [ 908.339452] zap_huge_pmd+0x290/0x2d8
> [ 908.339454] unmap_page_range+0x2b4/0x470
> [ 908.339455] unmap_single_vma+0x94/0xe8
> [ 908.339457] unmap_vmas+0x8c/0x108
> [ 908.339458] exit_mmap+0xd4/0x178
> [ 908.339459] mmput+0x74/0x180
> [ 908.339460] do_exit+0x2b4/0x5b0
> [ 908.339462] do_group_exit+0x3c/0xe0
> [ 908.339463] __arm64_sys_exit_group+0x24/0x28
> [ 908.339465] el0_svc_common+0xa0/0x180
> [ 908.339466] el0_svc_handler+0x38/0x78
> [ 908.339467] el0_svc+0x8/0xc

That's bad; we seem to be making up to 4 unbalanced put_page() calls.

>>> ---
>>>   virt/kvm/arm/mmu.c | 51 +++++++++++++++++++++++++++++++++++----------------
>>>   1 file changed, 35 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>> index 66e0fbb5..04b0f9b 100644
>>> --- a/virt/kvm/arm/mmu.c
>>> +++ b/virt/kvm/arm/mmu.c
>>> @@ -1076,24 +1076,38 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>>>            * Skip updating the page table if the entry is
>>>            * unchanged.
>>>            */
>>> -        if (pmd_val(old_pmd) == pmd_val(*new_pmd))
>>> +        if (pmd_val(old_pmd) == pmd_val(*new_pmd)) {
>>>               return 0;
>>> -
>>> +        } else if (WARN_ON_ONCE(!pmd_thp_or_huge(old_pmd))) {
>>>           /*
>>> -         * Mapping in huge pages should only happen through a
>>> -         * fault.  If a page is merged into a transparent huge
>>> -         * page, the individual subpages of that huge page
>>> -         * should be unmapped through MMU notifiers before we
>>> -         * get here.
>>> -         *
>>> -         * Merging of CompoundPages is not supported; they
>>> -         * should become splitting first, unmapped, merged,
>>> -         * and mapped back in on-demand.
>>> +         * If we have PTE level mapping for this block,
>>> +         * we must unmap it to avoid inconsistent TLB
>>> +         * state. We could end up in this situation if
>>> +         * the memory slot was marked for dirty logging
>>> +         * and was reverted, leaving PTE level mappings
>>> +         * for the pages accessed during the period.
>>> +         * Normal THP split/merge follows mmu_notifier
>>> +         * callbacks and do get handled accordingly.
>>>            */
>>> -        VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
>>> +            unmap_stage2_range(kvm, (addr & S2_PMD_MASK), S2_PMD_SIZE);
>
> It seems that kvm decreases the _refcount of the page twice in transparent_hugepage_adjust()
> and unmap_stage2_range().

But I thought we should be doing that on the head_page already, as this is THP.
I will take a look and get back to you on this.

Btw, is it possible for you to turn on CONFIG_DEBUG_VM and re-run with the above patch?

Kind regards
Suzuki