Subject: Re: [PATCH] kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP
To: yuzenghui@huawei.com, linux-arm-kernel@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 eric.auger@redhat.com, marc.zyngier@arm.com, christoffer.dall@arm.com,
 zhengxiang9@huawei.com, andrew.murray@arm.com, wanghaibin.wang@huawei.com
References: <1554203176-3958-1-git-send-email-suzuki.poulose@arm.com>
 <2ea55b9c-09da-c3d0-3616-aa6be85b5a46@huawei.com>
 <44ffad26-a407-4603-0bb6-145f7290adbe@huawei.com>
 <730c25b4-dbc5-8d8b-514c-4ed8641701ce@arm.com>
 <1b398e29-e2d5-2a58-b4ca-c72979a170d8@huawei.com>
 <520e277c-f096-a84b-8405-636b19e4cc46@arm.com>
 <8bf8e863-93f5-ab44-c0aa-1ad23f91a016@huawei.com>
From: Suzuki K Poulose
Message-ID: <2df077f2-cb2c-0cb3-5c37-520956832f87@arm.com>
Date: Wed, 10 Apr 2019 09:39:00 +0100
In-Reply-To: <8bf8e863-93f5-ab44-c0aa-1ad23f91a016@huawei.com>

On 10/04/2019 03:20, Zenghui Yu wrote:
> On 2019/4/9 22:59, Suzuki K Poulose wrote:
>> Hi Zenghui,
>>
>> On 04/09/2019 09:05 AM, Zenghui Yu wrote:
>>> On 2019/4/9 2:40, Suzuki K Poulose wrote:
>>>> Hi Zenghui,
>>>>
>>>> On 04/08/2019 04:11 PM, Zenghui Yu wrote:
>>>>> Hi Suzuki,
>>>>>
>>>>> Thanks for the reply.
>>>>
>>>> ...
>>>>
>>>>>>> Hi Suzuki,
>>>>>>>
>>>>>>> Why not make use of fault_supports_stage2_huge_mapping()? Let it
>>>>>>> do some checks for us.
>>>>>>>
>>>>>>> fault_supports_stage2_huge_mapping() was intended to do a *two-step*
>>>>>>> check to tell us whether we can create stage2 huge block mappings,
>>>>>>> and this check covers both hugetlbfs and THP. With commit
>>>>>>> a80868f398554842b14, we pass PAGE_SIZE as "map_size" for normal
>>>>>>> size pages (which turned out to be almost meaningless), and
>>>>>>> unfortunately the THP check no longer works.
>>>>>>
>>>>>> That's correct.
>>>>>>
>>>>>>> So we want to rework the *THP* check process. Your patch fixes the
>>>>>>> first checking-step, but the second is still missed, am I wrong?
>>>>>>
>>>>>> It fixes the step explicitly for THP by making sure that the GPA and
>>>>>> the HVA are aligned to the map size.
>>>>>
>>>>> Yes, I understand how your patch fixed the issue. But what I'm
>>>>> really concerned about here is the *second* checking-step in
>>>>> fault_supports_stage2_huge_mapping().
>>>>>
>>>>> We have to check whether we are mapping a non-block-aligned or
>>>>> non-block-sized memslot; if so, we cannot create block mappings for
>>>>> the beginning and end of this memslot. This is what the second part
>>>>> of fault_supports_stage2_huge_mapping() does.
>>>>>
>>>>> I haven't seen this checking-step in your patch, did I miss something?
>>>>
>>>> I see.
>>>>
>>>>>> I don't think this calls for a VM_BUG_ON(). It is simply a case where
>>>>>> the GPA is not aligned to the HVA, but for a normal VMA that could be
>>>>>> made THP.
>>>>>>
>>>>>> We had this VM_BUG_ON(), which would never have hit because we would
>>>>>> have set force_pte if they were not aligned.
>>>>>
>>>>> Yes, I agree.
>>>>>
>>>>>>>> +        /* Skip memslots with unaligned IPA and user address */
>>>>>>>> +        if ((gfn & mask) != (pfn & mask))
>>>>>>>> +            return false;
>>>>>>>>           if (pfn & mask) {
>>>>>>>>               *ipap &= PMD_MASK;
>>>>>>>>               kvm_release_pfn_clean(pfn);
>>>>>>>
>>>>>>> ---8>---
>>>>>>>
>>>>>>> Rework fault_supports_stage2_huge_mapping(), let it check THP again.
>>>>>>>
>>>>>>> Signed-off-by: Zenghui Yu
>>>>>>> ---
>>>>>>>   virt/kvm/arm/mmu.c | 11 ++++++++++-
>>>>>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>>>>> index 27c9583..5e1b258 100644
>>>>>>> --- a/virt/kvm/arm/mmu.c
>>>>>>> +++ b/virt/kvm/arm/mmu.c
>>>>>>> @@ -1632,6 +1632,15 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>>>>>>>       uaddr_end = uaddr_start + size;
>>>>>>>
>>>>>>>       /*
>>>>>>> +     * If the memslot is _not_ backed by hugetlbfs, then check if it
>>>>>>> +     * can be backed by transparent hugepages.
>>>>>>> +     *
>>>>>>> +     * Currently only PMD_SIZE THPs are supported, revisit it later.
>>>>>>> +     */
>>>>>>> +    if (map_size == PAGE_SIZE)
>>>>>>> +        map_size = PMD_SIZE;
>>>>>>> +
>>>>>>
>>>>>> This looks hackish. What if we support PUD_SIZE huge pages in the
>>>>>> future?
>>>>>
>>>>> Yes, this might make the code a little difficult to understand. But by
>>>>> doing so, we follow the same logic as before commit a80868f398554842b14;
>>>>> that is, we do the two-step checking for normal size pages in
>>>>> fault_supports_stage2_huge_mapping(), to decide whether we can create
>>>>> THP mappings for these pages.
>>>>>
>>>>> As for PUD_SIZE THPs, to be honest, I have no idea now :(
>>>>
>>>> How about the following diff?
>>>>
>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>> index 97b5417..98e5cec 100644
>>>> --- a/virt/kvm/arm/mmu.c
>>>> +++ b/virt/kvm/arm/mmu.c
>>>> @@ -1791,7 +1791,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>            * currently supported. This code will need to be
>>>>            * updated to support other THP sizes.
>>>>            */
>>>> -        if (transparent_hugepage_adjust(&pfn, &fault_ipa))
>>>> +        if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE) &&
>>>> +            transparent_hugepage_adjust(&pfn, &fault_ipa))
>>>>               vma_pagesize = PMD_SIZE;
>>>>       }
>>>
>>> I think this is good enough for the issue.
>>>
>>> (One minor concern: With this change, it seems that we no longer need
>>> "force_pte" and can just use "logging_active" instead. But this is not
>>> much related to what we're fixing.)
>>
>> I would still leave force_pte there, to avoid checking for the THP case
>> in a situation where we are forced to a PTE-level mapping on a
>> hugepage-backed VMA. It serves to avoid another check.
>
> Hi Suzuki,
>
> Yes, I agree, thanks.

Cool. I have a patch to fix this properly, and two other patches to clean
up and unify the way we handle THP-backed hugepages. I will send them out
after a bit of testing, later today.

Cheers,
Suzuki
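
P.S. For anyone skimming the thread, here is a rough sketch of the
*two-step* check we keep referring to. Field names follow mainline
virt/kvm/arm/mmu.c, but take this as an illustration of the logic, not
the patch I'll be sending:

static bool can_use_stage2_block_mapping(struct kvm_memory_slot *memslot,
					 unsigned long hva,
					 unsigned long map_size)
{
	gpa_t gpa_start = memslot->base_gfn << PAGE_SHIFT;
	hva_t uaddr_start = memslot->userspace_addr;
	hva_t uaddr_end = uaddr_start + memslot->npages * PAGE_SIZE;

	/*
	 * Step 1: the IPA and the userspace address must share the same
	 * offset within a block of map_size. Otherwise a stage2 block
	 * entry would translate to the wrong host pages.
	 */
	if ((gpa_start & (map_size - 1)) != (uaddr_start & (map_size - 1)))
		return false;

	/*
	 * Step 2: the block around the faulting hva must sit entirely
	 * inside the memslot. The head and tail of a non-block-aligned
	 * or non-block-sized memslot must fall back to page mappings.
	 */
	return (hva & ~(map_size - 1)) >= uaddr_start &&
	       (hva & ~(map_size - 1)) + map_size <= uaddr_end;
}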
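
With that in place, the THP path in user_mem_abort() would read roughly
as below; supporting PUD_SIZE THPs later would only need the same helper
called with PUD_SIZE:

	if (vma_pagesize == PAGE_SIZE && !force_pte) {
		/*
		 * Only attempt the THP adjustment when a PMD-sized block
		 * is actually usable at this hva, which restores the
		 * second checking-step for THP-backed memslots.
		 */
		if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE) &&
		    transparent_hugepage_adjust(&pfn, &fault_ipa))
			vma_pagesize = PMD_SIZE;
	}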