Subject: Re: [PATCH] kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP
From: Zenghui Yu
To: Suzuki K Poulose
References: <1554203176-3958-1-git-send-email-suzuki.poulose@arm.com> <2ea55b9c-09da-c3d0-3616-aa6be85b5a46@huawei.com> <44ffad26-a407-4603-0bb6-145f7290adbe@huawei.com> <730c25b4-dbc5-8d8b-514c-4ed8641701ce@arm.com> <1b398e29-e2d5-2a58-b4ca-c72979a170d8@huawei.com> <520e277c-f096-a84b-8405-636b19e4cc46@arm.com>
Message-ID: <8bf8e863-93f5-ab44-c0aa-1ad23f91a016@huawei.com>
Date: Wed, 10 Apr 2019 10:20:30 +0800
In-Reply-To: <520e277c-f096-a84b-8405-636b19e4cc46@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2019/4/9 22:59, Suzuki K Poulose wrote:
> Hi Zenghui,
>
> On 04/09/2019 09:05 AM, Zenghui Yu wrote:
>>
>> On 2019/4/9 2:40, Suzuki K Poulose wrote:
>>> Hi Zenghui,
>>>
>>> On 04/08/2019 04:11 PM, Zenghui Yu wrote:
>>>> Hi Suzuki,
>>>>
>>>> Thanks for the reply.
>>>>
>>>
>>> ...
>>>
>>>>>> Hi Suzuki,
>>>>>>
>>>>>> Why not make use of fault_supports_stage2_huge_mapping()?  Let it
>>>>>> do some checks for us.
>>>>>>
>>>>>> fault_supports_stage2_huge_mapping() was intended to do a *two-step*
>>>>>> check to tell us whether we can create stage2 huge block mappings,
>>>>>> and this check covers both hugetlbfs and THP.  With commit
>>>>>> a80868f398554842b14, we pass PAGE_SIZE as "map_size" for normal size
>>>>>> pages (which turned out to be almost meaningless), and unfortunately
>>>>>> the THP check no longer works.
>>>>>
>>>>> That's correct.
>>>>>
>>>>>> So we want to rework the *THP* check process.  Your patch fixes the
>>>>>> first checking step, but the second is still missed, am I wrong?
>>>>>
>>>>> It fixes the step explicitly for THP by making sure that the GPA and
>>>>> the HVA are aligned to the map size.
>>>>
>>>> Yes, I understand how your patch fixed the issue.  But what I'm
>>>> really concerned about here is the *second* checking step in
>>>> fault_supports_stage2_huge_mapping().
>>>>
>>>> We have to check whether we are mapping a non-block-aligned or
>>>> non-block-sized memslot; if so, we can not create block mappings for
>>>> the beginning and end of this memslot.  This is what the second part
>>>> of fault_supports_stage2_huge_mapping() does.
>>>>
>>>> I haven't seen this checking step in your patch, did I miss something?
>>>>
>>>
>>> I see.
>>>
>>>>> I don't think this calls for a VM_BUG_ON(). It is simply a case where
>>>>> the GPA is not aligned to the HVA, but for a normal VMA that could be
>>>>> made THP.
>>>>>
>>>>> We had this VM_BUG_ON(), which would never have hit because we would
>>>>> have set force_pte if they were not aligned.
>>>>
>>>> Yes, I agree.
>>>>
>>>>>>> +        /* Skip memslots with unaligned IPA and user address */
>>>>>>> +        if ((gfn & mask) != (pfn & mask))
>>>>>>> +            return false;
>>>>>>>           if (pfn & mask) {
>>>>>>>               *ipap &= PMD_MASK;
>>>>>>>               kvm_release_pfn_clean(pfn);
>>>>>>
>>>>>> ---8>---
>>>>>>
>>>>>> Rework fault_supports_stage2_huge_mapping(), let it check THP again.
>>>>>>
>>>>>> Signed-off-by: Zenghui Yu
>>>>>> ---
>>>>>>   virt/kvm/arm/mmu.c | 11 ++++++++++-
>>>>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>>>> index 27c9583..5e1b258 100644
>>>>>> --- a/virt/kvm/arm/mmu.c
>>>>>> +++ b/virt/kvm/arm/mmu.c
>>>>>> @@ -1632,6 +1632,15 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>>>>>>       uaddr_end = uaddr_start + size;
>>>>>>
>>>>>>       /*
>>>>>> +     * If the memslot is _not_ backed by hugetlbfs, then check if it
>>>>>> +     * can be backed by transparent hugepages.
>>>>>> +     *
>>>>>> +     * Currently only PMD_SIZE THPs are supported, revisit it later.
>>>>>> +     */
>>>>>> +    if (map_size == PAGE_SIZE)
>>>>>> +        map_size = PMD_SIZE;
>>>>>> +
>>>>>
>>>>> This looks hackish. What if we support PUD_SIZE huge pages in the
>>>>> future?
>>>>
>>>> Yes, this might make the code a little difficult to understand. But by
>>>> doing so, we follow the same logic as before commit a80868f398554842b14;
>>>> that is, we do the two-step checking for normal size pages in
>>>> fault_supports_stage2_huge_mapping(), to decide whether we can create
>>>> THP mappings for these pages.
>>>>
>>>> As for PUD_SIZE THPs, to be honest, I have no idea now :(
>>>
>>> How about the following diff?
>>>
>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>> index 97b5417..98e5cec 100644
>>> --- a/virt/kvm/arm/mmu.c
>>> +++ b/virt/kvm/arm/mmu.c
>>> @@ -1791,7 +1791,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>           * currently supported. This code will need to be
>>>           * updated to support other THP sizes.
>>>           */
>>> -        if (transparent_hugepage_adjust(&pfn, &fault_ipa))
>>> +        if (fault_supports_stage2_huge_mappings(memslot, hva, PMD_SIZE) &&
>>> +            transparent_hugepage_adjust(&pfn, &fault_ipa))
>>>               vma_pagesize = PMD_SIZE;
>>>       }
>>
>> I think this is good enough for the issue.
>>
>> (One minor concern: With this change, it seems that we no longer need
>> "force_pte" and can just use "logging_active" instead. But this is not
>> much related to what we're fixing.)
>
> I would still leave force_pte there to avoid checking for a THP case
> in a situation where we were forced to a PTE-level mapping on a
> hugepage-backed VMA. It would serve to avoid another check.

Hi Suzuki,

Yes, I agree, thanks.


zenghui
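---

As a standalone illustration of the two-step check discussed above, here is
a minimal userspace C sketch. The struct and function names only mirror the
kernel's kvm_memory_slot and fault_supports_stage2_huge_mapping() in
virt/kvm/arm/mmu.c, they are not the real kernel code; the slot geometry in
main() is invented for the example, and PMD_SIZE assumes a 4K-page host:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PMD_SIZE	(1UL << 21)		/* 2M block on a 4K-page host */

struct memslot {
	uint64_t base_gfn;		/* first guest frame number */
	uint64_t npages;		/* slot size in pages */
	uint64_t userspace_addr;	/* HVA backing the slot */
};

static bool supports_huge_mapping(const struct memslot *slot,
				  uint64_t hva, uint64_t map_size)
{
	uint64_t gpa_start = slot->base_gfn << PAGE_SHIFT;
	uint64_t uaddr_start = slot->userspace_addr;
	uint64_t uaddr_end = uaddr_start + slot->npages * PAGE_SIZE;
	uint64_t block_start = hva & ~(map_size - 1);

	/*
	 * Step 1: IPA and HVA must be congruent modulo the block size,
	 * otherwise no block-aligned IPA ever maps a block-aligned HVA.
	 */
	if ((gpa_start & (map_size - 1)) != (uaddr_start & (map_size - 1)))
		return false;

	/*
	 * Step 2: the whole block around the faulting HVA must fit inside
	 * the memslot; the ragged edges must fall back to page mappings.
	 */
	return block_start >= uaddr_start &&
	       block_start + map_size <= uaddr_end;
}

int main(void)
{
	/* 5M slot whose IPA and HVA are both 2M-aligned. */
	struct memslot aligned = {
		.base_gfn	= 0x80200,	/* GPA 0x80200000 */
		.npages		= 1280,		/* 5M */
		.userspace_addr	= 0x7f2a00200000,
	};
	/* Same slot, but the IPA is skewed one page from the HVA. */
	struct memslot skewed = aligned;

	skewed.base_gfn += 1;

	printf("aligned, middle: %d\n",		/* 1: block fits */
	       supports_huge_mapping(&aligned, 0x7f2a00400000, PMD_SIZE));
	printf("aligned, tail  : %d\n",		/* 0: fails step 2 */
	       supports_huge_mapping(&aligned, 0x7f2a00680000, PMD_SIZE));
	printf("skewed,  middle: %d\n",		/* 0: fails step 1 */
	       supports_huge_mapping(&skewed, 0x7f2a00400000, PMD_SIZE));
	return 0;
}

Step 1 is what the (gfn & mask) != (pfn & mask) hunk enforces at
THP-adjust time; step 2 is the memslot-edge check that the thread is
concerned about losing once PAGE_SIZE is passed as map_size.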