Subject: Re: [PATCH] KVM: arm/arm64: fix unaligned hva start and end in handle_hva_to_gpa
To: Suzuki K Poulose, Marc Zyngier, Christoffer Dall,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu
Cc: linux-kernel@vger.kernel.org, Jia He, li.zhang@hxt-semitech.com,
    hughd@google.com, Andrea Arcangeli, Minchan Kim, Claudio Imbrenda,
    Arvind Yadav, Mike Rapoport, akpm@linux-foundation.org
References: <1525244911-5519-1-git-send-email-hejianet@gmail.com>
 <04e6109f-cbbd-24b0-03bb-9247b930d42c@arm.com>
 <85e04362-05dd-c697-e9c4-ad5824e63819@gmail.com>
 <0f134188-12d5-7184-3fbd-0ec8204cf649@arm.com>
 <695beacb-ff51-bb2c-72ef-d268f7d4e59d@arm.com>
From: Jia He
Message-ID: <1e065e9b-4dad-611d-fc5b-26fe6c031507@gmail.com>
Date: Tue, 15 May 2018 10:03:30 +0800
In-Reply-To: <695beacb-ff51-bb2c-72ef-d268f7d4e59d@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Suzuki,

I
will merge the other thread into this one and add the necessary Cc list.

That WARN_ON call trace is very easy to reproduce on my armv8a server
after I start 20 guests and run memhog in the host. Of course, ksm has to
be enabled.

For your question about my inject-fault debug patch:

diff --git a/mm/ksm.c b/mm/ksm.c
index e3cbf9a..876bec8 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -43,6 +43,8 @@
 #include
 #include "internal.h"
+int trigger_by_ksm = 0;
+
 #ifdef CONFIG_NUMA
 #define NUMA(x)                (x)
 #define DO_NUMA(x)     do { (x); } while (0)
@@ -2587,11 +2589,14 @@ void rmap_walk_ksm(struct page *page, struct rmap_walk_control *rwc)
                        if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
                                continue;
+trigger_by_ksm = 1;
                        if (!rwc->rmap_one(page, vma,
                                        rmap_item->address, rwc->arg)) {
                                anon_vma_unlock_read(anon_vma);
+trigger_by_ksm = 0;
                                return;
                        }
+trigger_by_ksm = 0;
                        if (rwc->done && rwc->done(page)) {
                                anon_vma_unlock_read(anon_vma);
                                return;
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 7f6a944..ab8545e 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -290,12 +290,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
  * destroying the VM), otherwise another faulting VCPU may come in and mess
  * with things behind our backs.
  */
+extern int trigger_by_ksm;
 static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 {
        pgd_t *pgd;
        phys_addr_t addr = start, end = start + size;
        phys_addr_t next;
+
+       if(trigger_by_ksm) {
+               end -= 0x200;
+       }
+
        assert_spin_locked(&kvm->mmu_lock);
        pgd = kvm->arch.pgd + stage2_pgd_index(addr);
        do {

I need to point out that I never reproduced it without this debugging
patch. Please also see my comments below.

On 5/14/2018 6:06 PM, Suzuki K Poulose wrote:
> On 14/05/18 03:30, Jia He wrote:
>>
>> On 5/11/2018 9:39 PM, Suzuki K Poulose wrote:
>>> Marc,
>>>
>>> Thanks for looping me in. Comments below.
>>>
>>> On 03/05/18 03:02, Jia He wrote:
>>>> Hi Marc,
>>>>
>>>> Thanks for the review.
>>>>
>>>> On 5/2/2018 10:26 PM, Marc Zyngier wrote:
>>>>> [+ Suzuki]
>>>>>
>>>>> On 02/05/18 08:08, Jia He wrote:
>>>>>> From: Jia He
>>>>>>
>>>>>> In our armv8a server (QDF2400), I noticed a WARN_ON as follows:
>>>>>>
>>>>>> [  800.202850] WARNING: CPU: 33 PID: 255 at arch/arm64/kvm/../../../virt/kvm/arm/mmu.c:1670 kvm_age_hva_handler+0xcc/0xd4
>>>>> Which kernel version is that? I don't have a WARN_ON() at this line in
>>>>> 4.17. Do you have a reproducer?
>>>> My running kernel version is v4.14-15, but I can reproduce it in 4.17
>>>> (start 20 guests and run memhog in the host).
>>>> In 4.17, the WARN_ON is at
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/virt/kvm/arm/mmu.c#n1826
>>>>>> [  800.213535] Modules linked in: vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm vfat fat iw_cm mlx5_ib ib_core dm_mirror dm_region_hash dm_log dm_mod crc32_ce ipmi_ssif sg nfsd
>>>>>> [  800.284115]  auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx5_core ixgbe mlxfw devlink mdio ahci_platform libahci_platform qcom_emac libahci hdma hdma_mgmt i2c_qup
>>>>>> [  800.300382] CPU: 33 PID: 255 Comm: khugepaged Tainted: G        W       4.14.36+ #6
>>>>>> [  800.308030] Hardware name:
>>>>> Well, that's QDF2400, right?
>>>>> ;-)
>>>> yes, exactly :)
>>>>>> [  800.318717] task: ffff8017c949c000 task.stack: ffff8017c9498000
>>>>>> [  800.324629] PC is at kvm_age_hva_handler+0xcc/0xd4
>>>>>> [  800.329412] LR is at handle_hva_to_gpa+0xec/0x15c
>>>>>> [  800.334109] pc : [] lr : [] pstate: 20400145
>>>>>> [  800.341496] sp : ffff8017c949b260
>>>>>> [  800.344804] x29: ffff8017c949b260 x28: ffff801663e25008
>>>>>> [  800.350110] x27: 0000000000020000 x26: 00000001fb1a0000
>>>>>> [  800.355416] x25: 0000ffff605b0200 x24: 0000ffff605a0200
>>>>>> [  800.360722] x23: 0000000000000000 x22: 000000000000ffff
>>>>>> [  800.366028] x21: 00000001fb1a0000 x20: ffff8017c085a000
>>>>>> [  800.371334] x19: ffff801663e20008 x18: 0000000000000000
>>>>>> [  800.376641] x17: 0000000000000000 x16: 0000000000000000
>>>>>> [  800.381947] x15: 0000000000000000 x14: 3d646e655f617668
>>>>>> [  800.387254] x13: 2c30303230623530 x12: 36666666663d646e
>>>>>> [  800.392560] x11: 652c303032306135 x10: 3036666666663d74
>>>>>> [  800.397867] x9 : 0000000000003796 x8 : 655f6e66672c3030
>>>>>> [  800.403173] x7 : ffff00000859434c x6 : ffff8017f9c30cb8
>>>>>> [  800.408479] x5 : ffff8017f9c30cb8 x4 : ffff0000080b4e60
>>>>>> [  800.413786] x3 : 0000000000000000 x2 : 0000000000020000
>>>>>> [  800.419092] x1 : 00000001fb1a0000 x0 : 0000000020000000
>>>>>> [  800.424398] Call trace:
>>>>>> [  800.426838] Exception stack(0xffff8017c949b120 to 0xffff8017c949b260)
>>>>>> [  800.433272] b120: 0000000020000000 00000001fb1a0000 0000000000020000 0000000000000000
>>>>>> [  800.441095] b140: ffff0000080b4e60 ffff8017f9c30cb8 ffff8017f9c30cb8 ffff00000859434c
>>>>>> [  800.448918] b160: 655f6e66672c3030 0000000000003796 3036666666663d74 652c303032306135
>>>>>> [  800.456740] b180: 36666666663d646e 2c30303230623530 3d646e655f617668 0000000000000000
>>>>>> [  800.464563] b1a0: 0000000000000000 0000000000000000 0000000000000000 ffff801663e20008
>>>>>> [  800.472385] b1c0: ffff8017c085a000 00000001fb1a0000 000000000000ffff 0000000000000000
>>>>>> [  800.480208] b1e0: 0000ffff605a0200 0000ffff605b0200 00000001fb1a0000 0000000000020000
>>>>>> [  800.488030] b200: ffff801663e25008 ffff8017c949b260 ffff0000080b4838 ffff8017c949b260
>>>>>> [  800.495853] b220: ffff0000080b4f2c 0000000020400145 0000000000000001 ffff8017c949b2a0
>>>>>> [  800.503676] b240: ffffffffffffffff ffff8017c949b260 ffff8017c949b260 ffff0000080b4f2c
>>>>>> [  800.511498] [] kvm_age_hva_handler+0xcc/0xd4
>>>>>> [  800.517324] [] handle_hva_to_gpa+0xec/0x15c
>>>>>> [  800.523063] [] kvm_age_hva+0x5c/0xcc
>>>>>> [  800.528194] [] kvm_mmu_notifier_clear_flush_young+0x54/0x90
>>>>>> [  800.535324] [] __mmu_notifier_clear_flush_young+0x6c/0xa8
>>>>>> [  800.542279] [] page_referenced_one+0x1e0/0x1fc
>>>>>> [  800.548279] [] rmap_walk_ksm+0x124/0x1a0
>>>>>> [  800.553759] [] rmap_walk+0x94/0x98
>>>>>> [  800.558717] [] page_referenced+0x120/0x180
>>>>>> [  800.564369] [] shrink_active_list+0x218/0x4a4
>>>>>> [  800.570281] [] shrink_node_memcg+0x58c/0x6fc
>>>>>> [  800.576107] [] shrink_node+0xe4/0x328
>>>>>> [  800.581325] [] do_try_to_free_pages+0xe4/0x3b8
>>>>>> [  800.587324] [] try_to_free_pages+0x124/0x234
>>>>>> [  800.593150] [] __alloc_pages_nodemask+0x564/0xf7c
>>>>>> [  800.599412] [] khugepaged_alloc_page+0x38/0xb8
>>>>>> [  800.605411] [] collapse_huge_page+0x74/0xd70
>>>>>> [  800.611238] [] khugepaged_scan_mm_slot+0x654/0xa98
>>>>>> [  800.617585] [] khugepaged+0x2bc/0x49c
>>>>>> [  800.622803] [] kthread+0x124/0x150
>>>>>> [  800.627762] [] ret_from_fork+0x10/0x1c
>>>>>> [  800.633066] ---[ end trace 944c130b5252fb01 ]---
>>>>>> -------------------------------------------------------------------------
>>>>>>
>>>>>> The root cause might be: we can't guarantee that the parameters start and end
>>>>>> in handle_hva_to_gpa are PAGE_SIZE aligned, let alone hva_start and hva_end.
>>>>> So why not align them in the first place?
>>>> At the first place of handle_hva_to_gpa()?
>>>> but a boundary check is needed in each loop of kvm_for_each_memslot.
>>>> Am I missing anything here?
>>>>>> This bug was introduced by commit 056aad67f836 ("kvm: arm/arm64: Rework gpa
>>>>>> callback handlers").
>>>>>>
>>>>>> It fixes the bug by using the converted pfn size.
>>>>>>
>>>>>> Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
>>>>>>
>>>>>> Signed-off-by: jia.he@hxt-semitech.com
>>>>>> Signed-off-by: li.zhang@hxt-semitech.com
>>>>>> ---
>>>>>>   virt/kvm/arm/mmu.c | 10 ++++++----
>>>>>>   1 file changed, 6 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>>>> index 7f6a944..9dd7ae4 100644
>>>>>> --- a/virt/kvm/arm/mmu.c
>>>>>> +++ b/virt/kvm/arm/mmu.c
>>>>>> @@ -1744,7 +1744,7 @@ static int handle_hva_to_gpa(struct kvm *kvm,
>>>>>>       /* we only care about the pages that the guest sees */
>>>>>>       kvm_for_each_memslot(memslot, slots) {
>>>>>>           unsigned long hva_start, hva_end;
>>>>>> -        gfn_t gpa;
>>>>>> +        gpa_t gpa, gpa_end;
>>>>>>           hva_start = max(start, memslot->userspace_addr);
>>>>>>           hva_end = min(end, memslot->userspace_addr +
>>>>>> @@ -1753,7 +1753,9 @@ static int handle_hva_to_gpa(struct kvm *kvm,
>>>>>>               continue;
>>>>>>           gpa = hva_to_gfn_memslot(hva_start, memslot) << PAGE_SHIFT;
>>>>>> -        ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data);
>>>>>> +        gpa_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot)
>>>>>> +                             << PAGE_SHIFT;
>>>>>> +        ret |= handler(kvm, gpa, (u64)(gpa_end - gpa), data);
>>>>> But we're looking for the mapping in the same memslot, so the distance
>>>>> between hva and hva_end is the same as the one between gpa and gpa_end
>>>>> if you didn't align it.
>>>> Maybe not; sometimes hva_end - hva != gpa_end - gpa:
>>>> start=fffdc37f0200, hva_start=fffdc37f0200, end=fffdc3800200, hva_end=fffdc3800000, gpa=3ff0000, gfn_end=4000000
>>>>
>>>> but sometimes it is:
>>>> start=ffff60590200, hva_start=ffff60590200, end=ffff605a0200, hva_end=ffff605a0200, gpa=1fb190000, gfn_end=1fb1b0000
>>>>
>>>> IMO, the misalignment is caused by the ksm stable page flag STABLE_FLAG. I will
>>>> propose another ksm patch to fix it.
>>>> But from handle_hva_to_gpa's point of view, arm kvm needs to avoid the follow-up
>>>> exception, just like what powerpc and x86 have done.
>>>
>>> As far as I can see, this is triggered by someone (in page_referenced_one via ksm?)
>>> triggering a clear_flush_young for a page with a non-aligned page address.
>>>
>>> If you look at the code path, __mmu_notifier_clear_flush_young is invoked
>>> via two code paths with the "given" address:
>>>
>>> ptep_clear_flush_young_notify(), in which case the end is set to start + PAGE_SIZE
>>> pmdp_clear_flush_young_notify(), in which case the end is set to start + PMD_SIZE
>>>
>>> We were supposed to only clear_flush_young for *the page* containing the
>>> address (start), but we do a clear_flush_young for the next page as well,
>>> which (I think) is not something intended. So to me, it looks like either
>>> page_referenced_one() or its caller must align the address to PAGE_SIZE
>>> or PMD_SIZE, depending on what it really wants to do, to avoid touching
>>> the adjacent entries (page or block pages).
>>>
>>> Suzuki
> Jia He,
>
>> Suzuki, thanks for the comments.
>>
>> I proposed another ksm patch: https://lkml.org/lkml/2018/5/3/1042
>> The root cause is that ksm adds some extra flags to indicate whether the page
>> is in the stable tree or not. This makes the address not aligned with PAGE_SIZE.
> Thanks for the pointer. In the future, please Cc the people relevant to the
> discussion in the patches.
>
>> From arm kvm mmu's point of view, do you think handle_hva_to_gpa still needs
>> to handle the unaligned case?
> I don't think we should do that. Had we done this, we would never have caught this
> bug in KSM. Eventually, if some other new implementation comes up with a new
> notifier consumer which doesn't check alignment and doesn't WARN, it could simply
> do the wrong thing. So I believe what we have is a good measure to make sure that
> things are in the right order.
>
>> IMO, the PAGE_SIZE alignment is still needed because we should not let the
>> bottom function kvm_age_hva_handler handle the exception. Please refer to the
>> implementations in the x86 and powerpc kvm_handle_hva_range(). They both align
>> the hva with hva_to_gfn_memslot().
>>
> From an API perspective, you are passed a "start" and "end" address. So you
> could potentially do the wrong thing if you align the "start" and "end". Maybe
> those handlers should also do the same thing as we do.

But handle_hva_to_gpa has possibly already partially adjusted the alignment:

   1750         kvm_for_each_memslot(memslot, slots) {
   1751                 unsigned long hva_start, hva_end;
   1752                 gfn_t gpa;
   1753
   1754                 hva_start = max(start, memslot->userspace_addr);
   1755                 hva_end = min(end, memslot->userspace_addr +
   1756                             (memslot->npages << PAGE_SHIFT));

At line 1755, assume that end = 0x12340200 and memslot->userspace_addr +
(memslot->npages << PAGE_SHIFT) = 0x12340000.
Then hva_start is not PAGE_SIZE aligned while hva_end is aligned, and the size
will be PAGE_SIZE - 0x200, just as in my inject-fault debugging patch.

-- 
Cheers,
Jia