Subject: Re: [PATCH] KVM: arm/arm64: fix unaligned hva start and end in handle_hva_to_gpa
To: Suzuki K Poulose, Marc Zyngier, Christoffer Dall,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu
Cc: linux-kernel@vger.kernel.org, Jia He, li.zhang@hxt-semitech.com,
 hughd@google.com, Andrea Arcangeli, Minchan Kim, Claudio Imbrenda,
 Arvind Yadav, Mike Rapoport, akpm@linux-foundation.org
From: Jia He
Message-ID: <8c409754-9067-bc98-d695-9f116f2916c9@gmail.com>
Date: Wed, 16 May 2018 10:47:58 +0800
In-Reply-To: <09c30e43-7107-395c-bde3-0310bbdbca91@arm.com>
On 5/16/2018 12:21 AM, Suzuki K Poulose wrote:
> Hi Jia,
>
> On 15/05/18 14:15, Jia He wrote:
>> On 5/15/2018 8:38 PM, Jia He wrote:
>>> Hi Suzuki
>>>
>>> On 5/15/2018 4:36 PM, Suzuki K Poulose wrote:
>>>> Hi Jia
>>>>
>>>> On 05/15/2018 03:03 AM, Jia He wrote:
>>>>> Hi Suzuki
>>>>>
>>>>> I will merge the other thread into this one, and add the necessary CC list.
>>>>>
>>>>> That WARN_ON call trace is very easy to reproduce on my armv8a server after
>>>>> I start 20 guests and run memhog in the host. Of course, ksm should be
>>>>> enabled.
>>>>>
>>>>> For your question about my inject-fault debug patch:
>>>>
>>>> Thanks for the patch, comments below.
>>>>
>>>> ...
>>>>
>>>>> index 7f6a944..ab8545e 100644
>>>>> --- a/virt/kvm/arm/mmu.c
>>>>> +++ b/virt/kvm/arm/mmu.c
>>>>> @@ -290,12 +290,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
>>>>>   * destroying the VM), otherwise another faulting VCPU may come in and mess
>>>>>   * with things behind our backs.
>>>>>   */
>>>>> +extern int trigger_by_ksm;
>>>>>  static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>>>>>  {
>>>>>         pgd_t *pgd;
>>>>>         phys_addr_t addr = start, end = start + size;
>>>>>         phys_addr_t next;
>>>>>
>>>>> +       if (trigger_by_ksm) {
>>>>> +               end -= 0x200;
>>>>> +       }
>>>>> +
>>>>>         assert_spin_locked(&kvm->mmu_lock);
>>>>>         pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>>>>         do {
>>>>>
>>>>> I need to point out that I never reproduced it without this debugging patch.
>>>>
>>>> That could trigger the panic iff your "size" <= 0x200, leading to the
>>>> condition (end < start), which can make the loop go on forever, as we do
>>>> while (addr < end) and end up accessing something which may not be a PGD
>>>> entry, and thus get a bad page with bad numbers all around. This case
>>>> could be hit only with your change plus the bug in KSM which gives us an
>>>> address near the page boundary.
>>>
>>> No, I injected the fault on purpose to simulate the case when size is less
>>> than PAGE_SIZE (e.g. PAGE_SIZE - 0x200 = 65024).
>>> I once got the panic info [1] *without* the debugging patch.
>>>
>>> [1] https://lkml.org/lkml/2018/5/9/992
>>>>
>>>> So, I think we can safely ignore the PANIC().
>>>> More below.
>
> I am a bit confused now. Do you mean the panic was triggered *without* the
> debugging patch? If that is the case, it is really worrying.
>

Hi Suzuki, sorry for my unclear description before.

Without the ksm fix and without the inject-fault debugging patch, I hit the
WARN_ON storm (easily reproduced) and the panic in unmap_stage2_range (only
once). With the inject-fault debugging patch, the panic could be easily
reproduced, hence I thought the panic had the same root cause. With the ksm
fix, both the WARN_ON and the panic disappeared.

What I mean is:
1. if PAGE_SIZE alignment *should* be guaranteed in handle_hva_to_gpa, it
   would be better to add a WARN_ON or WARN_ON_ONCE in unmap_stage2_range()
2. if PAGE_SIZE alignment is *not* guaranteed, handle_hva_to_gpa needs to
   handle the unaligned cases; I will propose changing the code in
   handle_hva_to_gpa, e.g. to use (gpa_end - gpa) instead of
   (hva_end - hva_start)

Cheers,
Jia

>>>>
>>>>>>> Suzuki, thanks for the comments.
>>>>>>>
>>>>>>> I proposed another ksm patch: https://lkml.org/lkml/2018/5/3/1042
>>>>>>> The root cause is that ksm adds some extra flags to the address to
>>>>>>> indicate that the page is in/not_in the stable tree.
>>>>>>> This makes the address not aligned with PAGE_SIZE.
>>>>>>
>>>>>> Thanks for the pointer. In the future, please Cc the people relevant to
>>>>>> the discussion in the patches.
>>>>>>
>>>>>>> From the arm kvm mmu point of view, do you think handle_hva_to_gpa
>>>>>>> still needs to handle the unaligned case?
>>>>>>
>>>>>> I don't think we should do that. Had we done this, we would never have
>>>>>> caught this bug in KSM. Eventually, if some other new implementation
>>>>>> comes up with a new notifier consumer which doesn't check alignment and
>>>>>> doesn't WARN, it could simply do the wrong thing. So I believe what we
>>>>>> have is a good measure to make sure that things are in the right order.
>>>>>>
>>>>>>> IMO, the PAGE_SIZE alignment is still needed because we should not let
>>>>>>> the bottom function kvm_age_hva_handler handle the exception. Please
>>>>>>> refer to the implementations of kvm_handle_hva_range() in x86 and
>>>>>>> powerpc. They both align the hva with hva_to_gfn_memslot.
>>>>>>
>>>>>> From an API perspective, you are passed a "start" and an "end" address.
>>>>>> So you could potentially do the wrong thing if you align the "start"
>>>>>> and "end". Maybe those handlers should also do the same thing as we do.
>>>>
>>>>> But handle_hva_to_gpa has possibly already partially adjusted the
>>>>> alignment:
>>>>>
>>>>>      1750         kvm_for_each_memslot(memslot, slots) {
>>>>>      1751                 unsigned long hva_start, hva_end;
>>>>>      1752                 gfn_t gpa;
>>>>>      1753
>>>>>      1754                 hva_start = max(start, memslot->userspace_addr);
>>>>>      1755                 hva_end = min(end, memslot->userspace_addr +
>>>>>      1756                             (memslot->npages << PAGE_SHIFT));
>>>>>
>>>>> At line 1755, let us assume that end = 0x12340200 and
>>>>> memslot->userspace_addr + (memslot->npages << PAGE_SHIFT) = 0x12340000.
>>>>> Then hva_start is not PAGE_SIZE aligned while hva_end is aligned, and the
>>>>> size will be PAGE_SIZE - 0x200, just as in my inject-fault debugging patch.
>>>>
>>>> That's because we want to limit the handling of the hva/gpa range by
>>>> memslot. So we make sure we pass on the range within the given memslot to
>>>> hva_to_gfn_memslot(). But we do iterate over the next memslot if the
>>>> original range falls into the next slot. So, in practice, there is no
>>>> alignment/trimming of the range. It's just that we pass on the appropriate
>>>> range for each slot.
>>>>
>>> Yes, I understand what the code does in hva_to_gfn_memslot(). What I mean
>>> is that hva_end may be changed, so (hva_end - hva_start) will not be the
>>> same as the parameter _size_:
>>>
>>>> ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data);
>>>
>>> Anyway, I have to admit that all the exceptions are originally caused by
>>> the STABLE_FLAG in the ksm code. What I want to discuss here is how to
>>> make arm kvm handle the exception more gracefully.
>
> That's my point. We need the fix in ksm. Once we have the fix in place, do
> we hit the WARN_ON() any more?
>
>>>
>> Hi Suzuki
>> How about this patch (a balance between avoiding the WARN_ON storm and
>> keeping debugging convenience):
>
> The problem with WARN_ON_ONCE() is that it could potentially suppress the
> different call paths that can lead to the triggers. e.g., unmap_stage2_range()
> could be called from different paths, and having a WARN_ON_ONCE() could
> potentially hide the other call paths.
>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 7f6a944..4033946 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -297,6 +297,8 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>>       phys_addr_t next;
>>
>>       assert_spin_locked(&kvm->mmu_lock);
>> +
>> +    WARN_ON_ONCE(size & ~PAGE_MASK);
>>       pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>       do {
>>           /*
>> @@ -1792,7 +1794,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data
>>   {
>>       pte_t *pte = (pte_t *)data;
>>
>> -    WARN_ON(size != PAGE_SIZE);
>> +    WARN_ON_ONCE(size != PAGE_SIZE);
>>       /*
>>        * We can always call stage2_set_pte with KVM_S2PTE_FLAG_LOGGING_ACTIVE
>>        * flag clear because MMU notifiers will have unmapped a huge PMD before
>> @@ -1823,7 +1825,7 @@ static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
>>       pmd_t *pmd;
>>       pte_t *pte;
>>
>> -    WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
>> +    WARN_ON_ONCE(size != PAGE_SIZE && size != PMD_SIZE);
>>       pmd = stage2_get_pmd(kvm, NULL, gpa);
>>       if (!pmd || pmd_none(*pmd))    /* Nothing there */
>>           return 0;
>> @@ -1843,7 +1845,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *
>>       pmd_t *pmd;
>>       pte_t *pte;
>>
>> -    WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
>> +    WARN_ON_ONCE(size != PAGE_SIZE && size != PMD_SIZE);
>>       pmd = stage2_get_pmd(kvm, NULL, gpa);
>>       if (!pmd || pmd_none(*pmd))    /* Nothing there */
>>           return 0;
>>
>
> Cheers
> Suzuki
>