Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp547989imm; Tue, 15 May 2018 05:45:42 -0700 (PDT) X-Google-Smtp-Source: AB8JxZo5H5RtRNWyrgBhQKjAZ/QDa3RW1ZlPhW8+Iuj6Z+iQOOKYnzhg5+UML1pqTBus7+709VpJ X-Received: by 2002:a63:894a:: with SMTP id v71-v6mr12365197pgd.423.1526388342537; Tue, 15 May 2018 05:45:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526388342; cv=none; d=google.com; s=arc-20160816; b=eh6XE059ftMrqIKodW/DeB1xpcZwYvAomHfPxqSzRsEP7LbKYJ8VHVLDt2elAIyhci b96/HmySX87Np6dGXvoJ7FOnf0lFybcLDnYOtKMT0Hsb3JHhhqQjyW4f9fJ6WxIAFrAt UKj4+pHSr6ZuGHvJ7z99ESy4vjgvJSSOXRCGD1uXIt2cdUx+iKwJuQ3Tu2s1FSo9azf0 IpckvJFi2DkW0QpYUYWlq0SNhDCi+P51g2ydhHBKgA0eajMoCQyxs6Hbxp9JSGxS76Vr qxRofhtbStUIhGwkLUp5ET/2coAAQO7qGHIrDtyX3ZkriVBEoxKpr2C188g9Sb4q4Z3q KUCA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:references:cc:to :subject:dkim-signature:arc-authentication-results; bh=T5ZZvDj1xf4czIpQ7QFEcIAXhCwODDzbjNRj1nq/aPg=; b=Tj/oq/Y7NeHC0WFnHTZNVUkTTbQr/GEJq6YDsY6LlxvEzReZOho85WFu8PSEVDZbYJ g8uUUbMMu4L81MDUlx6pUcJqkGB0PMAhfawYJno4R2NuDAPjLchs74zWxgVtV0oWp3mg Y2f2fB9uSCRzHHMmYPR8a2XyZNm/yoncJ5q7FBM9/L44OnKnPdGzoS3KQ8Y3zLH+M6HH 9ONptAjWM5+BsdI2xfX9/8+KHDIIkKqjnlySq/lxDsQNDp0s8YsaxPvyhGhTrAOOccij bcM+jIyPZbeYxeoYNKeqRB17qPxHVyEYE4D96jNi2F8W2IL0IZw+LXMZPQ3T6QxzjPMG ewfA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=JMaTIrzd; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a13-v6si11904686pfd.338.2018.05.15.05.45.27; Tue, 15 May 2018 05:45:42 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=JMaTIrzd; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752854AbeEOMis (ORCPT + 99 others); Tue, 15 May 2018 08:38:48 -0400 Received: from mail-pf0-f180.google.com ([209.85.192.180]:46733 "EHLO mail-pf0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752227AbeEOMiq (ORCPT ); Tue, 15 May 2018 08:38:46 -0400 Received: by mail-pf0-f180.google.com with SMTP id p12-v6so7669614pff.13 for ; Tue, 15 May 2018 05:38:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=T5ZZvDj1xf4czIpQ7QFEcIAXhCwODDzbjNRj1nq/aPg=; b=JMaTIrzdoMkHX4klNUgWL+fXttmC5tDL/weL9HGBxF+nEadsNjMF/fcSQrL/hXs97v QsCWwDU2WaLjoIkEqC81MlPrhZabvPGK20GjSc1CeJzN9bMjR9lozetXYq27A36akAEe WEeO8xnHkI+pUsKjkap82+m/2bUYVDK7o9vrY/x/SNVEs4lrn4vCCfa4ORiHT2cQQ5IG PL+DIwgZKpzJseESSpdGOscmflu1K4V5LW8s1BTn/ZgzkaDjmOXWraLI7u0tnRLTHpMA p7wj2nNQp4HoQHFnolFkyuftuh6qYuvrnJvk37/Rg+WBcRK4wYIWoEdD4K8cGMtBqfjI W9Eg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=T5ZZvDj1xf4czIpQ7QFEcIAXhCwODDzbjNRj1nq/aPg=; b=ttjh6oC0FHiEVrrBm/OwUOZYltyLmauuHIg/IustsH5TpHQRa94LMOboJimWOKCOUL jBVlZPgxwCjsD7HX/6cv9guMGrcM7Jsop05jJ003VCykos3h6vuqr/YRf6GJxNNbSwaW hLn7Ah0GN2jMcYLiRtzFyFqyigZ1MueN4nAng5t/Z3UoMXzEd60c9ueN8zLzWyw4YEKR 1/toK1HpdgCNyFFUMFbTjqe20n3/cns9UEn7BFticKr10O9BBSxFEOqwvHRdQimFzWOj QLiu1YJAhHvMn9FPjYta2wcDCa7QpDEHCpyTBMtKnIDn/iiPs2wT4XcQoypdnccBHTzl hKxg== X-Gm-Message-State: ALKqPweYMeN32Xxcggh/yKE3RrrTaycEGXAaWx54JilN0uoXGGO0pCxk R8guUSOSHXE5MLUeetaVbbQ= X-Received: by 2002:a63:9612:: with SMTP id c18-v6mr12063072pge.361.1526387926340; Tue, 15 May 2018 05:38:46 -0700 (PDT) Received: from [0.0.0.0] (67.216.217.169.16clouds.com. [67.216.217.169]) by smtp.gmail.com with ESMTPSA id k193-v6sm2037739pgc.39.2018.05.15.05.38.37 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 15 May 2018 05:38:45 -0700 (PDT) Subject: Re: [PATCH] KVM: arm/arm64: fix unaligned hva start and end in handle_hva_to_gpa To: Suzuki K Poulose , Marc Zyngier , Christoffer Dall , linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org, Jia He , li.zhang@hxt-semitech.com, hughd@google.com, Andrea Arcangeli ";" Minchan Kim ";" Claudio Imbrenda ";" Arvind Yadav ";" Mike Rapoport , akpm@linux-foundation.org References: <1525244911-5519-1-git-send-email-hejianet@gmail.com> <04e6109f-cbbd-24b0-03bb-9247b930d42c@arm.com> <85e04362-05dd-c697-e9c4-ad5824e63819@gmail.com> <0f134188-12d5-7184-3fbd-0ec8204cf649@arm.com> <695beacb-ff51-bb2c-72ef-d268f7d4e59d@arm.com> <1e065e9b-4dad-611d-fc5b-26fe6c031507@gmail.com> <14954b3a-6a2a-fe05-47b7-1890375ab8a4@arm.com> From: Jia He Message-ID: Date: Tue, 15 May 2018 20:38:32 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 In-Reply-To: <14954b3a-6a2a-fe05-47b7-1890375ab8a4@arm.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Suzuki On 5/15/2018 4:36 PM, Suzuki K Poulose Wrote: > > Hi Jia > > On 05/15/2018 03:03 AM, Jia He wrote: >> Hi Suzuki >> >> I will merge the other thread into this, and add the necessary CC list >> >> That WARN_ON call trace is very easy to reproduce in my armv8a server after I >> start 20 guests >> >> and run memhog in the host. Of course, ksm should be enabled >> >> For you question about my inject fault debug patch: > > > Thanks for the patch, comments below. > >> > > ... > >> index 7f6a944..ab8545e 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c >> @@ -290,12 +290,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd, >>    * destroying the VM), otherwise another faulting VCPU may come in and mess >>    * with things behind our backs. >>    */ >> +extern int trigger_by_ksm; >>   static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) >>   { >>          pgd_t *pgd; >>          phys_addr_t addr = start, end = start + size; >>          phys_addr_t next; >> >> +       if(trigger_by_ksm) { >> +               end -= 0x200; >> +       } >> + >>          assert_spin_locked(&kvm->mmu_lock); >>          pgd = kvm->arch.pgd + stage2_pgd_index(addr); >>          do { >> >> I need to point out that I never reproduced it without this debugging patch. > > That could trigger the panic iff your "size" <= 0x200, leading to the > condition (end < start), which can make the loop go forever, as we > do while(addr < end) and end up accessing something which may not be PGD entry > and thus get a bad page with bad numbers all around. This case could be hit only > with your change and the bug in the KSM which gives us an address near the page > boundary. No, I injected the fault on purpose to simulate the case when size is less than PAGE_SIZE(eg. PAGE_SIZE-0x200=65024) I ever got the panic info [1] *without* the debugging patch only once [1] https://lkml.org/lkml/2018/5/9/992 > > So, I think we can safely ignore the PANIC(). > More below. > > >>>> Suzuki, thanks for the comments. >>>> >>>> I proposed another ksm patch https://lkml.org/lkml/2018/5/3/1042 >>>> The root cause is ksm will add some extra flags to indicate that the page >>>> is in/not_in the stable tree. This makes address not be aligned with PAGE_SIZE. >>> Thanks for the pointer. In the future, please Cc the people relevant to the >>> discussion in the patches. >>> >>>>   From arm kvm mmu point of view, do you think handle_hva_to_gpa still need >>>> to handle >>>> the unalignment case? >>> I don't think we should do that. Had we done this, we would never have caught >>> this bug >>> in KSM. Eventually if some other new implementation comes up with the a new >>> notifier >>> consumer which doesn't check alignment and doesn't WARN, it could simply do >>> the wrong >>> thing. So I believe what we have is a good measure to make sure that things are >>> in the right order. >>> >>>> IMO, the PAGE_SIZE alignment is still needed because we should not let the >>>> bottom function >>>> kvm_age_hva_handler to handle the exception. Please refer to the >>>> implementation in X86 and >>>> powerpc kvm_handle_hva_range(). They both aligned the hva with >>>> hva_to_gfn_memslot. >>>> >>>   From an API perspective, you are passed on a "start" and "end" address. So, >>> you could potentially >>> do the wrong thing if you align the "start" and "end". May be those handlers >>> should also do the >>> same thing as we do. > >> But handle_hva_to_gpa has partially adjusted the alignment possibly: >>     1750         kvm_for_each_memslot(memslot, slots) { >>     1751                 unsigned long hva_start, hva_end; >>     1752                 gfn_t gpa; >>     1753 >>     1754                 hva_start = max(start, memslot->userspace_addr); >>     1755                 hva_end = min(end, memslot->userspace_addr + >>     1756                             (memslot->npages << PAGE_SHIFT)); >> >> at line 1755, let us assume that end=0x12340200 and >> memslot->userspace_addr + (memslot->npages << PAGE_SHIFT)=0x12340000 >> Then, hva_start is not page_size aligned and hva_end is aligned, and the size >> will be PAGE_SIZE-0x200, >> just as what I had done in the inject fault debugging patch. > > Thats because we want to limit the handling of the hva/gpa range by memslot. So, > we make sure we pass on the range within the given memslot > to hva_to_gfn_memslot(). But we do iterate over the next memslot if the > original range falls in to the next slot. So, in practice, there is no > alignment/trimming of the range. Its just that we pass on the appropriate range > for each slot. > Yes, I understand what the codes did in hva_to_gfn_memslot(). What I mean is hva_end may be changed and (hva_end - hva_start) will not be same as the parameter _size_ ? >ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data); Anyway, I have to admit that all the exceptions are originally caused by the STABLE_FLAG in ksm code. What I want to discuss here is how to make arm kvm handle the exception more gracefully. -- Cheers, Jia