Subject: Re: [PATCH] KVM: arm/arm64: fix unaligned hva start and end in handle_hva_to_gpa
To: Jia He, Marc Zyngier, Christoffer Dall,
        linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu
Cc: linux-kernel@vger.kernel.org, Jia He, li.zhang@hxt-semitech.com,
        hughd@google.com, Andrea Arcangeli, Minchan Kim, Claudio Imbrenda,
        Arvind Yadav, Mike Rapoport, akpm@linux-foundation.org
References: <1525244911-5519-1-git-send-email-hejianet@gmail.com>
        <04e6109f-cbbd-24b0-03bb-9247b930d42c@arm.com>
        <85e04362-05dd-c697-e9c4-ad5824e63819@gmail.com>
        <0f134188-12d5-7184-3fbd-0ec8204cf649@arm.com>
        <695beacb-ff51-bb2c-72ef-d268f7d4e59d@arm.com>
        <1e065e9b-4dad-611d-fc5b-26fe6c031507@gmail.com>
From: Suzuki K Poulose
Message-ID: <14954b3a-6a2a-fe05-47b7-1890375ab8a4@arm.com>
Date: Tue, 15 May 2018 09:36:32 +0100
In-Reply-To: <1e065e9b-4dad-611d-fc5b-26fe6c031507@gmail.com>

Hi Jia,

On 05/15/2018 03:03 AM, Jia He wrote:
> Hi Suzuki
>
> I will merge the other thread into this one and add the necessary Cc
> list.
>
> That WARN_ON call trace is very easy to reproduce on my armv8a server
> after I start 20 guests and run memhog in the host. Of course, KSM has
> to be enabled.

Thanks for the patch, comments below.

> For your question about my inject-fault debug patch:
>
> ...
> index 7f6a944..ab8545e 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -290,12 +290,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
>    * destroying the VM), otherwise another faulting VCPU may come in
>    * and mess with things behind our backs.
>    */
> +extern int trigger_by_ksm;
>   static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>   {
>          pgd_t *pgd;
>          phys_addr_t addr = start, end = start + size;
>          phys_addr_t next;
>
> +       if (trigger_by_ksm) {
> +               end -= 0x200;
> +       }
> +
>          assert_spin_locked(&kvm->mmu_lock);
>          pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>          do {
>
> I need to point out that I never reproduced it without this debugging
> patch.

That could trigger the panic iff your "size" <= 0x200, leading to the
condition (end < start), which can make the loop run forever, since we
do while (addr < end), and we end up accessing something which may not
be a PGD entry, thus getting a bad page with bad numbers all around.
This case could only be hit with your change combined with the bug in
KSM that gives us an address near the page boundary.

So, I think we can safely ignore the PANIC(). More below.
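Just to make that failure mode concrete, here is a small, standalone
user-space sketch (my own simplification with made-up constants and a
stand-in PAGE_SIZE, not the actual mmu.c code) of how a do/while walk
behaves once "end" underflows down to "start": the body runs at least
once even for an empty range, and an "addr != end" exit test is then
never true again until addr wraps around.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000UL	/* stand-in constant, not the kernel macro */

static void walk_ptes(uint64_t addr, uint64_t end)
{
	int steps = 0;

	/* Mirrors the do/while shape of the stage-2 walk loops. */
	do {
		printf("touching pte for %#llx\n", (unsigned long long)addr);
		if (++steps >= 4) {	/* cap for the demo; the kernel has no cap */
			printf("... and it would keep going\n");
			return;
		}
	} while (addr += PAGE_SIZE, addr != end);
}

int main(void)
{
	uint64_t start = 0x12340000UL;
	uint64_t size  = 0x200UL;		/* the injected case: size <= 0x200 */
	uint64_t end   = start + size - 0x200;	/* end == start */

	walk_ptes(start, end);			/* body runs once too often, then runs away */
	return 0;
}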
>>> Suzuki, thanks for the comments.
>>>
>>> I proposed another KSM patch: https://lkml.org/lkml/2018/5/3/1042
>>> The root cause is that KSM adds some extra flags to indicate whether
>>> the page is in the stable tree. This makes the address not aligned
>>> to PAGE_SIZE.
>>
>> Thanks for the pointer. In the future, please Cc the people relevant
>> to the discussion in the patches.
>>
>>> From the arm KVM MMU point of view, do you think handle_hva_to_gpa
>>> still needs to handle the unaligned case?
>>
>> I don't think we should do that. Had we done this, we would never
>> have caught this bug in KSM. Eventually, if some new implementation
>> comes up with a new notifier consumer which doesn't check alignment
>> and doesn't WARN, it could simply do the wrong thing. So I believe
>> what we have is a good measure to make sure that things are in the
>> right order.
>>
>>> IMO, the PAGE_SIZE alignment is still needed, because we should not
>>> let the bottom function kvm_age_hva_handler handle the exception.
>>> Please refer to the implementations of kvm_handle_hva_range() in x86
>>> and powerpc. They both align the hva via hva_to_gfn_memslot().
>>
>> From an API perspective, you are passed a "start" and "end" address.
>> So, you could potentially do the wrong thing if you align the "start"
>> and "end". Maybe those handlers should also do the same thing as we
>> do.
>
> But handle_hva_to_gpa has possibly already adjusted the alignment,
> at least partially:
>
>    1750         kvm_for_each_memslot(memslot, slots) {
>    1751                 unsigned long hva_start, hva_end;
>    1752                 gfn_t gpa;
>    1753
>    1754                 hva_start = max(start, memslot->userspace_addr);
>    1755                 hva_end = min(end, memslot->userspace_addr +
>    1756                             (memslot->npages << PAGE_SHIFT));
>
> At line 1755, let us assume that end = 0x12340200 and
> memslot->userspace_addr + (memslot->npages << PAGE_SHIFT) = 0x12340000.
> Then hva_start is not PAGE_SIZE aligned while hva_end is aligned, and
> the size will be PAGE_SIZE - 0x200, just as in my inject-fault
> debugging patch.

That's because we want to limit the handling of the hva/gpa range by
memslot. So, we make sure we pass on the range within the given memslot
to hva_to_gfn_memslot(). But we do iterate over the next memslot if the
original range falls into the next slot. So, in practice, there is no
alignment/trimming of the range. It's just that we pass on the
appropriate range for each slot.

Cheers
Suzuki
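P.S. In case a concrete picture helps, below is a small user-space
sketch (my own simplification, with a stand-in memslot struct and
made-up addresses, not the kernel code) of the per-slot clamping quoted
above. Note how the unaligned end of 0x12340200 is not trimmed away:
the first slot handles its part up to its own boundary, and the
remaining 0x200 bytes are passed on to the next slot.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12

struct memslot {			/* stand-in with just the fields that matter */
	uint64_t userspace_addr;
	uint64_t npages;
};

/* Clamp [start, end) to one slot, as handle_hva_to_gpa's loop does. */
static void handle_one_slot(const struct memslot *slot,
			    uint64_t start, uint64_t end)
{
	uint64_t slot_end  = slot->userspace_addr + (slot->npages << PAGE_SHIFT);
	uint64_t hva_start = start > slot->userspace_addr ? start : slot->userspace_addr;
	uint64_t hva_end   = end < slot_end ? end : slot_end;

	if (hva_start >= hva_end)
		return;			/* no overlap with this slot */

	printf("slot at %#llx: handle [%#llx, %#llx)\n",
	       (unsigned long long)slot->userspace_addr,
	       (unsigned long long)hva_start,
	       (unsigned long long)hva_end);
}

int main(void)
{
	/* Two adjacent slots; the unaligned end spills into the second. */
	struct memslot slots[] = {
		{ 0x12300000UL, (0x12340000UL - 0x12300000UL) >> PAGE_SHIFT },
		{ 0x12340000UL, 16 },
	};
	int i;

	for (i = 0; i < 2; i++)
		handle_one_slot(&slots[i], 0x12300000UL, 0x12340200UL);

	return 0;
}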