Received: by 2002:ab2:4a89:0:b0:1f4:a8b6:6e69 with SMTP id w9csp3201lqj; Wed, 10 Apr 2024 01:46:42 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCVa2qbyZzCRbJxRhpQOsbWgeiXNL+kveF80wvy9r/dGvTAfN4ApumOVkmLSaMzdBc0DaAjt/ZoqkfQVWHAJifblW9jrvKFOu99buCs3JQ== X-Google-Smtp-Source: AGHT+IGPRuSkg8zc4ghaI9OLYH3nFKsh9YlOOQwLhhA25Bqz7u9erZOoOdLZSj1PI40Ee9q4QH1c X-Received: by 2002:ac8:7f95:0:b0:434:dce5:c0c3 with SMTP id z21-20020ac87f95000000b00434dce5c0c3mr2263243qtj.2.1712738802469; Wed, 10 Apr 2024 01:46:42 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1712738802; cv=pass; d=google.com; s=arc-20160816; b=wqu0rG+MWtZJ8u19WoX2qK5cBgpkxSXZX0K3Fav1sh8n4v7fP6NZNBP1oZ5gmHXr4C 5WUnUuVKNjqZxVq+EPPo2eoNsWd8v0WJZpmlCMmpHI93+mkr91hcXMQ8295AM8DrZl5g 1AkeAwhaU5jwhJN3XJ6umubfX8Wd+m0ytKhmLqdfhRD1s6WkwwUWHMoAWaEUfdQCidld IduZlHLfgMw7HDW0iEIu/lGd1+BITfaDaLx1wLEW4AP6fE1jE1wTwhKqWjCCqjNwccZm a+oO+d5Lg8D7qHTz+NyN64UM6kGVZ4vFbtZBPjZhl6sileb9uHQf61NQF/MfWUI1zVq7 ZwwQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=mime-version:list-unsubscribe:list-subscribe:list-id:precedence :user-agent:references:in-reply-to:subject:cc:to:from:message-id :date:dkim-signature; bh=Wh496kURRDNyw2T5yTziAC7jrFqv+UCK0BNVpn87RyI=; fh=PaShlIYoU5rtOr6vlBEnC/2u2YQZPPnKAwcmE7paN3Y=; b=lJ08cGIUTf+hFhlNx4vmiRqMv/2MtPzCX1/1NLR0tQC8WFLtRsXr2b0iG/hLIKaqHW PvacbYwnzt1B713i1ikwzYytc5giJKVysY/IE/cdXi1K/QRdJSF3aN3ZZd2wd5sK8bRN JyK78SM+nKUA7mFmFUkA05HRN8QASGYoi7eiPRazUys7u2T7jsH4ejrV5uFtNyFhHdw/ hYidlEmBqyXVgcQM4fyaH4wZRPkMEsoyIfUper1z6uM44Sl7/4qBzlqKsfRpYhpPKOU2 D3hnXM7QDY4Zo1lOWHapJLb3gfVRQ43B7sbiNottTh0mOHtVJ99mBQ3RfzObNmXPFMln ewlA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=nZnkn8Hr; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-138196-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-138196-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id b17-20020a05622a021100b00432e741f154si11457507qtx.623.2024.04.10.01.46.41 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 Apr 2024 01:46:42 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-138196-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=nZnkn8Hr; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-138196-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-138196-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id C66831C2240F for ; Wed, 10 Apr 2024 08:46:41 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 43A881552E8; Wed, 10 Apr 2024 08:45:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="nZnkn8Hr" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 49B98154C0F for ; Wed, 10 Apr 2024 08:45:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712738740; cv=none; b=LpmmQuQ8jT7ezBTBpTlXNtpNkwiwjyB7zDLrmBZ6gIrgqdjCMv5uDr1bCDZju088Tbt26jgTetrVRB5jTkHPk3vCVrIzEQl+zLoYUFP/q9IRxhEt+tDxxo1AWNp81XtACJNgjrwXxm5CR5g7uHTtpbm14/moTiClV1ns3uTunCg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712738740; c=relaxed/simple; bh=rjpJ/B0jBLfP6NqGowlaRPiXpWGZqILHxap2uCmUq7E=; h=Date:Message-ID:From:To:Cc:Subject:In-Reply-To:References: MIME-Version:Content-Type; b=Gv6lq//iiArTBPjbROeNbqQQslg/u81sbj0WoR4I4tXxotAzpGXTUnBDAcq3m01B2Em25GA/2kFv+RQWHz6z0vyNH/mUS83ltjakKlvfeIqRdMqOcfNZvdYQ4Vz+jw+u2VVR3bpj5a7u50uEYXPCPwippxfzfeMSZ4BYLESeQA8= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=nZnkn8Hr; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id D033EC433C7; Wed, 10 Apr 2024 08:45:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1712738739; bh=rjpJ/B0jBLfP6NqGowlaRPiXpWGZqILHxap2uCmUq7E=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=nZnkn8HrApjPn+iBhw/Z53N0bB5DkavtqggzdJ1FKc6z1KIgt0UcANOWxTPYTfGOd b9mtO7drDcTBlWti/u/3DvjOkvxj5wYBbafkNTnrgekkExcrFU7dx7o2mATD9GZnNS iLWh1Th97DjMhQdxJ6h1Ve+tuxUIRjwM+UiRFcSbriQN4RK2cutC6Gfa772FQQdzCi Rp8IRo65oXNhaMrn0C12M4J//fX5ggxoWjij075NKo1+RkQ8si6QmEBR753jNfEPR6 PCJxjwLBozu7H4wLtaxJ1xEY2Edtl00qQ+bsW8x0oEwpRWeYDW2cpbQUA9kB1LWABC y/B88RqarDZzQ== Received: from sofa.misterjones.org ([185.219.108.64] helo=goblin-girl.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1ruTaL-0034tA-JX; Wed, 10 Apr 2024 09:45:37 +0100 Date: Wed, 10 Apr 2024 09:45:37 +0100 Message-ID: <86wmp5sj0u.wl-maz@kernel.org> From: Marc Zyngier To: Ryan Roberts Cc: Gavin Shan , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, catalin.marinas@arm.com, will@kernel.org, akpm@linux-foundation.org, oliver.upton@linux.dev, apopple@nvidia.com, rananta@google.com, mark.rutland@arm.com, v-songbaohua@oppo.com, yangyicong@hisilicon.com, shahuang@redhat.com, yihyu@redhat.com, shan.gavin@gmail.com Subject: Re: [PATCH v3 1/3] arm64: tlb: Fix TLBI RANGE operand In-Reply-To: <7a929104-5f09-4ff6-8792-4a9e93bc0894@arm.com> References: <20240405035852.1532010-1-gshan@redhat.com> <20240405035852.1532010-2-gshan@redhat.com> <7a929104-5f09-4ff6-8792-4a9e93bc0894@arm.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/29.2 (aarch64-unknown-linux-gnu) MULE/6.0 (HANACHIRUSATO) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.219.108.64 X-SA-Exim-Rcpt-To: ryan.roberts@arm.com, gshan@redhat.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, catalin.marinas@arm.com, will@kernel.org, akpm@linux-foundation.org, oliver.upton@linux.dev, apopple@nvidia.com, rananta@google.com, mark.rutland@arm.com, v-songbaohua@oppo.com, yangyicong@hisilicon.com, shahuang@redhat.com, yihyu@redhat.com, shan.gavin@gmail.com X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false On Mon, 08 Apr 2024 09:29:31 +0100, Ryan Roberts wrote: > > On 05/04/2024 04:58, Gavin Shan wrote: > > KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty > > pages are collected by VMM and the page table entries become write > > protected during live migration. Unfortunately, the operand passed > > to the TLBI RANGE instruction isn't correctly sorted out due to the > > commit 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()"). > > It leads to crash on the destination VM after live migration because > > TLBs aren't flushed completely and some of the dirty pages are missed. > > > > For example, I have a VM where 8GB memory is assigned, starting from > > 0x40000000 (1GB). Note that the host has 4KB as the base page size. > > In the middile of migration, kvm_tlb_flush_vmid_range() is executed > > to flush TLBs. It passes MAX_TLBI_RANGE_PAGES as the argument to > > __kvm_tlb_flush_vmid_range() and __flush_s2_tlb_range_op(). SCALE#3 > > and NUM#31, corresponding to MAX_TLBI_RANGE_PAGES, isn't supported > > by __TLBI_RANGE_NUM(). In this specific case, -1 has been returned > > from __TLBI_RANGE_NUM() for SCALE#3/2/1/0 and rejected by the loop > > in the __flush_tlb_range_op() until the variable @scale underflows > > and becomes -9, 0xffff708000040000 is set as the operand. The operand > > is wrong since it's sorted out by __TLBI_VADDR_RANGE() according to > > invalid @scale and @num. > > > > Fix it by extending __TLBI_RANGE_NUM() to support the combination of > > SCALE#3 and NUM#31. With the changes, [-1 31] instead of [-1 30] can > > be returned from the macro, meaning the TLBs for 0x200000 pages in the > > above example can be flushed in one shoot with SCALE#3 and NUM#31. The > > macro TLBI_RANGE_MASK is dropped since no one uses it any more. The > > comments are also adjusted accordingly. > > Perhaps I'm being overly pedantic, but I don't think the bug is > __TLBI_RANGE_NUM() not being able to return 31; It is clearly documented that it > can only return in the range [-1, 30] and a maximum of (MAX_TLBI_RANGE_PAGES - > 1) pages are supported. I guess "clearly" is pretty relative. I find it misleading that we don't support the full range of what the architecture offers and have these odd limitations. > The bug is in the kvm caller, which tries to call __flush_tlb_range_op() with > MAX_TLBI_RANGE_PAGES; clearly out-of-bounds. Nobody disputes that point, hence the Fixes: tag pointing to the KVM patch. But there are two ways to fix it: either reduce the amount KVM can use for range invalidation, or fix the helper to allow the full range offered by the architecture. > So personally, I would prefer to fix the bug first. Then separately > enhance the infrastructure to support NUM=31. I don't think this buys us much, apart from making it harder for people to know what they need/want/randomly-elect to backport. > > Fixes: 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()") > > I would argue that the bug was actually introduced by commit 360839027a6e > ("arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range"), which > separated the tlbi loop from the range size validation in __flush_tlb_range(). > Before this, all calls would have to go through __flush_tlb_range() and > therefore anything bigger than (MAX_TLBI_RANGE_PAGES - 1) pages would cause the > whole mm to be flushed. Although I get that bisect will lead to this one, so > that's probably the right one to highlight. I haven't tried to bisect it, only picked this as the obviously culprit. To your point, using __flush_tlb_range() made little sense for KVM -- what would be the vma here? Splitting the helper out was, I think the correct decision. But we of course lost sight of the __TLBI_RANGE_NUM limitation in the process. > I get why it was split, but perhaps it should have been split at a higher level; > If tlbi range is not supported, then KVM will flush the whole vmid. Would it be > better for KVM to follow the same pattern as __flush_tlb_range_nosync() and > issue per-block tlbis upto a max of MAX_DVM_OPS before falling back to the whole > vmid? And if tlbi range is supported, KVM uses it regardless of the size of the > range, whereas __flush_tlb_range_nosync() falls back to flush_tlb_mm() at a > certain size. It's not clear why this divergence is useful? That'd be a definitive improvement indeed, and would bring back some much needed consistency. > > Cc: stable@kernel.org # v6.6+ > > Reported-by: Yihuang Yu > > Suggested-by: Marc Zyngier > > Signed-off-by: Gavin Shan > > Anyway, the implementation looks correct, so: > > Reviewed-by: Ryan Roberts Thanks for that! M. -- Without deviation from the norm, progress is not possible.