Date: Wed, 03 Apr 2024 14:44:21 +0100
From: Marc Zyngier <maz@kernel.org>
To: Gavin Shan <gshan@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	catalin.marinas@arm.com, will@kernel.org, akpm@linux-foundation.org,
	apopple@nvidia.com, mark.rutland@arm.com, ryan.roberts@arm.com,
	rananta@google.com, yangyicong@hisilicon.com, v-songbaohua@oppo.com,
	yezhenyu2@huawei.com, yihyu@redhat.com, shan.gavin@gmail.com
Subject: Re: [PATCH] arm64: tlb: Fix TLBI RANGE operand
Message-ID: <87jzlesgqy.wl-maz@kernel.org>
In-Reply-To: <20134458-508b-4853-a962-c1b44009e8ed@redhat.com>
References: <20240403064929.1438475-1-gshan@redhat.com>
	<86edbmu8kn.wl-maz@kernel.org>
	<20134458-508b-4853-a962-c1b44009e8ed@redhat.com>

On Wed, 03 Apr 2024 12:37:30 +0100,
Gavin Shan <gshan@redhat.com> wrote:
>
> On 4/3/24 18:58, Marc Zyngier wrote:
> > On Wed, 03 Apr 2024 07:49:29 +0100,
> > Gavin Shan <gshan@redhat.com> wrote:
> >>
> >> KVM/arm64 relies on the TLBI RANGE feature to flush TLBs when the dirty
> >> bitmap is collected by the VMM and the corresponding PTEs need to be
> >> write-protected again. Unfortunately, the operand passed to the TLBI
> >> RANGE instruction isn't correctly sorted out by commit d1d3aa98b1d4
> >> ("arm64: tlb: Use the TLBI RANGE feature in arm64"). This leads to a
> >> crash on the destination VM after live migration because some of the
> >> dirty pages are missed.
> >>
> >> For example, I have a VM with 8GB of memory assigned, starting from
> >> 0x40000000 (1GB). Note that the host has 4KB as the base page size.
> >> All TLBs for the VM can be covered by one TLBI RANGE operation. However,
> >> I received 0xffff708000040000 as the operand, which is wrong; the
> >> correct one should be 0x00007f8000040000. From the wrong operand, we
> >> have 3 and 1 for SCALE (bits[45:44]) and NUM (bits[43:39]), so only 1GB
> >> instead of 8GB of memory is covered.
> >>
> >> Fix the macro __TLBI_RANGE_NUM() so that the correct NUM and TLBI
> >> RANGE operand are provided.
> >>
> >> Fixes: d1d3aa98b1d4 ("arm64: tlb: Use the TLBI RANGE feature in arm64")
> >> Cc: stable@kernel.org # v5.10+
> >> Reported-by: Yihuang Yu <yihyu@redhat.com>
> >> Signed-off-by: Gavin Shan <gshan@redhat.com>
> >> ---
> >>  arch/arm64/include/asm/tlbflush.h | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> >> index 3b0e8248e1a4..07c4fb4b82b4 100644
> >> --- a/arch/arm64/include/asm/tlbflush.h
> >> +++ b/arch/arm64/include/asm/tlbflush.h
> >> @@ -166,7 +166,7 @@ static inline unsigned long get_trans_granule(void)
> >>   */
> >>  #define TLBI_RANGE_MASK		GENMASK_ULL(4, 0)
> >>  #define __TLBI_RANGE_NUM(pages, scale)	\
> >> -	((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
> >> +	((((pages) >> (5 * (scale) + 1)) - 1) & TLBI_RANGE_MASK)
> >>  
> >>  /*
> >>   * TLB Invalidation
> > 
> > This looks pretty wrong, by the very definition of the comment that's
> > just above:
> > 
> > /*
> >  * Generate 'num' values from -1 to 30 with -1 rejected by the
> >  * __flush_tlb_range() loop below.
> >  */
> > 
> > With your change, num can't ever be negative, and that breaks
> > __flush_tlb_range_op():
> > 
> > 		num = __TLBI_RANGE_NUM(pages, scale);		\
> > 		if (num >= 0) {						\
> > 			addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
> > 						scale, num, tlb_level);	\
> > 			__tlbi(r##op, addr);				\
> > 			if (tlbi_user)					\
> > 				__tlbi_user(r##op, addr);		\
> > 			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
> > 			pages -= __TLBI_RANGE_PAGES(num, scale);	\
> > 		}							\
> > 		scale--;						\
> > 
> > We'll then shove whatever value we've found in the TLBI operation,
> > leading to unknown results instead of properly adjusting the scale to
> > issue a smaller invalidation.
> > 
> 
> Marc, thanks for your review and comments.
> 
> Indeed, this patch is incomplete at least. I think we need __TLBI_RANGE_NUM()
> to return [-1 31] instead of [-1 30], to be consistent with
> MAX_TLBI_RANGE_PAGES. -1 will be rejected in the following loop. I'm not
> 100% sure if I did the correct calculation though.
> 
>   /*
>    * Generate 'num' values in range [-1 31], but -1 will be rejected
>    * by the __flush_tlb_range() loop below.
>    */
>   #define __TLBI_RANGE_NUM(pages, scale)					\
>   	({								\
>   		int __next = (pages) & (1ULL << (5 * (scale) + 6));	\
>   		int __mask = ((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK; \
>   		int __num = (((pages) >> (5 * (scale) + 1)) - 1) &	\
>   			    TLBI_RANGE_MASK;				\
>   									\
>   		(__next || __mask) ? __num : -1;			\
>   	})

I'm afraid I don't follow the logic here, and it looks awfully
complex. I came up with something simpler with this:

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 3b0e8248e1a4..b3f1a9c61189 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -161,12 +161,18 @@ static inline unsigned long get_trans_granule(void)
 #define MAX_TLBI_RANGE_PAGES	__TLBI_RANGE_PAGES(31, 3)
 
 /*
- * Generate 'num' values from -1 to 30 with -1 rejected by the
+ * Generate 'num' values from -1 to 31 with -1 rejected by the
  * __flush_tlb_range() loop below.
  */
 #define TLBI_RANGE_MASK		GENMASK_ULL(4, 0)
-#define __TLBI_RANGE_NUM(pages, scale)	\
-	((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
+#define __TLBI_RANGE_NUM(pages, scale)					\
+	({								\
+		int __pages = min((pages),				\
+				  __TLBI_RANGE_PAGES(31, (scale)));	\
+		int __numplus1 = __pages >> (5 * (scale) + 1);		\
+									\
+		(__numplus1 - 1);					\
+	})
 
 /*
  * TLB Invalidation
@@ -379,10 +385,6 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
  * 3. If there is 1 page remaining, flush it through non-range operations. Range
  *    operations can only span an even number of pages. We save this for last to
  *    ensure 64KB start alignment is maintained for the LPA2 case.
- *
- * Note that certain ranges can be represented by either num = 31 and
- * scale or num = 0 and scale + 1. The loop below favours the latter
- * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
  */
 #define __flush_tlb_range_op(op, start, pages, stride,			\
 				 asid, tlb_level, tlbi_user, lpa2)	\
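For the 8GB example from the report (0x200000 4KB pages), a small stand-alone
user-space model of the scale-loop arithmetic shows the difference. This is an
illustration only, not kernel code: the helper names are invented, the macros
follow the definitions quoted above, and the loop is bounded by scale >= 0
here purely so that it terminates.

/* Stand-alone model of the __flush_tlb_range_op() scale loop. */
#include <stdio.h>

#define TLBI_RANGE_MASK			0x1fUL	/* GENMASK_ULL(4, 0) */
#define __TLBI_RANGE_PAGES(num, scale)	\
	((unsigned long)((num) + 1) << (5 * (scale) + 1))

/* Current macro: masks before subtracting, so a value of 32 wraps to -1. */
static int range_num_current(unsigned long pages, int scale)
{
	return (int)((pages >> (5 * scale + 1)) & TLBI_RANGE_MASK) - 1;
}

/* Proposed macro: clamp pages first, then derive num + 1, allowing num = 31. */
static int range_num_proposed(unsigned long pages, int scale)
{
	unsigned long p = pages;

	if (p > __TLBI_RANGE_PAGES(31, scale))
		p = __TLBI_RANGE_PAGES(31, scale);

	return (int)(p >> (5 * scale + 1)) - 1;
}

static void model(const char *name, int (*range_num)(unsigned long, int))
{
	unsigned long pages = 0x200000;		/* 8GB of 4KB pages */
	int scale, num;

	printf("%s:\n", name);
	for (scale = 3; scale >= 0 && pages; scale--) {
		num = range_num(pages, scale);
		printf("  scale=%d num=%d\n", scale, num);
		if (num >= 0)
			pages -= __TLBI_RANGE_PAGES(num, scale);
	}
	printf("  pages not covered by range ops: 0x%lx\n", pages);
}

int main(void)
{
	model("current", range_num_current);	/* num stays at -1 throughout */
	model("proposed", range_num_proposed);	/* scale=3 num=31 covers all 8GB */
	return 0;
}

With the current macro the masked value wraps to -1 at every scale for this
size, so no range invalidation is ever issued; with the clamped version a
single operation with scale=3, num=31 covers the whole 8GB.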
> Alternatively, we can also limit the number of pages to be invalidated from
> arch/arm64/kvm/hyp/pgtable.c::kvm_tlb_flush_vmid_range() because the maximal
> capacity is (MAX_TLBI_RANGE_PAGES - 1) instead of MAX_TLBI_RANGE_PAGES, as
> the comments for __flush_tlb_range_nosync() say.
> 
>   -	inval_pages = min(pages, MAX_TLBI_RANGE_PAGES);
>   +	inval_pages = min(pages, MAX_TLBI_RANGE_PAGES - 1);
> 
>   static inline void __flush_tlb_range_nosync(...)
>   {
>       :
>       /*
>        * When not uses TLB range ops, we can handle up to
>        * (MAX_DVM_OPS - 1) pages;
>        * When uses TLB range ops, we can handle up to
>        * (MAX_TLBI_RANGE_PAGES - 1) pages.
>        */
>       if ((!system_supports_tlb_range() &&
>            (end - start) >= (MAX_DVM_OPS * stride)) ||
>           pages >= MAX_TLBI_RANGE_PAGES) {
>               flush_tlb_mm(vma->vm_mm);
>               return;
>       }
>   }
> 
> Please let me know which way is better.

I would really prefer to fix the range stuff itself instead of papering
over the problem by reducing the reach of the range invalidation.

> > I think the problem is that you are triggering NUM=31 and SCALE=3,
> > which the current code cannot handle as per the comment above
> > __flush_tlb_range_op() (we can't do NUM=30 and SCALE=4, obviously).
> > 
> 
> Yes, exactly.
> 
> > Can you try the untested patch below?
> > 
> > diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> > index 3b0e8248e1a4..b71a1cece802 100644
> > --- a/arch/arm64/include/asm/tlbflush.h
> > +++ b/arch/arm64/include/asm/tlbflush.h
> > @@ -379,10 +379,6 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> >   * 3. If there is 1 page remaining, flush it through non-range operations. Range
> >   *    operations can only span an even number of pages. We save this for last to
> >   *    ensure 64KB start alignment is maintained for the LPA2 case.
> > - *
> > - * Note that certain ranges can be represented by either num = 31 and
> > - * scale or num = 0 and scale + 1. The loop below favours the latter
> > - * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
> >   */
> >  #define __flush_tlb_range_op(op, start, pages, stride,			\
> >  				 asid, tlb_level, tlbi_user, lpa2)	\
> > @@ -407,6 +403,7 @@ do {							\
> >  									\
> >  		num = __TLBI_RANGE_NUM(pages, scale);			\
> >  		if (num >= 0) {						\
> > +			num += 1;					\
> >  			addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
> >  						scale, num, tlb_level);	\
> >  			__tlbi(r##op, addr);				\
> > 
> 
> Thanks, but I don't think it's going to work. The loop will be running
> infinitely because the condition 'if (num >= 0)' can't be met when @pages
> is 0x200000 and @scale is 3/2/1/0; it is only met once @scale has become
> negative and num turns positive again, but then @scale isn't in the range
> [0 3] any more. I ported the chunk of code to user-space and I can see
> this with added printf() messages.

Yeah, we lose num==0, which is silly. Hopefully the hack above helps a
bit.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
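As a footnote to the operand values quoted in the report, a small stand-alone
decoder can pull the fields apart. Only SCALE (bits[45:44]) and NUM
(bits[43:39]) are stated in the commit message above; the remaining field
positions (BADDR in bits[36:0], TTL in bits[38:37], TG in bits[47:46]) are
assumed here for illustration, following the layout used by
__TLBI_VADDR_RANGE().

/* Stand-alone decoder for the two TLBI RANGE operands quoted above. */
#include <stdio.h>
#include <stdint.h>

static void decode(uint64_t op)
{
	unsigned int tg    = (op >> 46) & 0x3;	/* translation granule */
	unsigned int scale = (op >> 44) & 0x3;
	unsigned int num   = (op >> 39) & 0x1f;
	unsigned int ttl   = (op >> 37) & 0x3;
	uint64_t baddr     = op & ((1ULL << 37) - 1);

	printf("%#018llx: TG=%u SCALE=%u NUM=%u TTL=%u BADDR=%#llx\n",
	       (unsigned long long)op, tg, scale, num, ttl,
	       (unsigned long long)baddr);
}

int main(void)
{
	decode(0xffff708000040000ULL);	/* reported (wrong) operand   */
	decode(0x00007f8000040000ULL);	/* expected (correct) operand */
	return 0;
}

This prints SCALE=3, NUM=1 for the reported operand and SCALE=3, NUM=31 for
the expected one, with BADDR=0x40000 in both cases, which corresponds to the
0x40000000 base of the example VM assuming the usual address >> PAGE_SHIFT
encoding.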