Received: by 2002:ab2:2441:0:b0:1f3:1f8c:d0c6 with SMTP id k1csp142702lqe; Thu, 4 Apr 2024 02:10:38 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCWLU8MyQ5DnHQ3OOF1BynqwCqXtN+TSDfxu+Ua0wa1ayPEfO/+U7u7My3aKC+mSMaIT4M2x/BDfcO8D0nOXto7tu5ACsuHRFevLurAhgQ== X-Google-Smtp-Source: AGHT+IGDzPcQiHImgGq9tWKtRFSVFwsNhQwivJOHGnNkppnVdI7uy305uucvYKOUs1eipUjvd3ou X-Received: by 2002:a17:906:6a1a:b0:a4e:5bad:b6f3 with SMTP id qw26-20020a1709066a1a00b00a4e5badb6f3mr1434812ejc.10.1712221838546; Thu, 04 Apr 2024 02:10:38 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1712221838; cv=pass; d=google.com; s=arc-20160816; b=mp/jvoqEk2nEpv6D++OufrxYEXna7xn0DNBDoxUrjNwFNiYuJZfdK3uhrE7rWwubgT sDPAQ5BOwrLLzoRuXZjcd7UVenltz1r+6XFqhRYRbnyK8EKA6VgZ87Fir6GXzhNgdXSp inxU5AwR1wctHhDYyZLbEeNN8aq2ui8VlEL8WT31FhxXAeIASBDKV1cU086QqxYTjbDT /lMnFce9T8KtN1BCKydujRYYClECAnsa/u9S/fQMWTHp4mmoDjS544WM06l39k3qkryh KnXpfST6sEDxsVjFkFWrVY/cuSv18y/AfLbnp/2KkMm1bbxlcjqAWgl5lsZV0V/N7buN B9Aw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=mime-version:list-unsubscribe:list-subscribe:list-id:precedence :user-agent:references:in-reply-to:subject:cc:to:from:message-id :date:dkim-signature; bh=LdjMzy4+7ZovfCGyOedaZBYwpqztAOp2QvTX0IK9As4=; fh=J7A7OBX81+wSg+aYBQoB4FsYkU0feQ6qjxCcT9vDpZ0=; b=P6AlR1euAAMLjqHcIOQuOtJchxLwuvSg5cTeKM77A7pNsdVnc+qrz+dFN85kias+X7 sBEQZbjXxtgSz7QdTjBLzEXgcixAMV7nrgTrBcsIuiAa44m95f0TC47u8NWZE21f2gQ0 pPFktc984RON2BdmBY0pa6Bx3TsmbUjtwpNZxovDz7VPyKFOttfwO1ch27Kwdswd5nU9 XimFoyrWiO2sY8AZMvuI0JZEG/GmIY0/bn4pxURX2+nplrVMyDPB4LY0tH+abqgAM9lJ WNk3+W8nkZOhkfumxn1fjgi28Mpo9nh4C04D2p44SvjP0+4oolCe9IZa/t/FkSki/77F bLkQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=DFF8E+X+; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-131150-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-131150-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id s19-20020a170906061300b00a461a8f8beesi7395921ejb.978.2024.04.04.02.10.38 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 Apr 2024 02:10:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-131150-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=DFF8E+X+; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-131150-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-131150-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 1EE001F25C66 for ; Thu, 4 Apr 2024 09:10:38 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id A60DB74267; Thu, 4 Apr 2024 09:10:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="DFF8E+X+" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 914111E86F for ; Thu, 4 Apr 2024 09:10:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712221830; cv=none; b=g8QF0kieRBq9zcTarJ+Ri7X0ufNzsTFdYU9NNmYyzl3towA+MPPgYPx7gK9pbpW9oEWfA66M2QGsGdxJ0RDqIyV6PuVAN9AsapMert+IoSTrjF/CNsHj9WEOTT5As4X+/GJh+d2SY8cpMl8D8OFI9zBpecHlpXTrPNGusLnih5w= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712221830; c=relaxed/simple; bh=f05m8PEbeJDSLndrUkpBjqyWXCZfLOBBZIVWIpskd2Q=; h=Date:Message-ID:From:To:Cc:Subject:In-Reply-To:References: MIME-Version:Content-Type; b=lSF+cA1MBnyTfDPvRWO9t8aSR9nEmL9F+oEWTcM6/JVgkpAjtTIi5b6qMAURCspBBK1gX6pbKyUvNK028H2Bd+KFfdz9CsWy5SsmFfa79fYse5zvJkLX7Myl6ju1UOX/jmzEgB63THP3JgS9kFmBnhmcBsHXNd8iN0zv1X4uqp8= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=DFF8E+X+; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 04E99C433C7; Thu, 4 Apr 2024 09:10:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1712221830; bh=f05m8PEbeJDSLndrUkpBjqyWXCZfLOBBZIVWIpskd2Q=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=DFF8E+X+fbEA0uawBttPRl9GZTK01Oz8ZRhr3iUd5qgtKeUCnbxs38JDNVuIqNHYU 0fTbtsFU6zprjYXnmMHYk5ASapRy0EimI7HKqscw3PrPWhdm0NIwfdVHoB7XQ4E1Xu mXaDU8UCcy+GdK978XwsKlnJHsfb2rohfuOE5AkpRJKFwwZsR62K3g9JdNu2A81HSi TO0nDi6etmpf+lDtx4rWKumUp0nMkCIpIUAi1+M+p92yBNAujD8s7+vZHZTi5Q6Nzv aXbioWQCCOB/KJWeLONFd4GW9anc2oleGa+vgiNZNH17dWpsTl2Q0pqo3R/KoK1CkA NNFnAnx5SlA5w== Received: from [185.201.63.251] (helo=wait-a-minute.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1rsJ74-001Pk1-SZ; Thu, 04 Apr 2024 10:10:27 +0100 Date: Thu, 04 Apr 2024 10:10:23 +0100 Message-ID: <87il0xsdc0.wl-maz@kernel.org> From: Marc Zyngier To: Gavin Shan Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, catalin.marinas@arm.com, will@kernel.org, akpm@linux-foundation.org, oliver.upton@linux.dev, mark.rutland@arm.com, ryan.roberts@arm.com, apopple@nvidia.com, rananta@google.com, yangyicong@hisilicon.com, v-songbaohua@oppo.com, yezhenyu2@huawei.com, yihyu@redhat.com, shan.gavin@gmail.com Subject: Re: [PATCH v2] arm64: tlb: Fix TLBI RANGE operand In-Reply-To: <20240404053624.1485237-1-gshan@redhat.com> References: <20240404053624.1485237-1-gshan@redhat.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.201.63.251 X-SA-Exim-Rcpt-To: gshan@redhat.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, catalin.marinas@arm.com, will@kernel.org, akpm@linux-foundation.org, oliver.upton@linux.dev, mark.rutland@arm.com, ryan.roberts@arm.com, apopple@nvidia.com, rananta@google.com, yangyicong@hisilicon.com, v-songbaohua@oppo.com, yezhenyu2@huawei.com, yihyu@redhat.com, shan.gavin@gmail.com X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false On Thu, 04 Apr 2024 06:36:24 +0100, Gavin Shan wrote: > > KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty > bitmap is collected by VMM and the corresponding PTEs need to be > write-protected during live migration. Unfortunately, the operand > passed to the TLBI RANGE instruction isn't correctly sorted out by > commit d1d3aa98b1d4 ("arm64: tlb: Use the TLBI RANGE feature in arm64"). This isn't the offending commit. See below. > It leads to crash on the destination VM after live migration because > TLBs aren't flushed completely and some of the dirty pages are missed. > > For example, I have a VM where 8GB memory is assigned, starting from > 0x40000000 (1GB). Note that the host has 4KB as the base page size. > All TLBs for VM can be covered by one TLBI RANGE operation. However, > the operand 0xffff708000040000 is set for scale -9, and -1 is returned It's not scale, as it is limited to 2 bits. It's a random value that actively corrupts adjacent fields because it is wrongly sign-extended. ASID and TG are now utter bollocks, and the CPU is within its rights to totally ignore the TLBI (TG indicates 64kB translation granule...). We really should fix __TLBI_VADDR_RANGE() to use proper bit fields instead of a bunch of shifts that lead to this mess. > from __TLBI_RANGE_NUM() for scale 3/2/1/0 and rejected by the loop in > __flush_tlb_range_op(). __TLBI_RANGE_NUM() isn't expected to work > like this because all the pages should be covered by scale 3/2/1/0, > plus an additional page if needed. > > Fix the macro __TLBI_RANGE_NUM() so that the correct NUM and TLBI RANGE > operand are provided for each scale level. With the changes, [-1 31] > instead of [-1 30] can be returned from the macro, meaning the TLBs for > 0x200000 pages (8GB memory) can be flushed in one shoot at scale 3. The > macro TLBI_RANGE_MASK is dropped since no one uses it any more. > > Fixes: d1d3aa98b1d4 ("arm64: tlb: Use the TLBI RANGE feature in arm64") > Cc: stable@kernel.org # v5.10+ I don't think this is right. The problem only occurs since 117940aa6e5f8 ("KVM: arm64: Define kvm_tlb_flush_vmid_range()"), which is the only case where we try to use NUM=31 (the rest of the kernel is using (MAX_TLBI_RANGE_PAGES - 1), which results in NUM=30 at most). Also, before e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale"), we used a different algorithm to perform the invalidation (increasing scale instead of decreasing), so this probably doesn't hit the same way. In any case, this is a KVM-only problem that will never show before v6.6. So 5.10 really isn't a place where we need to backport anything. > Reported-by: Yihuang Yu > Suggested-by: Marc Zyngier > Signed-off-by: Gavin Shan > --- > v2: Improve __TLBI_RANGE_NUM() as Marc suggested > --- > arch/arm64/include/asm/tlbflush.h | 13 +++++++++---- > 1 file changed, 9 insertions(+), 4 deletions(-) > > diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h > index 3b0e8248e1a4..cd9b71c30366 100644 > --- a/arch/arm64/include/asm/tlbflush.h > +++ b/arch/arm64/include/asm/tlbflush.h > @@ -161,12 +161,17 @@ static inline unsigned long get_trans_granule(void) > #define MAX_TLBI_RANGE_PAGES __TLBI_RANGE_PAGES(31, 3) > > /* > - * Generate 'num' values from -1 to 30 with -1 rejected by the > + * Generate 'num' values from -1 to 31 with -1 rejected by the > * __flush_tlb_range() loop below. > */ > -#define TLBI_RANGE_MASK GENMASK_ULL(4, 0) > -#define __TLBI_RANGE_NUM(pages, scale) \ > - ((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1) > +#define __TLBI_RANGE_NUM(pages, scale) \ > + ({ \ > + int __pages = min((pages), \ > + __TLBI_RANGE_PAGES(31, (scale))); \ > + int __numplus1 = __pages >> (5 * (scale) + 1); \ > + \ > + (__numplus1 - 1); \ > + }) This was only intended as a way to convey the general idea. __numplus1 can obviously be removed and the right-shifting expression promoted as the return value. Next, the comments in this file need adjustments to reflect the supported invalidation range, as my original patch did (plus some more). Finally, and since we can now handle the full range of invalidation, it would make sense to fix __flush_tlb_range_nosync() to allow MAX_TLBI_RANGE_PAGES ranges (potentially in a separate patch). In the end, my sandbox contains the following, which should probably be split in 3 patches: diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h index 3b0e8248e1a4..bcbe697ed191 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -142,16 +142,22 @@ static inline unsigned long get_trans_granule(void) * EL1, Inner Shareable". * */ +#define TLBIR_ASID_MASK GENMASK_ULL(63, 48) +#define TLBIR_TG_MASK GENMASK_ULL(47, 46) +#define TLBIR_SCALE_MASK GENMASK_ULL(45, 44) +#define TLBIR_NUM_MASK GENMASK_ULL(43, 39) +#define TLBIR_TTL_MASK GENMASK_ULL(38, 37) +#define TLBIR_BADDR_MASK GENMASK_ULL(36, 0) + #define __TLBI_VADDR_RANGE(baddr, asid, scale, num, ttl) \ ({ \ - unsigned long __ta = (baddr); \ + unsigned long __ta = FIELD_PREP(TLBIR_BADDR_MASK, (baddr)); \ unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0; \ - __ta &= GENMASK_ULL(36, 0); \ - __ta |= __ttl << 37; \ - __ta |= (unsigned long)(num) << 39; \ - __ta |= (unsigned long)(scale) << 44; \ - __ta |= get_trans_granule() << 46; \ - __ta |= (unsigned long)(asid) << 48; \ + __ta |= FIELD_PREP(TLBIR_TTL_MASK, __ttl); \ + __ta |= FIELD_PREP(TLBIR_NUM_MASK, (unsigned long)(num)); \ + __ta |= FIELD_PREP(TLBIR_SCALE_MASK, (unsigned long)(scale)); \ + __ta |= FIELD_PREP(TLBIR_TG_MASK, get_trans_granule()); \ + __ta |= FIELD_PREP(TLBIR_ASID_MASK, (unsigned long)(asid)); \ __ta; \ }) @@ -161,12 +167,17 @@ static inline unsigned long get_trans_granule(void) #define MAX_TLBI_RANGE_PAGES __TLBI_RANGE_PAGES(31, 3) /* - * Generate 'num' values from -1 to 30 with -1 rejected by the - * __flush_tlb_range() loop below. + * Generate 'num' values from -1 to 31 with -1 rejected by the + * __flush_tlb_range() loop below. Its return value is only + * significant for a maximum of MAX_TLBI_RANGE_PAGES pages. If 'pages' + * is more than that, you must iterate over the overall range. */ -#define TLBI_RANGE_MASK GENMASK_ULL(4, 0) -#define __TLBI_RANGE_NUM(pages, scale) \ - ((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1) +#define __TLBI_RANGE_NUM(pages, scale) \ + ({ \ + int __pages = min((pages), \ + __TLBI_RANGE_PAGES(31, (scale))); \ + (__pages >> (5 * (scale) + 1)) - 1; \ + }) /* * TLB Invalidation @@ -379,10 +390,6 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) * 3. If there is 1 page remaining, flush it through non-range operations. Range * operations can only span an even number of pages. We save this for last to * ensure 64KB start alignment is maintained for the LPA2 case. - * - * Note that certain ranges can be represented by either num = 31 and - * scale or num = 0 and scale + 1. The loop below favours the latter - * since num is limited to 30 by the __TLBI_RANGE_NUM() macro. */ #define __flush_tlb_range_op(op, start, pages, stride, \ asid, tlb_level, tlbi_user, lpa2) \ @@ -437,11 +444,11 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma, * When not uses TLB range ops, we can handle up to * (MAX_DVM_OPS - 1) pages; * When uses TLB range ops, we can handle up to - * (MAX_TLBI_RANGE_PAGES - 1) pages. + * MAX_TLBI_RANGE_PAGES pages. */ if ((!system_supports_tlb_range() && (end - start) >= (MAX_DVM_OPS * stride)) || - pages >= MAX_TLBI_RANGE_PAGES) { + pages > MAX_TLBI_RANGE_PAGES) { flush_tlb_mm(vma->vm_mm); return; } Thanks, M. -- Without deviation from the norm, progress is not possible.