From: Alexandre Ghiti
Date: Thu, 4 Jan 2024 14:33:30 +0100
Subject: Re: [PATCH] riscv: Add support for BATCHED_UNMAP_TLB_FLUSH
To: Samuel Holland
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
In-Reply-To: <9cd39ef1-9750-4e4d-9df1-3dad0a7ea015@sifive.com>
References: <20240102141851.105144-1-alexghiti@rivosinc.com> <9cd39ef1-9750-4e4d-9df1-3dad0a7ea015@sifive.com>

Hi Samuel,

On Tue, Jan 2, 2024 at 11:13 PM Samuel Holland wrote:
>
> Hi Alex,
>
> On 2024-01-02 8:18 AM, Alexandre Ghiti wrote:
> > Allow deferring the flushing of the TLB when unmapping pages, which
> > reduces the number of IPIs and the number of sfence.vma instructions.
> >
> > The microbenchmark used in commit 43b3dfdd0455 ("arm64: support
> > batched/deferred tlb shootdown during page reclamation/migration")
> > shows a good performance improvement, and perf reports a large
> > decrease in the time spent flushing the TLB (results come from qemu):
> >
> > Before this patch:
> >
> > real    2m1.135s
> > user    0m0.980s
> > sys     2m0.096s
> >
> > 4.83%  batch_tlb  [kernel.kallsyms]  [k] __flush_tlb_range
> >
> > After this patch:
> >
> > real    1m0.543s
> > user    0m1.059s
> > sys     0m59.489s
> >
> > 0.14%  batch_tlb  [kernel.kallsyms]  [k] __flush_tlb_range
>
> That's a great improvement!
>
> > Signed-off-by: Alexandre Ghiti
> > ---
> >  arch/riscv/Kconfig                |  1 +
> >  arch/riscv/include/asm/tlbbatch.h | 15 +++++++
> >  arch/riscv/include/asm/tlbflush.h | 10 +++++
> >  arch/riscv/mm/tlbflush.c          | 71 ++++++++++++++++++++++---------
> >  4 files changed, 77 insertions(+), 20 deletions(-)
> >  create mode 100644 arch/riscv/include/asm/tlbbatch.h
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 7603bd8ab333..aa07bd43b138 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -53,6 +53,7 @@ config RISCV
> >       select ARCH_USE_MEMTEST
> >       select ARCH_USE_QUEUED_RWLOCKS
> >       select ARCH_USES_CFI_TRAPS if CFI_CLANG
> > +     select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH if SMP && MMU
>
> Is the SMP dependency because the batching is only useful with multiple
> CPUs, or just because tlbflush.c is only compiled in SMP configurations
> (which is resolved by [1])?

For now, yes: I considered that only the savings on IPIs make it
worthwhile, hence the restriction to SMP.

> [1]:
> https://lore.kernel.org/linux-riscv/20240102220134.3229156-1-samuel.holland@sifive.com/
>
> >       select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
> >       select ARCH_WANT_FRAME_POINTERS
> >       select ARCH_WANT_GENERAL_HUGETLB if !RISCV_ISA_SVNAPOT
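(An aside for readers new to the mechanism: conceptually, batching turns
reclaim's one-flush-per-unmapped-page pattern into a single deferred flush
per batch. A rough illustration follows -- the helpers marked hypothetical
are made up for the sketch, this is not actual kernel code:

/* Without batching, the unmap path pays one (possibly broadcast)
 * flush -- IPI + sfence.vma -- per page: */
static void unmap_pages_unbatched(struct mm_struct *mm,
				  unsigned long *uaddr, int nr)
{
	for (int i = 0; i < nr; i++) {
		clear_pte(mm, uaddr[i]);		/* hypothetical helper */
		flush_tlb_page_ipi(mm, uaddr[i]);	/* hypothetical helper */
	}
}

/* With batching, all PTEs are cleared first while the CPUs that ran
 * the mm are accumulated in a cpumask; one flush covers them all: */
static void unmap_pages_batched(struct mm_struct *mm,
				unsigned long *uaddr, int nr)
{
	struct cpumask pending;	/* on-stack only for illustration */

	cpumask_clear(&pending);
	for (int i = 0; i < nr; i++) {
		clear_pte(mm, uaddr[i]);		/* hypothetical helper */
		cpumask_or(&pending, &pending, mm_cpumask(mm));
	}
	flush_tlb_cpumask(&pending);	/* hypothetical: one IPI round total */
}

The pages stay unreachable between the PTE clear and the flush, so
correctness is preserved; only the flush cost is amortized.)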
> > diff --git a/arch/riscv/include/asm/tlbbatch.h b/arch/riscv/include/asm/tlbbatch.h
> > new file mode 100644
> > index 000000000000..46014f70b9da
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/tlbbatch.h
> > @@ -0,0 +1,15 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2023 Rivos Inc.
> > + */
> > +
> > +#ifndef _ASM_RISCV_TLBBATCH_H
> > +#define _ASM_RISCV_TLBBATCH_H
> > +
> > +#include <linux/cpumask.h>
> > +
> > +struct arch_tlbflush_unmap_batch {
> > +     struct cpumask cpumask;
> > +};
> > +
> > +#endif /* _ASM_RISCV_TLBBATCH_H */
> > diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> > index 8f3418c5f172..f0b731ccc0c2 100644
> > --- a/arch/riscv/include/asm/tlbflush.h
> > +++ b/arch/riscv/include/asm/tlbflush.h
> > @@ -46,6 +46,16 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end);
> >  void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> >                       unsigned long end);
> >  #endif
> > +
> > +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> > +bool arch_tlbbatch_should_defer(struct mm_struct *mm);
> > +void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> > +                            struct mm_struct *mm,
> > +                            unsigned long uaddr);
> > +void arch_flush_tlb_batched_pending(struct mm_struct *mm);
> > +void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
> > +#endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */
> > +
> >  #else /* CONFIG_SMP && CONFIG_MMU */
> >
> >  #define flush_tlb_all() local_flush_tlb_all()
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index e6659d7368b3..bb623bca0a7d 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -93,29 +93,23 @@ static void __ipi_flush_tlb_range_asid(void *info)
> >       local_flush_tlb_range_asid(d->start, d->size, d->stride, d->asid);
> >  }
> >
> > -static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
> > -                           unsigned long size, unsigned long stride)
> > +static void __flush_tlb_range(struct cpumask *cmask, unsigned long asid,
> > +                           unsigned long start, unsigned long size,
> > +                           unsigned long stride)
> >  {
> >       struct flush_tlb_range_data ftd;
> > -     const struct cpumask *cmask;
> > -     unsigned long asid = FLUSH_TLB_NO_ASID;
> >       bool broadcast;
> >
> > -     if (mm) {
> > -             unsigned int cpuid;
> > +     if (cpumask_empty(cmask))
> > +             return;
> >
> > -             cmask = mm_cpumask(mm);
> > -             if (cpumask_empty(cmask))
> > -                     return;
> > +     if (cmask != cpu_online_mask) {
> > +             unsigned int cpuid;
> >
> >               cpuid = get_cpu();
> >               /* check if the tlbflush needs to be sent to other CPUs */
> >               broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids;
> > -
> > -             if (static_branch_unlikely(&use_asid_allocator))
> > -                     asid = atomic_long_read(&mm->context.id) & asid_mask;
> >       } else {
> > -             cmask = cpu_online_mask;
> >               broadcast = true;
> >       }
> >
> > @@ -135,25 +129,34 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
> >               local_flush_tlb_range_asid(start, size, stride, asid);
> >       }
> >
> > -     if (mm)
> > +     if (cmask != cpu_online_mask)
> >               put_cpu();
> >  }
> >
> > +static inline unsigned long get_mm_asid(struct mm_struct *mm)
> > +{
> > +     return static_branch_unlikely(&use_asid_allocator) ?
> > +                     atomic_long_read(&mm->context.id) & asid_mask : FLUSH_TLB_NO_ASID;
> > +}
> > +
> >  void flush_tlb_mm(struct mm_struct *mm)
> >  {
> > -     __flush_tlb_range(mm, 0, FLUSH_TLB_MAX_SIZE, PAGE_SIZE);
> > +     __flush_tlb_range(mm_cpumask(mm), get_mm_asid(mm),
> > +                       0, FLUSH_TLB_MAX_SIZE, PAGE_SIZE);
> >  }
> >
> >  void flush_tlb_mm_range(struct mm_struct *mm,
> >                       unsigned long start, unsigned long end,
> >                       unsigned int page_size)
> >  {
> > -     __flush_tlb_range(mm, start, end - start, page_size);
> > +     __flush_tlb_range(mm_cpumask(mm), get_mm_asid(mm),
> > +                       start, end - start, page_size);
> >  }
> >
> >  void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
> >  {
> > -     __flush_tlb_range(vma->vm_mm, addr, PAGE_SIZE, PAGE_SIZE);
> > +     __flush_tlb_range(mm_cpumask(vma->vm_mm), get_mm_asid(vma->vm_mm),
> > +                       addr, PAGE_SIZE, PAGE_SIZE);
> >  }
> >
> >  void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> > @@ -185,18 +188,46 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> >               }
> >       }
> >
> > -     __flush_tlb_range(vma->vm_mm, start, end - start, stride_size);
> > +     __flush_tlb_range(mm_cpumask(vma->vm_mm), get_mm_asid(vma->vm_mm),
> > +                       start, end - start, stride_size);
> >  }
> >
> >  void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> >  {
> > -     __flush_tlb_range(NULL, start, end - start, PAGE_SIZE);
> > +     __flush_tlb_range((struct cpumask *)cpu_online_mask, FLUSH_TLB_NO_ASID,
> > +                       start, end - start, PAGE_SIZE);
> >  }
> >
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >  void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> >                       unsigned long end)
> >  {
> > -     __flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
> > +     __flush_tlb_range(mm_cpumask(vma->vm_mm), get_mm_asid(vma->vm_mm),
> > +                       start, end - start, PMD_SIZE);
> >  }
> >  #endif
> > +
> > +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
>
> This condition is necessarily true if the file is being compiled.

Indeed, I'll remove that, thanks.
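For anyone wondering where the hooks below get called from: roughly, the
generic code in mm/rmap.c drives them like this (heavily simplified
paraphrase from my reading, not the verbatim kernel source):

/* try_to_unmap_one() records a pending flush instead of flushing;
 * the caller only takes this path if arch_tlbbatch_should_defer(mm)
 * returned true: */
static void set_tlb_ubc_flush_pending(struct mm_struct *mm,
				      unsigned long uaddr)
{
	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;

	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr);
	tlb_ubc->flush_required = true;
}

/* shrink_folio_list() later issues one flush for the whole batch: */
void try_to_unmap_flush(void)
{
	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;

	if (!tlb_ubc->flush_required)
		return;

	arch_tlbbatch_flush(&tlb_ubc->arch);
	tlb_ubc->flush_required = false;
}

/* and page-table walkers racing with a pending batch go through
 * flush_tlb_batched_pending(), which ends up calling
 * arch_flush_tlb_batched_pending(mm). */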
> > +bool arch_tlbbatch_should_defer(struct mm_struct *mm)
> > +{
> > +     return true;
> > +}
> > +
> > +void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> > +                            struct mm_struct *mm,
> > +                            unsigned long uaddr)
> > +{
> > +     cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
> > +}
> > +
> > +void arch_flush_tlb_batched_pending(struct mm_struct *mm)
> > +{
> > +     flush_tlb_mm(mm);
> > +}
> > +
> > +void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> > +{
> > +     __flush_tlb_range(&batch->cpumask, FLUSH_TLB_NO_ASID, 0,
>
> The batching appears to be limited to within a single mm, so we could
> save the ASID inside struct arch_tlbflush_unmap_batch and use it here.

The batching can be used when reclaiming pages (see shrink_folio_list()
-> try_to_unmap() -> try_to_unmap_one()), so the batch could contain
pages from multiple processes.

I'm working on a follow-up patch that keeps the page addresses and mm to
avoid the global sfence.vma here (up to a certain threshold that we
already use), but since this version *seemed* to perform well, I sent it
first.

Thanks,

Alex

> Regards,
> Samuel
>
> > +                       FLUSH_TLB_MAX_SIZE, PAGE_SIZE);
> > +}
> > +#endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */
>
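P.S. To make the follow-up idea above a bit more concrete, the direction
is roughly the following. This is a very rough sketch built on the
helpers this patch introduces (get_mm_asid(), __flush_tlb_range()); the
struct layout, field names and the threshold value are placeholders, not
the actual follow-up patch:

#define TLB_BATCH_MAX_PENDING	64	/* placeholder threshold */

struct arch_tlbflush_unmap_batch {
	struct cpumask cpumask;
	unsigned int nr_pending;
	struct {
		unsigned long asid;
		unsigned long uaddr;
	} pending[TLB_BATCH_MAX_PENDING];
	bool overflowed;	/* too many entries: fall back to global flush */
};

void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
			       struct mm_struct *mm, unsigned long uaddr)
{
	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));

	if (batch->nr_pending < TLB_BATCH_MAX_PENDING) {
		batch->pending[batch->nr_pending].asid = get_mm_asid(mm);
		batch->pending[batch->nr_pending].uaddr = uaddr;
		batch->nr_pending++;
	} else {
		batch->overflowed = true;
	}
}

void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
{
	if (batch->overflowed) {
		/* as in this patch: one global flush on the batched CPUs */
		__flush_tlb_range(&batch->cpumask, FLUSH_TLB_NO_ASID, 0,
				  FLUSH_TLB_MAX_SIZE, PAGE_SIZE);
	} else {
		/* per-page, per-ASID flushes; each entry is flushed on all
		 * batched CPUs, a superset of the CPUs that ran its mm */
		for (unsigned int i = 0; i < batch->nr_pending; i++)
			__flush_tlb_range(&batch->cpumask,
					  batch->pending[i].asid,
					  batch->pending[i].uaddr,
					  PAGE_SIZE, PAGE_SIZE);
	}

	cpumask_clear(&batch->cpumask);
	batch->nr_pending = 0;
	batch->overflowed = false;
}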