Message-ID: <97489e94-ea4e-40a3-9e56-d5f7d1219e81@arm.com>
Date: Tue, 12 Dec 2023 11:57:44 +0000
Subject: Re: [PATCH v3 02/15] mm: Batch-clear PTE ranges during zap_pte_range()
From: Ryan Roberts
To: Alistair Popple
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier, Oliver Upton,
 James Morse, Suzuki K Poulose, Zenghui Yu, Andrey Ryabinin,
 Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino,
 Andrew Morton, Anshuman Khandual, Matthew Wilcox, Yu Zhao, Mark Rutland,
 David Hildenbrand, Kefeng Wang, John Hubbard, Zi Yan,
 Barry Song <21cnbao@gmail.com>, Yang Shi,
 linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20231204105440.61448-1-ryan.roberts@arm.com>
 <20231204105440.61448-3-ryan.roberts@arm.com>
 <87h6kta3ap.fsf@nvdebian.thelocal>
In-Reply-To: <87h6kta3ap.fsf@nvdebian.thelocal>

On 08/12/2023 01:30, Alistair Popple wrote:
>
> Ryan Roberts writes:
>
>> Convert zap_pte_range() to clear a set of ptes in a batch. A given batch
>> maps a physically contiguous block of memory, all belonging to the same
>> folio.
>> This will likely improve performance by a tiny amount due to
>> removing duplicate calls to mark the folio dirty and accessed. And also
>> provides us with a future opportunity to batch the rmap removal.
>>
>> However, the primary motivation for this change is to reduce the number
>> of tlb maintenance operations that the arm64 backend has to perform
>> during exit and other syscalls that cause zap_pte_range() (e.g. munmap,
>> madvise(DONTNEED), etc.), as it is about to add transparent support for
>> the "contiguous bit" in its ptes. By clearing ptes using the new
>> clear_ptes() API, the backend doesn't have to perform an expensive
>> unfold operation when a PTE being cleared is part of a contpte block.
>> Instead it can just clear the whole block immediately.
>>
>> This change addresses the core-mm refactoring only, and introduces
>> clear_ptes() with a default implementation that calls
>> ptep_get_and_clear_full() for each pte in the range. Note that this API
>> returns the pte at the beginning of the batch, but with the dirty and
>> young bits set if ANY of the ptes in the cleared batch had those bits
>> set; this information is applied to the folio by the core-mm. Given the
>> batch is garranteed to cover only a single folio, collapsing this state
>
> Nit: s/garranteed/guaranteed/
>
>> does not lose any useful information.
>>
>> A separate change will implement clear_ptes() in the arm64 backend to
>> realize the performance improvement as part of the work to enable
>> contpte mappings.
>>
>> Signed-off-by: Ryan Roberts
>> ---
>>  include/asm-generic/tlb.h |  9 ++++++
>>  include/linux/pgtable.h   | 26 ++++++++++++++++
>>  mm/memory.c               | 63 ++++++++++++++++++++++++++-------------
>>  mm/mmu_gather.c           | 14 +++++++++
>>  4 files changed, 92 insertions(+), 20 deletions(-)
>
>
>> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
>> index 4f559f4ddd21..57b4d5f0dfa4 100644
>> --- a/mm/mmu_gather.c
>> +++ b/mm/mmu_gather.c
>> @@ -47,6 +47,20 @@ static bool tlb_next_batch(struct mmu_gather *tlb)
>>          return true;
>>  }
>>
>> +unsigned int tlb_get_guaranteed_space(struct mmu_gather *tlb)
>> +{
>> +        struct mmu_gather_batch *batch = tlb->active;
>> +        unsigned int nr_next = 0;
>> +
>> +        /* Allocate next batch so we can guarrantee at least one batch. */
>> +        if (tlb_next_batch(tlb)) {
>> +                tlb->active = batch;
>
> Rather than calling tlb_next_batch(tlb) and then undoing some of what it
> does I think it would be clearer to factor out the allocation part of
> tlb_next_batch(tlb) into a separate function (eg. tlb_alloc_batch) that
> you can call from both here and tlb_next_batch().

As per my email against patch 1, I have some perf regressions to iron out for
microbenchmarks; one issue is that this code forces the allocation of a page
for a batch even when we are only modifying a single pte (which would
previously fit in the embedded batch). So I've renamed this function to
tlb_reserve_space(int nr). If it already has enough room, it will just return
immediately. Otherwise it will keep calling tlb_next_batch() in a loop until
space has been allocated. Then after the loop we set tlb->active back to the
original batch.

Given the new potential need to loop a couple of times, and the need to build
up that linked list, I think it works nicely without refactoring
tlb_next_batch().
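To make that concrete, this is roughly the shape I have in mind for the
reworked helper (untested sketch only; the final name, signature and
return-value semantics may still change while I chase the perf regressions):

unsigned int tlb_reserve_space(struct mmu_gather *tlb, unsigned int nr)
{
        struct mmu_gather_batch *batch = tlb->active;
        unsigned int avail = batch->max - batch->nr;

        /* Fast path: the active batch already has room for nr entries. */
        if (avail >= nr)
                return avail;

        /* Grow the linked list of batches until enough space is reserved. */
        while (avail < nr && tlb_next_batch(tlb))
                avail += tlb->active->max;

        /* Continue filling from the original (partially used) batch. */
        tlb->active = batch;

        return avail;
}

If tlb_next_batch() fails part way through we just return whatever space we
managed to reserve and the caller consumes the ptes in smaller chunks.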
>
> Otherwise I think this overall direction looks better than trying to
> play funny games in the arch layer as it's much clearer what's going on
> to core-mm code.
>
>  - Alistair
>
>> +                nr_next = batch->next->max;
>> +        }
>> +
>> +        return batch->max - batch->nr + nr_next;
>> +}
>> +
>>  #ifdef CONFIG_SMP
>>  static void tlb_flush_rmap_batch(struct mmu_gather_batch *batch, struct vm_area_struct *vma)
>>  {
>
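Since the hunk adding the generic fallback got snipped above: the default
clear_ptes() described in the commit message is conceptually just a loop over
ptep_get_and_clear_full(), folding the dirty/young bits of every pte in the
batch into the pte returned for the first entry. A simplified sketch of that
idea (illustrative only, not the exact hunk from the patch):

static inline pte_t clear_ptes(struct mm_struct *mm, unsigned long addr,
                               pte_t *ptep, int full, unsigned int nr)
{
        /* The first pte supplies the pfn and prot bits that are returned. */
        pte_t pte = ptep_get_and_clear_full(mm, addr, ptep, full);
        unsigned int i;

        for (i = 1; i < nr; i++) {
                pte_t tmp;

                addr += PAGE_SIZE;
                ptep++;
                tmp = ptep_get_and_clear_full(mm, addr, ptep, full);

                /* Collapse dirty/young from every pte in the batch. */
                if (pte_dirty(tmp))
                        pte = pte_mkdirty(pte);
                if (pte_young(tmp))
                        pte = pte_mkyoung(pte);
        }

        return pte;
}

Because the batch never crosses a folio boundary, collapsing those bits loses
nothing: the core-mm only uses them to mark the single folio dirty/accessed.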