From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier,
	Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov
	, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Anshuman Khandual,
	Matthew Wilcox, Yu Zhao, Mark Rutland, David Hildenbrand, Kefeng Wang,
	John Hubbard, Zi Yan, Barry Song <21cnbao@gmail.com>, Alistair Popple,
	Yang Shi
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v3 02/15] mm: Batch-clear PTE ranges during zap_pte_range()
Date: Mon, 4 Dec 2023 10:54:27 +0000
Message-Id: <20231204105440.61448-3-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20231204105440.61448-1-ryan.roberts@arm.com>
References: <20231204105440.61448-1-ryan.roberts@arm.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

Convert zap_pte_range() to clear a set of ptes in a batch. A given batch
maps a physically contiguous block of memory, all belonging to the same
folio. This will likely improve performance slightly by removing
duplicate calls to mark the folio dirty and accessed. It also provides a
future opportunity to batch the rmap removal.

However, the primary motivation for this change is to reduce the number
of TLB maintenance operations that the arm64 backend has to perform
during exit and other syscalls that invoke zap_pte_range() (e.g. munmap,
madvise(DONTNEED), etc.), as it is about to add transparent support for
the "contiguous bit" in its ptes.
By clearing ptes using the new clear_ptes() API, the backend doesn't
have to perform an expensive unfold operation when a PTE being cleared
is part of a contpte block. Instead it can just clear the whole block
immediately.

This change addresses the core-mm refactoring only, and introduces
clear_ptes() with a default implementation that calls
ptep_get_and_clear_full() for each pte in the range. Note that this API
returns the pte at the beginning of the batch, but with the dirty and
young bits set if ANY of the ptes in the cleared batch had those bits
set; this information is applied to the folio by the core-mm. Given the
batch is guaranteed to cover only a single folio, collapsing this state
does not lose any useful information.

A separate change will implement clear_ptes() in the arm64 backend to
realize the performance improvement as part of the work to enable
contpte mappings.

Signed-off-by: Ryan Roberts
---
 include/asm-generic/tlb.h |  9 ++++++
 include/linux/pgtable.h   | 26 ++++++++++++++++
 mm/memory.c               | 63 ++++++++++++++++++++++++++-------------
 mm/mmu_gather.c           | 14 +++++++++
 4 files changed, 92 insertions(+), 20 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 129a3a759976..b84ba3aa1f6e 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -75,6 +75,9 @@
  *    boolean indicating if the queue is (now) full and a call to
  *    tlb_flush_mmu() is required.
  *
+ *    tlb_get_guaranteed_space() returns the minimum guaranteed number of pages
+ *    that can be queued without overflow.
+ *
  *    tlb_remove_page() and tlb_remove_page_size() imply the call to
  *    tlb_flush_mmu() when required and has no return value.
  *
@@ -263,6 +266,7 @@ struct mmu_gather_batch {
 
 extern bool __tlb_remove_page_size(struct mmu_gather *tlb,
 				   struct encoded_page *page, int page_size);
+extern unsigned int tlb_get_guaranteed_space(struct mmu_gather *tlb);
 
 #ifdef CONFIG_SMP
 /*
@@ -273,6 +277,11 @@ extern bool __tlb_remove_page_size(struct mmu_gather *tlb,
 extern void tlb_flush_rmaps(struct mmu_gather *tlb, struct vm_area_struct *vma);
 #endif
 
+#else
+static inline unsigned int tlb_get_guaranteed_space(struct mmu_gather *tlb)
+{
+	return 1;
+}
 #endif
 
 /*
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 1c50f8a0fdde..e998080eb7ae 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -635,6 +635,32 @@ static inline void ptep_set_wrprotects(struct mm_struct *mm,
 }
 #endif
 
+#ifndef clear_ptes
+struct mm_struct;
+static inline pte_t clear_ptes(struct mm_struct *mm,
+			       unsigned long address, pte_t *ptep,
+			       int full, unsigned int nr)
+{
+	unsigned int i;
+	pte_t pte;
+	pte_t orig_pte = ptep_get_and_clear_full(mm, address, ptep, full);
+
+	for (i = 1; i < nr; i++) {
+		address += PAGE_SIZE;
+		ptep++;
+		pte = ptep_get_and_clear_full(mm, address, ptep, full);
+
+		if (pte_dirty(pte))
+			orig_pte = pte_mkdirty(orig_pte);
+
+		if (pte_young(pte))
+			orig_pte = pte_mkyoung(orig_pte);
+	}
+
+	return orig_pte;
+}
+#endif
+
 /*
  * On some architectures hardware does not set page access bit when accessing
  * memory page, it is responsibility of software setting this bit. It brings
diff --git a/mm/memory.c b/mm/memory.c
index 8a87a488950c..60f030700a3f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1515,6 +1515,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	pte_t *start_pte;
 	pte_t *pte;
 	swp_entry_t entry;
+	int nr;
 
 	tlb_change_page_size(tlb, PAGE_SIZE);
 	init_rss_vec(rss);
@@ -1527,6 +1528,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	do {
 		pte_t ptent = ptep_get(pte);
 		struct page *page;
+		int i;
+
+		nr = 1;
 
 		if (pte_none(ptent))
 			continue;
@@ -1535,45 +1539,64 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			break;
 
 		if (pte_present(ptent)) {
-			unsigned int delay_rmap;
+			unsigned int delay_rmap = 0;
+			bool tlb_full = false;
+			struct folio *folio = NULL;
 
 			page = vm_normal_page(vma, addr, ptent);
 			if (unlikely(!should_zap_page(details, page)))
 				continue;
-			ptent = ptep_get_and_clear_full(mm, addr, pte,
-							tlb->fullmm);
+
+			if (likely(page)) {
+				folio = page_folio(page);
+				nr = folio_nr_pages_cont_mapped(folio, page,
+						pte, addr, end,
+						ptent, true, &i, &i);
+				nr = min_t(int, nr, tlb_get_guaranteed_space(tlb));
+			}
+
+			ptent = clear_ptes(mm, addr, pte, tlb->fullmm, nr);
 			arch_check_zapped_pte(vma, ptent);
-			tlb_remove_tlb_entry(tlb, pte, addr);
-			zap_install_uffd_wp_if_needed(vma, addr, pte, details,
-						      ptent);
+
+			for (i = 0; i < nr; i++) {
+				unsigned long subaddr = addr + PAGE_SIZE * i;
+
+				tlb_remove_tlb_entry(tlb, &pte[i], subaddr);
+				zap_install_uffd_wp_if_needed(vma, subaddr,
+						&pte[i], details, ptent);
+			}
 			if (unlikely(!page)) {
 				ksm_might_unmap_zero_page(mm, ptent);
 				continue;
 			}
 
-			delay_rmap = 0;
-			if (!PageAnon(page)) {
+			if (!folio_test_anon(folio)) {
 				if (pte_dirty(ptent)) {
-					set_page_dirty(page);
+					folio_mark_dirty(folio);
 					if (tlb_delay_rmap(tlb)) {
 						delay_rmap = 1;
 						force_flush = 1;
 					}
 				}
 				if (pte_young(ptent) && likely(vma_has_recency(vma)))
-					mark_page_accessed(page);
+					folio_mark_accessed(folio);
 			}
-			rss[mm_counter(page)]--;
-			if (!delay_rmap) {
-				page_remove_rmap(page, vma, false);
-				if (unlikely(page_mapcount(page) < 0))
-					print_bad_pte(vma, addr, ptent, page);
+			for (i = 0; i < nr; i++, page++) {
+				rss[mm_counter(page)]--;
+				if (!delay_rmap) {
+					page_remove_rmap(page, vma, false);
+					if (unlikely(page_mapcount(page) < 0))
+						print_bad_pte(vma, addr, ptent, page);
+				}
+				if (unlikely(__tlb_remove_page(tlb, page, delay_rmap))) {
+					tlb_full = true;
+					force_flush = 1;
+					addr += PAGE_SIZE * (i + 1);
+					break;
+				}
 			}
-			if (unlikely(__tlb_remove_page(tlb, page, delay_rmap))) {
-				force_flush = 1;
-				addr += PAGE_SIZE;
+			if (unlikely(tlb_full))
 				break;
-			}
 			continue;
 		}
 
@@ -1624,7 +1647,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		}
 		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 		zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
-	} while (pte++, addr += PAGE_SIZE, addr != end);
+	} while (pte += nr, addr += PAGE_SIZE * nr, addr != end);
 
 	add_mm_rss_vec(mm, rss);
 	arch_leave_lazy_mmu_mode();
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 4f559f4ddd21..57b4d5f0dfa4 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -47,6 +47,20 @@ static bool tlb_next_batch(struct mmu_gather *tlb)
 	return true;
 }
 
+unsigned int tlb_get_guaranteed_space(struct mmu_gather *tlb)
+{
+	struct mmu_gather_batch *batch = tlb->active;
+	unsigned int nr_next = 0;
+
+	/* Allocate next batch so we can guarantee at least one batch. */
+	if (tlb_next_batch(tlb)) {
+		tlb->active = batch;
+		nr_next = batch->next->max;
+	}
+
+	return batch->max - batch->nr + nr_next;
+}
+
 #ifdef CONFIG_SMP
 static void tlb_flush_rmap_batch(struct mmu_gather_batch *batch, struct vm_area_struct *vma)
 {
-- 
2.25.1
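
Illustration (not part of the patch): the way the generic clear_ptes()
collapses dirty/young state across a batch can be modelled in userspace.
This is only a minimal sketch under stated assumptions. model_pte_t, the
MODEL_PTE_* bit positions, model_get_and_clear() and model_clear_ptes()
are all hypothetical stand-ins invented for this example; they do not
reflect the kernel's pte_t or any architecture's real bit layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a pte; NOT the kernel's pte_t. */
typedef uint64_t model_pte_t;

/* Assumed bit positions, chosen only for this illustration. */
#define MODEL_PTE_PRESENT ((model_pte_t)1 << 0)
#define MODEL_PTE_YOUNG   ((model_pte_t)1 << 1)
#define MODEL_PTE_DIRTY   ((model_pte_t)1 << 2)

/* Models ptep_get_and_clear_full(): return the old entry, zero the slot. */
static model_pte_t model_get_and_clear(model_pte_t *ptep)
{
	model_pte_t old = *ptep;

	*ptep = 0;
	return old;
}

/*
 * Models the generic clear_ptes() fallback: clear nr consecutive entries
 * and return the first one, with dirty/young set if ANY cleared entry had
 * those bits set.
 */
static model_pte_t model_clear_ptes(model_pte_t *ptep, unsigned int nr)
{
	model_pte_t orig;
	unsigned int i;

	assert(nr >= 1);
	orig = model_get_and_clear(ptep);

	for (i = 1; i < nr; i++) {
		model_pte_t pte = model_get_and_clear(&ptep[i]);

		/* Fold only the dirty/young bits into the returned entry. */
		orig |= pte & (MODEL_PTE_DIRTY | MODEL_PTE_YOUNG);
	}

	return orig;
}
```

Calling model_clear_ptes() on a four-entry array in which only the
second entry is dirty and only the fourth is young returns the first
entry with both bits set, and leaves all four slots zeroed. Because the
batch covers a single folio, the caller only needs this aggregate to
decide whether to mark the folio dirty or accessed.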