From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, cgroups@vger.kernel.org,
    linux-sh@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, David Hildenbrand, Andrew Morton,
    "Matthew Wilcox (Oracle)", Peter Xu, Ryan Roberts, Yin Fengwei,
    Yang Shi, Zi Yan, Jonathan Corbet, Hugh Dickins, Yoshinori Sato,
    Rich Felker, John Paul Adrian Glaubitz, Chris Zankel, Max Filippov,
    Muchun Song, Miaohe Lin, Naoya Horiguchi, Richard Chang
Subject: [PATCH v1 00/18] mm: mapcount for large folios + page_mapcount() cleanups
Date: Tue, 9 Apr 2024 21:22:43 +0200
Message-ID: <20240409192301.907377-1-david@redhat.com>

This series tracks the mapcount of large folios in a single value, so it
can be read efficiently and atomically, just like the mapcount of small
folios.

folio_mapcount() is then used in a couple more places, most notably to
reduce false negatives in folio_likely_mapped_shared(), and many users of
page_mapcount() are cleaned up (that's maybe why you got CCed on the
full series, sorry sh+xtensa folks!
:) ).

The remaining s390x user and one KSM user of page_mapcount() are getting
removed separately on the list right now. I have patches to handle the
other KSM one, the khugepaged one and the kpagecount one; as they are
not as "obvious", I will send them out separately in the future. Once
that is all in place, I'm planning on moving page_mapcount() into
fs/proc/task_mmu.c, the remaining user for the time being (and we can
discuss details on that at LSF/MM :) ).

I proposed the mapcount for large folios (previously called total
mapcount) originally as part of [1] and I later included it in [2], where
it is a requirement. In the meantime, I changed the patch a bit, so I
dropped all RBs.

During the discussion of [1], Peter Xu correctly raised that this
additional tracking might affect the performance when PMD->PTE remapping
THPs. In the meantime, I addressed that by batching RMAP operations
during fork(), unmap/zap and when PMD->PTE remapping THPs.

Running some of my micro-benchmarks [3] (fork,munmap,cow-byte,remap) on
1 GiB of memory backed by folios with the same order, I observe the
following on an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz tuned for
reproducible results as much as possible:

Standard deviation is mostly < 1%, except for order-9, where it's < 2%
for fork() and munmap().

(1) Small folios are not affected (< 1%) in all 4 microbenchmarks.
(2) Order-4 folios are not affected (< 1%) in all 4 microbenchmarks. A
    bit weird compared to the other orders ...
(3) PMD->PTE remapping of order-9 THPs is not affected (< 1%).
(4) COW-byte (COWing a single page by writing a single byte) is not
    affected for any order (< 1%). The page copy_fault overhead
    dominates everything.
(5) fork() is mostly not affected (< 1%), except order-2, where we have
    a slowdown of ~4%. Already for order-3 folios, we're down to a
    slowdown of < 1%.
(6) munmap() sees a slowdown by < 3% for some orders (order-5, order-6,
    order-9), but less for others (< 1% for order-4 and order-8, < 2%
    for order-2, order-3, order-7).

The fork() and munmap() benchmarks in particular are sensitive to each
added instruction and other system noise, so I suspect some of the
change and observed weirdness (order-4) is due to code layout changes
and other factors, but not really due to the added atomics.

So in the common case where we can batch, the added atomics don't really
make a big difference, especially in light of the improvements for large
folios that we recently gained due to batching. Surprisingly, for some
cases where we cannot batch (e.g., COW), the added atomics don't seem to
matter, because other overhead dominates.

My fork and munmap micro-benchmarks don't cover cases where we cannot
batch-process bigger parts of large folios. As this is not the common
case, I'm not worrying about that right now.

Future work is batching RMAP operations during swapout and folio
migration.

Not CCing everybody recommended by get_maintainer.pl (e.g., cgroups
folks just because of the doc update), to reduce noise. Tested on
x86-64, compile-tested on a bunch of other archs. Will do more testing
in the upcoming days.
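
As a rough illustration of the idea described above (this is not code
from the series), a minimal userspace C11 sketch of a single-value
mapcount with batched updates could look like the following. The names
struct folio_sketch and folio_sketch_*() are made up for this example,
the counter starts at 0 for simplicity, and the relaxed memory ordering
is an assumption of the sketch, not a statement about what the kernel
requires.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a large folio; NOT the kernel's struct folio. */
struct folio_sketch {
	atomic_int mapcount;	/* total number of page table mappings */
};

static inline void folio_sketch_init(struct folio_sketch *f)
{
	atomic_init(&f->mapcount, 0);
}

/* Account nr new mappings with a single atomic update (batched). */
static inline void folio_sketch_add_mappings(struct folio_sketch *f, int nr)
{
	atomic_fetch_add_explicit(&f->mapcount, nr, memory_order_relaxed);
}

/* Drop nr mappings with a single atomic update; catch underflows. */
static inline void folio_sketch_remove_mappings(struct folio_sketch *f, int nr)
{
	int old = atomic_fetch_sub_explicit(&f->mapcount, nr,
					    memory_order_relaxed);

	assert(old >= nr);	/* removing more mappings than exist is a bug */
	(void)old;		/* silence unused warning when NDEBUG is set */
}

/* Read the total mapcount with one atomic load. */
static inline int folio_sketch_mapcount(struct folio_sketch *f)
{
	return atomic_load_explicit(&f->mapcount, memory_order_relaxed);
}
```

In this sketch, duplicating all 16 PTE mappings of an order-4 folio
costs one folio_sketch_add_mappings(&f, 16) call instead of 16 separate
increments, and a reader gets the total with a single atomic load.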

[1] https://lore.kernel.org/all/20230809083256.699513-1-david@redhat.com/
[2] https://lore.kernel.org/all/20231124132626.235350-1-david@redhat.com/
[3] https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/pte-mapped-folio-benchmarks.c?ref_type=heads

Cc: Andrew Morton
Cc: "Matthew Wilcox (Oracle)"
Cc: Peter Xu
Cc: Ryan Roberts
Cc: Yin Fengwei
Cc: Yang Shi
Cc: Zi Yan
Cc: Jonathan Corbet
Cc: Hugh Dickins
Cc: Yoshinori Sato
Cc: Rich Felker
Cc: John Paul Adrian Glaubitz
Cc: Chris Zankel
Cc: Max Filippov
Cc: Muchun Song
Cc: Miaohe Lin
Cc: Naoya Horiguchi
Cc: Richard Chang

David Hildenbrand (18):
  mm: allow for detecting underflows with page_mapcount() again
  mm/rmap: always inline anon/file rmap duplication of a single PTE
  mm/rmap: add fast-path for small folios when
    adding/removing/duplicating
  mm: track mapcount of large folios in single value
  mm: improve folio_likely_mapped_shared() using the mapcount of large
    folios
  mm: make folio_mapcount() return 0 for small typed folios
  mm/memory: use folio_mapcount() in zap_present_folio_ptes()
  mm/huge_memory: use folio_mapcount() in zap_huge_pmd() sanity check
  mm/memory-failure: use folio_mapcount() in hwpoison_user_mappings()
  mm/page_alloc: use folio_mapped() in __alloc_contig_migrate_range()
  mm/migrate: use folio_likely_mapped_shared() in
    add_page_for_migration()
  sh/mm/cache: use folio_mapped() in copy_from_user_page()
  mm/filemap: use folio_mapcount() in filemap_unaccount_folio()
  mm/migrate_device: use folio_mapcount() in migrate_vma_check_page()
  trace/events/page_ref: trace the raw page mapcount value
  xtensa/mm: convert check_tlb_entry() to sanity check folios
  mm/debug: print only page mapcount (excluding folio entire mapcount)
    in __dump_folio()
  Documentation/admin-guide/cgroup-v1/memory.rst: don't reference
    page_mapcount()

 .../admin-guide/cgroup-v1/memory.rst |  4 +-
 Documentation/mm/transhuge.rst       | 12 +--
 arch/sh/mm/cache.c                   |  2 +-
 arch/xtensa/mm/tlb.c                 | 11 +--
 include/linux/mm.h                   | 77 +++++++++++--------
 include/linux/mm_types.h             |  5 +-
 include/linux/rmap.h                 | 40 +++++++++-
 include/trace/events/page_ref.h      |  4 +-
 mm/debug.c                           | 12 +--
 mm/filemap.c                         |  2 +-
 mm/huge_memory.c                     |  2 +-
 mm/hugetlb.c                         |  4 +-
 mm/internal.h                        |  3 +
 mm/khugepaged.c                      |  2 +-
 mm/memory-failure.c                  |  4 +-
 mm/memory.c                          |  3 +-
 mm/migrate.c                         |  2 +-
 mm/migrate_device.c                  | 12 +--
 mm/page_alloc.c                      | 12 ++-
 mm/rmap.c                            | 60 +++++++--------
 20 files changed, 163 insertions(+), 110 deletions(-)

-- 
2.44.0