Date: Mon, 30 Oct 2023 18:58:03 +0900
From: Byungchul Park
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel_team@skhynix.com,
    akpm@linux-foundation.org, ying.huang@intel.com, namit@vmware.com,
    xhao@linux.alibaba.com, mgorman@techsingularity.net, hughd@google.com,
    willy@infradead.org, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de,
    mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Subject: Re: [v3 2/3] mm: Defer TLB flush by keeping both src and dst folios at migration
Message-ID: <20231030095803.GA81877@system.software.com>
References: <20231030072540.38631-1-byungchul@sk.com>
 <20231030072540.38631-3-byungchul@sk.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.9.4 (2018-02-28)

On Mon, Oct 30, 2023 at 09:00:56AM +0100, David Hildenbrand wrote:
> On 30.10.23 08:25, Byungchul Park wrote:
> > Implementation of CONFIG_MIGRC, which stands for 'Migration Read Copy'.
> > While working with tiered memory, e.g. CXL memory, we always face the
> > migration overhead at either promotion or demotion, and we found that
> > TLB shootdown is quite a big part of it that is worth getting rid of
> > if possible.
> >
> > Fortunately, the TLB flush can be deferred or even skipped if both the
> > source and destination folios of a migration are kept around until all
> > the required TLB flushes have been done, but only if the target PTE
> > entries have read-only permission, or more precisely, do not have
> > write permission.
> > Otherwise, the folio would no doubt get messed up.
> >
> > To achieve that:
> >
> >    1. For the folios that map only to non-writable TLB entries, prevent
> >       the TLB flush at migration by keeping both the source and the
> >       destination folios, and handle it later at a better time.
> >
> >    2. When any non-writable TLB entry changes to writable, e.g. through
> >       the fault handler, give up the CONFIG_MIGRC mechanism and perform
> >       the required TLB flush right away.
> >
> >    3. Temporarily stop migrc from working when the system is under very
> >       high memory pressure, e.g. when direct reclaim is needed.
> >
> > The measurement result:
> >
> >    Architecture - x86_64
> >    QEMU - kvm enabled, host cpu
> >    Numa - 2 nodes (16 CPUs 1GB, no CPUs 8GB)
> >    Linux Kernel - v6.6-rc5, numa balancing tiering on, demotion enabled
> >    Benchmark - XSBench -p 50000000 (-p option makes the runtime longer)
> >
> > run 'perf stat' using events:
> >    1) itlb.itlb_flush
> >    2) tlb_flush.dtlb_thread
> >    3) tlb_flush.stlb_any
> >    4) dTLB-load-misses
> >    5) dTLB-store-misses
> >    6) iTLB-load-misses
> >
> > run 'cat /proc/vmstat' and pick:
> >    1) numa_pages_migrated
> >    2) pgmigrate_success
> >    3) nr_tlb_remote_flush
> >    4) nr_tlb_remote_flush_received
> >    5) nr_tlb_local_flush_all
> >    6) nr_tlb_local_flush_one
> >
> > BEFORE - mainline v6.6-rc5
> > ------------------------------------------
> > $ perf stat -a \
> >       -e itlb.itlb_flush \
> >       -e tlb_flush.dtlb_thread \
> >       -e tlb_flush.stlb_any \
> >       -e dTLB-load-misses \
> >       -e dTLB-store-misses \
> >       -e iTLB-load-misses \
> >       ./XSBench -p 50000000
> >
> > Performance counter stats for 'system wide':
> >
> >       20953405      itlb.itlb_flush
> >      114886593      tlb_flush.dtlb_thread
> >       88267015      tlb_flush.stlb_any
> >   115304095543      dTLB-load-misses
> >      163904743      dTLB-store-misses
> >      608486259      iTLB-load-misses
> >
> >    556.787113849 seconds time elapsed
> >
> > $ cat /proc/vmstat
> >
> > ...
> > numa_pages_migrated 3378748
> > pgmigrate_success 7720310
> > nr_tlb_remote_flush 751464
> > nr_tlb_remote_flush_received 10742115
> > nr_tlb_local_flush_all 21899
> > nr_tlb_local_flush_one 740157
> > ...
> >
> > AFTER - mainline v6.6-rc5 + CONFIG_MIGRC
> > ------------------------------------------
> > $ perf stat -a \
> >       -e itlb.itlb_flush \
> >       -e tlb_flush.dtlb_thread \
> >       -e tlb_flush.stlb_any \
> >       -e dTLB-load-misses \
> >       -e dTLB-store-misses \
> >       -e iTLB-load-misses \
> >       ./XSBench -p 50000000
> >
> > Performance counter stats for 'system wide':
> >
> >        4353555      itlb.itlb_flush
> >       72482780      tlb_flush.dtlb_thread
> >       68226458      tlb_flush.stlb_any
> >   114331610808      dTLB-load-misses
> >      116084771      dTLB-store-misses
> >      377180518      iTLB-load-misses
> >
> >    552.667718220 seconds time elapsed
> >
> > $ cat /proc/vmstat
>
> So, an improvement of 0.74% ? How stable are the results? Serious
> question:

I'm getting very stable results.

> worth the churn?

Yes. Ultimately the time-wise improvement should be observable as well.
However, I've been focusing on the number of TLB flushes and TLB misses,
because a better total runtime will follow from those depending on the
test conditions. We should be able to see it on a system that:

   1. has more CPUs, which would induce a huge number of IPIs,
   2. has slower memory, which makes the TLB miss overhead bigger,
   3. runs workloads that are sensitive to TLB misses and IPI storms,
   4. runs workloads that cause heavier numa migrations,
   5. runs workloads that have a lot of read-only mappings,
   6. and so on.

I will share the results once I manage to meet those conditions.
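
For reference, the vmstat deltas above can be collected with a trivial
userspace helper along the lines of the sketch below. This is only an
illustration to make the counter comparison easy to reproduce, not part
of the patch, and it assumes the nr_tlb_* counters are exposed in
/proc/vmstat (IIRC that needs CONFIG_DEBUG_TLBFLUSH):

/*
 * Illustrative helper, not part of the patch: snapshot the counters of
 * interest from /proc/vmstat before and after a benchmark run and print
 * the deltas.  Counters missing on a given kernel simply stay at zero.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char *keys[] = {
	"numa_pages_migrated",
	"pgmigrate_success",
	"nr_tlb_remote_flush",
	"nr_tlb_remote_flush_received",
	"nr_tlb_local_flush_all",
	"nr_tlb_local_flush_one",
};
#define NR_KEYS (sizeof(keys) / sizeof(keys[0]))

static void snapshot(unsigned long long *val)
{
	char name[128];
	unsigned long long v;
	FILE *fp = fopen("/proc/vmstat", "r");

	if (!fp) {
		perror("/proc/vmstat");
		exit(1);
	}
	while (fscanf(fp, "%127s %llu", name, &v) == 2) {
		for (size_t i = 0; i < NR_KEYS; i++)
			if (!strcmp(name, keys[i]))
				val[i] = v;
	}
	fclose(fp);
}

int main(int argc, char **argv)
{
	unsigned long long before[NR_KEYS] = { 0 }, after[NR_KEYS] = { 0 };

	if (argc < 2) {
		fprintf(stderr, "usage: %s '<benchmark command>'\n", argv[0]);
		return 1;
	}
	snapshot(before);
	if (system(argv[1]) == -1)	/* e.g. "./XSBench -p 50000000" */
		perror("system");
	snapshot(after);
	for (size_t i = 0; i < NR_KEYS; i++)
		printf("%-30s %llu\n", keys[i], after[i] - before[i]);
	return 0;
}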
By the way, I should've also added the IPI reduction, because it has a
super big delta too :)

> Or did I get the numbers wrong?
>
> >   #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 5c02720c53a5..1ca2ac91aa14 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -135,6 +135,9 @@ enum pageflags {
> >   #ifdef CONFIG_ARCH_USES_PG_ARCH_X
> >   	PG_arch_2,
> >   	PG_arch_3,
> > +#endif
> > +#ifdef CONFIG_MIGRC
> > +	PG_migrc,	/* Page has its copy under migrc's control */
> >   #endif
> >   	__NR_PAGEFLAGS,
> > @@ -589,6 +592,10 @@ TESTCLEARFLAG(Young, young, PF_ANY)
> >   PAGEFLAG(Idle, idle, PF_ANY)
> >   #endif
> > +#ifdef CONFIG_MIGRC
> > +PAGEFLAG(Migrc, migrc, PF_ANY)
> > +#endif
>
> I assume you know this: new pageflags are frowned upon.

Sorry for that. I really didn't want to add a new headache.

	Byungchul
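
P.S. For anyone who doesn't have page-flags.h in front of them: as far
as I understand the macros, the PAGEFLAG(Migrc, migrc, PF_ANY) line
above should stamp out the usual accessors for the new bit, roughly the
following (paraphrased prototypes, not the literal macro expansion):

	/* test/set/clear helpers generated for PG_migrc (paraphrased) */
	bool folio_test_migrc(const struct folio *folio);
	void folio_set_migrc(struct folio *folio);
	void folio_clear_migrc(struct folio *folio);
	int  PageMigrc(const struct page *page);
	void SetPageMigrc(struct page *page);
	void ClearPageMigrc(struct page *page);

i.e. the helpers for marking a folio whose copy is still under migrc's
control, as the comment on PG_migrc says.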