References: <20220301085329.3210428-1-ying.huang@intel.com> <20220301085329.3210428-4-ying.huang@intel.com>
In-Reply-To: <20220301085329.3210428-4-ying.huang@intel.com>
From: Yang Shi
Date: Tue, 1 Mar 2022 14:03:11 -0800
Subject: Re: [PATCH -V14 3/3] memory tiering: skip to scan fast memory
To: Huang Ying
Cc: Peter Zijlstra, Mel Gorman, Andrew Morton, Linux MM, Linux Kernel Mailing List, Feng Tang, Dave Hansen, Baolin Wang, Johannes Weiner, Oscar Salvador, Michal Hocko, Rik van Riel, Zi Yan, Wei Xu, Shakeel Butt, zhongjiang-ali

On Tue, Mar 1, 2022 at 12:54 AM Huang Ying wrote:
>
> If the NUMA balancing isn't used to optimize the page placement among
> sockets but only among memory types, the hot pages in the fast memory
> node couldn't be migrated (promoted) to anywhere.  So it's unnecessary
> to scan the pages in the fast memory node via changing their PTE/PMD
> mapping to be PROT_NONE.  So that the page faults could be avoided
> too.
>
> In the test, if only the memory tiering NUMA balancing mode is enabled, the
> number of the NUMA balancing hint faults for the DRAM node is reduced to
> almost 0 with the patch.  While the benchmark score doesn't change
> visibly.

Reviewed-by: Yang Shi

>
> Signed-off-by: "Huang, Ying"
> Suggested-by: Dave Hansen
> Tested-by: Baolin Wang
> Reviewed-by: Baolin Wang
> Acked-by: Johannes Weiner
> Reviewed-by: Oscar Salvador
> Cc: Andrew Morton
> Cc: Michal Hocko
> Cc: Rik van Riel
> Cc: Mel Gorman
> Cc: Peter Zijlstra
> Cc: Yang Shi
> Cc: Zi Yan
> Cc: Wei Xu
> Cc: Shakeel Butt
> Cc: zhongjiang-ali
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
>  mm/huge_memory.c | 30 +++++++++++++++++++++---------
>  mm/mprotect.c    | 13 ++++++++++++-
>  2 files changed, 33 insertions(+), 10 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 406a3c28c026..9ce126cb0cfd 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -34,6 +34,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -1766,17 +1767,28 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>  	}
>  #endif
>
> -	/*
> -	 * Avoid trapping faults against the zero page. The read-only
> -	 * data is likely to be read-cached on the local CPU and
> -	 * local/remote hits to the zero page are not interesting.
> -	 */
> -	if (prot_numa && is_huge_zero_pmd(*pmd))
> -		goto unlock;
> +	if (prot_numa) {
> +		struct page *page;
> +		/*
> +		 * Avoid trapping faults against the zero page. The read-only
> +		 * data is likely to be read-cached on the local CPU and
> +		 * local/remote hits to the zero page are not interesting.
> +		 */
> +		if (is_huge_zero_pmd(*pmd))
> +			goto unlock;
>
> -	if (prot_numa && pmd_protnone(*pmd))
> -		goto unlock;
> +		if (pmd_protnone(*pmd))
> +			goto unlock;
>
> +		page = pmd_page(*pmd);
> +		/*
> +		 * Skip scanning top tier node if normal numa
> +		 * balancing is disabled
> +		 */
> +		if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
> +		    node_is_toptier(page_to_nid(page)))
> +			goto unlock;
> +	}
>  	/*
>  	 * In case prot_numa, we are under mmap_read_lock(mm). It's critical
>  	 * to not clear pmd intermittently to avoid race with MADV_DONTNEED
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 0138dfcdb1d8..2fe03e695c81 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -29,6 +29,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -83,6 +84,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>  			 */
>  			if (prot_numa) {
>  				struct page *page;
> +				int nid;
>
>  				/* Avoid TLB flush if possible */
>  				if (pte_protnone(oldpte))
> @@ -109,7 +111,16 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>  				 * Don't mess with PTEs if page is already on the node
>  				 * a single-threaded process is running on.
>  				 */
> -				if (target_node == page_to_nid(page))
> +				nid = page_to_nid(page);
> +				if (target_node == nid)
> +					continue;
> +
> +				/*
> +				 * Skip scanning top tier node if normal numa
> +				 * balancing is disabled
> +				 */
> +				if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
> +				    node_is_toptier(nid))
>  					continue;
>  			}
>
> --
> 2.30.2
>
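
For anyone following the thread, the skip condition added in both hunks can be modeled in userspace roughly as below. This is only a sketch and not kernel code: the numeric values of the mode bits are assumptions made for the example, model_node_is_toptier() is a hypothetical stand-in for node_is_toptier() that treats node 0 as the only DRAM (top tier) node, and should_skip_scan() is a name invented here to wrap the check.

```c
#include <stdbool.h>

/*
 * Mode bits named as in the patch. The numeric values are assumptions
 * for this sketch and are not taken from the quoted diff.
 */
#define NUMA_BALANCING_NORMAL		0x1
#define NUMA_BALANCING_MEMORY_TIERING	0x2

/*
 * Hypothetical stand-in for node_is_toptier(): in this toy topology
 * node 0 is DRAM (top tier) and every other node is slow memory.
 */
static bool model_node_is_toptier(int nid)
{
	return nid == 0;
}

/*
 * Mirrors the condition added to change_huge_pmd() and
 * change_pte_range(): when normal NUMA balancing is off, a page that
 * already sits on a top tier node has nowhere to be promoted to, so
 * making its mapping PROT_NONE would only cost a hint fault.
 */
static bool should_skip_scan(int mode, int nid)
{
	return !(mode & NUMA_BALANCING_NORMAL) && model_node_is_toptier(nid);
}
```

With only the memory tiering bit set, should_skip_scan() is true for node 0 and false for the slow nodes. Once the normal bit is set, it is false everywhere, so cross-socket balancing still scans DRAM pages as before.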