Subject: Re: [PATCH] mm: swap: determine swap device by using page nid
From: Yang Shi
Date: Thu, 7 Apr 2022 10:27:50 -0700
To: Michal Hocko
Cc: Huang Ying, Andrew Morton, Linux MM, Linux Kernel Mailing List, Aaron Lu
References: <20220407020953.475626-1-shy828301@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 7, 2022 at 12:52 AM Michal Hocko wrote:
>
> [Cc Aaron who has introduced the per node swap changes]
>
> On Wed 06-04-22 19:09:53, Yang Shi wrote:
> > The swap devices are
> > linked to per-node priority lists; the swap device closer to the node
> > has higher priority on that node's list. This is supposed to improve
> > I/O latency, particularly for some fast devices. But the current code
> > gets the nid by calling numa_node_id(), which actually returns the nid
> > of the node the reclaimer is running on instead of the nid of the node
> > the page belongs to.
> >
> > Pass the page's nid down to get_swap_pages() in order to pick the
> > right swap device. But this doesn't work for the swap slots cache,
> > which is per-cpu. We could skip the swap slots cache if the current
> > node is not the page's node, but that may be overkill, so keep using
> > the current node's swap slots cache. The issue was found by visual
> > code inspection, so it is unclear how much improvement can be achieved
> > due to the lack of a suitable test device. But the current code does
> > violate the design anyway.
>
> Do you have any perf numbers for this change?

No, it was found by visual code inspection and offline discussion with
Huang Ying.
>
> > Cc: Huang Ying
> > Signed-off-by: Yang Shi
> > ---
> >  include/linux/swap.h | 3 ++-
> >  mm/swap_slots.c      | 7 ++++---
> >  mm/swapfile.c        | 5 ++---
> >  3 files changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 27093b477c5f..e442cf6b61ea 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -497,7 +497,8 @@ extern void si_swapinfo(struct sysinfo *);
> >  extern swp_entry_t get_swap_page(struct page *page);
> >  extern void put_swap_page(struct page *page, swp_entry_t entry);
> >  extern swp_entry_t get_swap_page_of_type(int);
> > -extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size);
> > +extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size,
> > +			  int node);
> >  extern int add_swap_count_continuation(swp_entry_t, gfp_t);
> >  extern void swap_shmem_alloc(swp_entry_t);
> >  extern int swap_duplicate(swp_entry_t);
> > diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> > index 2b5531840583..a1c5cf6a4302 100644
> > --- a/mm/swap_slots.c
> > +++ b/mm/swap_slots.c
> > @@ -264,7 +264,7 @@ static int refill_swap_slots_cache(struct swap_slots_cache *cache)
> >  	cache->cur = 0;
> >  	if (swap_slot_cache_active)
> >  		cache->nr = get_swap_pages(SWAP_SLOTS_CACHE_SIZE,
> > -					   cache->slots, 1);
> > +					   cache->slots, 1, numa_node_id());
> >
> >  	return cache->nr;
> >  }
> > @@ -305,12 +305,13 @@ swp_entry_t get_swap_page(struct page *page)
> >  {
> >  	swp_entry_t entry;
> >  	struct swap_slots_cache *cache;
> > +	int nid = page_to_nid(page);
> >
> >  	entry.val = 0;
> >
> >  	if (PageTransHuge(page)) {
> >  		if (IS_ENABLED(CONFIG_THP_SWAP))
> > -			get_swap_pages(1, &entry, HPAGE_PMD_NR);
> > +			get_swap_pages(1, &entry, HPAGE_PMD_NR, nid);
> >  		goto out;
> >  	}
> >
> > @@ -342,7 +343,7 @@ swp_entry_t get_swap_page(struct page *page)
> >  		goto out;
> >  	}
> >
> > -	get_swap_pages(1, &entry, 1);
> > +	get_swap_pages(1, &entry, 1, nid);
> >  out:
> >  	if (mem_cgroup_try_charge_swap(page, entry)) {
> >  		put_swap_page(page, entry);
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 63c61f8b2611..151fffe0fd60 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1036,13 +1036,13 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
> >  	swap_range_free(si, offset, SWAPFILE_CLUSTER);
> >  }
> >
> > -int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
> > +int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size,
> > +		   int node)
> >  {
> >  	unsigned long size = swap_entry_size(entry_size);
> >  	struct swap_info_struct *si, *next;
> >  	long avail_pgs;
> >  	int n_ret = 0;
> > -	int node;
> >
> >  	/* Only single cluster request supported */
> >  	WARN_ON_ONCE(n_goal > 1 && size == SWAPFILE_CLUSTER);
> > @@ -1060,7 +1060,6 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
> >  	atomic_long_sub(n_goal * size, &nr_swap_pages);
> >
> > start_over:
> > -	node = numa_node_id();
> >  	plist_for_each_entry_safe(si, next, &swap_avail_heads[node], avail_lists[node]) {
> >  		/* requeue si to after same-priority siblings */
> >  		plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]);
> > --
> > 2.26.3
>
> --
> Michal Hocko
> SUSE Labs