From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Kairui Song
Subject: [RFC PATCH 6/7] mm: introduce CONFIG_ARCH_PCP_RSS_USE_CPUMASK
Date: Fri, 29 Jul 2022 04:45:10 +0800
Message-Id: <20220728204511.56348-7-ryncsn@gmail.com>
X-Mailer: git-send-email 2.35.2
In-Reply-To: <20220728204511.56348-1-ryncsn@gmail.com>
References: <20220728204511.56348-1-ryncsn@gmail.com>
Reply-To: Kairui Song <ryncsn@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Kairui Song <ryncsn@gmail.com>

If the arch-related code can provide helpers to bind the RSS cache to
mm_cpumask, the syncing code can rely on that instead of doing a full
synchronization against every possible CPU. This speeds up counter reads
and mm_exit by a lot.

Signed-off-by: Kairui Song <ryncsn@gmail.com>
---
 arch/Kconfig        |  3 ++
 kernel/sched/core.c |  3 +-
 mm/memory.c         | 94 ++++++++++++++++++++++++++++-----------------
 3 files changed, 64 insertions(+), 36 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 71b9272acb28..8df45b6346ae 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1403,6 +1403,9 @@ config ARCH_HAS_ELFCORE_COMPAT
 config ARCH_HAS_PARANOID_L1D_FLUSH
 	bool
 
+config ARCH_PCP_RSS_USE_CPUMASK
+	bool
+
 config DYNAMIC_SIGFRAME
 	bool
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 11df67bb52ee..6f7991caf24b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5143,7 +5143,8 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	prepare_lock_switch(rq, next, rf);
 
 	/* Cache new active_mm */
-	switch_pcp_rss_cache_no_irq(next->active_mm);
+	if (!IS_ENABLED(CONFIG_ARCH_PCP_RSS_USE_CPUMASK))
+		switch_pcp_rss_cache_no_irq(next->active_mm);
 
 	/* Here we just switch the register state and the stack. */
 	switch_to(prev, next, prev);
diff --git a/mm/memory.c b/mm/memory.c
index 09d7d193da51..a819009aa3e0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -188,9 +188,16 @@ unsigned long get_mm_counter(struct mm_struct *mm, int member)
 {
 	int cpu;
 	long ret, update, sync_count;
+	const struct cpumask *mm_mask;
 
 	ret = atomic_long_read(&mm->rss_stat.count[member]);
-	for_each_possible_cpu(cpu) {
+
+	if (IS_ENABLED(CONFIG_ARCH_PCP_RSS_USE_CPUMASK))
+		mm_mask = mm_cpumask(mm);
+	else
+		mm_mask = cpu_possible_mask;
+
+	for_each_cpu(cpu, mm_mask) {
 		if (READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)) != mm)
 			continue;
 		sync_count = READ_ONCE(per_cpu(cpu_rss_cache.sync_count, cpu));
@@ -217,12 +224,18 @@ unsigned long get_mm_rss(struct mm_struct *mm)
 {
 	int cpu;
 	long ret, update, sync_count;
+	const struct cpumask *mm_mask;
 
 	ret = atomic_long_read(&mm->rss_stat.count[MM_FILEPAGES]),
 		+ atomic_long_read(&mm->rss_stat.count[MM_ANONPAGES]),
 		+ atomic_long_read(&mm->rss_stat.count[MM_SHMEMPAGES]);
 
-	for_each_possible_cpu(cpu) {
+	if (IS_ENABLED(CONFIG_ARCH_PCP_RSS_USE_CPUMASK))
+		mm_mask = mm_cpumask(mm);
+	else
+		mm_mask = cpu_possible_mask;
+
+	for_each_cpu(cpu, mm_mask) {
 		if (READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu)) != mm)
 			continue;
 		sync_count = READ_ONCE(per_cpu(cpu_rss_cache.sync_count, cpu));
@@ -266,10 +279,13 @@ void switch_pcp_rss_cache_no_irq(struct mm_struct *next_mm)
 	if (cpu_mm == NULL)
 		goto commit_done;
 
-	/* Race with check_discard_rss_cache */
-	if (cpu_mm != cmpxchg(this_cpu_ptr(&cpu_rss_cache.mm), cpu_mm,
-			      __pcp_rss_mm_mark(cpu_mm)))
-		goto commit_done;
+	/* Arch will take care of cache invalidation */
+	if (!IS_ENABLED(CONFIG_ARCH_PCP_RSS_USE_CPUMASK)) {
+		/* Race with check_discard_rss_cache */
+		if (cpu_mm != cmpxchg(this_cpu_ptr(&cpu_rss_cache.mm), cpu_mm,
+				      __pcp_rss_mm_mark(cpu_mm)))
+			goto commit_done;
+	}
 
 	for (int i = 0; i < NR_MM_COUNTERS; i++) {
 		count = this_cpu_read(cpu_rss_cache.count[i]);
@@ -328,46 +344,54 @@ static void check_discard_rss_cache(struct mm_struct *mm)
 	long cached_count[NR_MM_COUNTERS] = { 0 };
 	struct mm_struct *cpu_mm;
 
-	/* Invalidate the RSS cache on every CPU */
-	for_each_possible_cpu(cpu) {
-		cpu_mm = READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu));
-		if (__pcp_rss_mm_unmark(cpu_mm) != mm)
-			continue;
-
-		/*
-		 * If not being flusehd, try read-in the counter and mark it NULL,
-		 * once cache's mm is set NULL, counter are considered invalided
-		 */
-		if (cpu_mm != __pcp_rss_mm_mark(cpu_mm)) {
-			long count[NR_MM_COUNTERS];
-
-			for (int i = 0; i < NR_MM_COUNTERS; i++)
-				count[i] = READ_ONCE(per_cpu(cpu_rss_cache.count[i], cpu));
+	/* Arch will take care of cache invalidation */
+	if (!IS_ENABLED(CONFIG_ARCH_PCP_RSS_USE_CPUMASK)) {
+		/* Invalidate the RSS cache on every CPU */
+		for_each_possible_cpu(cpu) {
+			cpu_mm = READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu));
+			if (__pcp_rss_mm_unmark(cpu_mm) != mm)
+				continue;
 
 			/*
-			 * If successfully set to NULL, the owner CPU is not flushing it, counters
-			 * are uncommiteed and untouched during this period, since a dying mm won't
-			 * be accouted anymore
+			 * If not being flushed, try to read in the counters and mark it NULL;
+			 * once the cache's mm is set NULL, the counters are considered invalidated.
 			 */
-			cpu_mm = cmpxchg(&per_cpu(cpu_rss_cache.mm, cpu), mm, NULL);
-			if (cpu_mm == mm) {
+			if (cpu_mm != __pcp_rss_mm_mark(cpu_mm)) {
+				long count[NR_MM_COUNTERS];
+
 				for (int i = 0; i < NR_MM_COUNTERS; i++)
-					cached_count[i] += count[i];
-				continue;
+					count[i] = READ_ONCE(per_cpu(cpu_rss_cache.count[i], cpu));
+
+				/*
+				 * If successfully set to NULL, the owner CPU is not flushing it,
+				 * counters are uncommitted and untouched during this period, since
+				 * a dying mm won't be accounted anymore.
+				 */
+				cpu_mm = cmpxchg(&per_cpu(cpu_rss_cache.mm, cpu), mm, NULL);
+				if (cpu_mm == mm) {
+					for (int i = 0; i < NR_MM_COUNTERS; i++)
+						cached_count[i] += count[i];
+					continue;
+				}
 			}
-		}
 
-		/* It's being flushed, just busy wait as the critial section is really short */
-		do {
-			cpu_relax();
-			cpu_mm = READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu));
-		} while (cpu_mm == __pcp_rss_mm_mark(mm));
+			/*
+			 * It's being flushed, just busy wait as the critical section
+			 * is really short.
+			 */
+			do {
+				cpu_relax();
+				cpu_mm = READ_ONCE(per_cpu(cpu_rss_cache.mm, cpu));
+			} while (cpu_mm == __pcp_rss_mm_mark(mm));
+		}
 	}
 
 	for (int i = 0; i < NR_MM_COUNTERS; i++) {
 		long val = atomic_long_read(&mm->rss_stat.count[i]);
 
-		val += cached_count[i];
+		if (!IS_ENABLED(CONFIG_ARCH_PCP_RSS_USE_CPUMASK)) {
+			val += cached_count[i];
+		}
 
 		if (unlikely(val)) {
 			pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n",
-- 
2.35.2
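
A note for readers: the mask selection that get_mm_counter() and get_mm_rss()
open-code above boils down to one decision. Shown here only as an illustrative
sketch (rss_cache_scan_mask() is a made-up helper name, not part of this patch,
assuming the usual <linux/cpumask.h> and <linux/mm_types.h> definitions):

static inline const struct cpumask *rss_cache_scan_mask(struct mm_struct *mm)
{
	/*
	 * With CONFIG_ARCH_PCP_RSS_USE_CPUMASK the architecture promises
	 * that every CPU which may hold a per-CPU RSS cache for this mm is
	 * set in mm_cpumask(mm), so only that subset has to be scanned.
	 */
	if (IS_ENABLED(CONFIG_ARCH_PCP_RSS_USE_CPUMASK))
		return mm_cpumask(mm);

	/* Without the arch guarantee, fall back to every possible CPU. */
	return cpu_possible_mask;
}

Callers would then iterate with for_each_cpu(cpu, rss_cache_scan_mask(mm)),
which is exactly the loop shape this patch introduces in both readers.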
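The arch side is left implicit by the commit message: an architecture opting in
would select ARCH_PCP_RSS_USE_CPUMASK from its Kconfig, keep mm_cpumask()
updated from its context-switch path, and (per the "Arch will take care of
cache invalidation" comments above) make sure the per-CPU RSS cache is flushed
or discarded before a CPU could be dropped from the mask. A rough sketch, with
example_switch_mm() as a made-up stand-in for the real arch hook:

static inline void example_switch_mm(struct mm_struct *prev,
				     struct mm_struct *next,
				     struct task_struct *tsk)
{
	/*
	 * Record that this CPU may now cache state for 'next' (including
	 * the per-CPU RSS cache), so get_mm_counter()/get_mm_rss() only
	 * need to walk mm_cpumask(next) instead of every possible CPU.
	 */
	cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));

	/* ... arch-specific TLB/ASID switching goes here ... */
}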