From: Jiebin Sun
To: akpm@linux-foundation.org, vasily.averin@linux.dev, shakeelb@google.com,
    dennis@kernel.org, tj@kernel.org, cl@linux.com, ebiederm@xmission.com,
    legion@kernel.org, manfred@colorfullife.com,
    alexander.mikhalitsyn@virtuozzo.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Cc: tim.c.chen@intel.com, feng.tang@intel.com, ying.huang@intel.com,
    tianyou.li@intel.com, wangyang.guo@intel.com, jiebin.sun@intel.com
Subject: [PATCH v2 2/2] ipc/msg: mitigate the lock contention with percpu counter
Date: Tue, 6 Sep 2022 03:35:15 +0800
Message-Id: <20220905193516.846647-2-jiebin.sun@intel.com>
In-Reply-To: <20220905193516.846647-1-jiebin.sun@intel.com>
References: <20220902152243.479592-1-jiebin.sun@intel.com>
    <20220905193516.846647-1-jiebin.sun@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The msg_bytes and msg_hdrs atomic counters are frequently updated when
the IPC msg queue is in heavy use, causing heavy cache bouncing and
overhead. Changing them to percpu_counters greatly improves the
performance. Since there is one unique ipc namespace, the additional
memory cost is minimal. The counts are read only in the msgctl call,
which is infrequent.
So the need to sum up the per-CPU counts is infrequent.

With the patch applied, results for pts/stress-ng-1.4.0
-- system v message passing (160 threads):

Score gain: 3.38x

CPU: ICX 8380 x 2 sockets
Core number: 40 x 2 physical cores
Benchmark: pts/stress-ng-1.4.0
-- system v message passing (160 threads)

Signed-off-by: Jiebin Sun
---
 include/linux/ipc_namespace.h |  5 ++--
 ipc/msg.c                     | 44 ++++++++++++++++++++++++-----------
 ipc/namespace.c               |  5 +++-
 ipc/util.h                    |  4 ++--
 4 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index e3e8c8662b49..e8240cf2611a 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 
 struct user_namespace;
 
@@ -36,8 +37,8 @@ struct ipc_namespace {
 	unsigned int	msg_ctlmax;
 	unsigned int	msg_ctlmnb;
 	unsigned int	msg_ctlmni;
-	atomic_t	msg_bytes;
-	atomic_t	msg_hdrs;
+	struct percpu_counter percpu_msg_bytes;
+	struct percpu_counter percpu_msg_hdrs;
 
 	size_t		shm_ctlmax;
 	size_t		shm_ctlall;
diff --git a/ipc/msg.c b/ipc/msg.c
index a0d05775af2c..87c30decb23f 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -285,10 +286,10 @@ static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
 	rcu_read_unlock();
 
 	list_for_each_entry_safe(msg, t, &msq->q_messages, m_list) {
-		atomic_dec(&ns->msg_hdrs);
+		percpu_counter_add_local(&ns->percpu_msg_hdrs, -1);
 		free_msg(msg);
 	}
-	atomic_sub(msq->q_cbytes, &ns->msg_bytes);
+	percpu_counter_add_local(&ns->percpu_msg_bytes, -(msq->q_cbytes));
 	ipc_update_pid(&msq->q_lspid, NULL);
 	ipc_update_pid(&msq->q_lrpid, NULL);
 	ipc_rcu_putref(&msq->q_perm, msg_rcu_free);
@@ -495,17 +496,18 @@ static int msgctl_info(struct ipc_namespace *ns, int msqid,
 	msginfo->msgssz = MSGSSZ;
 	msginfo->msgseg = MSGSEG;
 	down_read(&msg_ids(ns).rwsem);
-	if (cmd == MSG_INFO) {
+	if (cmd == MSG_INFO)
 		msginfo->msgpool = msg_ids(ns).in_use;
-		msginfo->msgmap = atomic_read(&ns->msg_hdrs);
-		msginfo->msgtql = atomic_read(&ns->msg_bytes);
+	max_idx = ipc_get_maxidx(&msg_ids(ns));
+	up_read(&msg_ids(ns).rwsem);
+	if (cmd == MSG_INFO) {
+		msginfo->msgmap = percpu_counter_sum(&ns->percpu_msg_hdrs);
+		msginfo->msgtql = percpu_counter_sum(&ns->percpu_msg_bytes);
 	} else {
 		msginfo->msgmap = MSGMAP;
 		msginfo->msgpool = MSGPOOL;
 		msginfo->msgtql = MSGTQL;
 	}
-	max_idx = ipc_get_maxidx(&msg_ids(ns));
-	up_read(&msg_ids(ns).rwsem);
 	return (max_idx < 0) ? 0 : max_idx;
 }
 
@@ -935,8 +937,8 @@ static long do_msgsnd(int msqid, long mtype, void __user *mtext,
 		list_add_tail(&msg->m_list, &msq->q_messages);
 		msq->q_cbytes += msgsz;
 		msq->q_qnum++;
-		atomic_add(msgsz, &ns->msg_bytes);
-		atomic_inc(&ns->msg_hdrs);
+		percpu_counter_add_local(&ns->percpu_msg_bytes, msgsz);
+		percpu_counter_add_local(&ns->percpu_msg_hdrs, 1);
 	}
 
 	err = 0;
@@ -1159,8 +1161,8 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
 			msq->q_rtime = ktime_get_real_seconds();
 			ipc_update_pid(&msq->q_lrpid, task_tgid(current));
 			msq->q_cbytes -= msg->m_ts;
-			atomic_sub(msg->m_ts, &ns->msg_bytes);
-			atomic_dec(&ns->msg_hdrs);
+			percpu_counter_add_local(&ns->percpu_msg_bytes, -(msg->m_ts));
+			percpu_counter_add_local(&ns->percpu_msg_hdrs, -1);
 			ss_wakeup(msq, &wake_q, false);
 
 			goto out_unlock0;
@@ -1297,20 +1299,34 @@ COMPAT_SYSCALL_DEFINE5(msgrcv, int, msqid, compat_uptr_t, msgp,
 }
 #endif
 
-void msg_init_ns(struct ipc_namespace *ns)
+int msg_init_ns(struct ipc_namespace *ns)
 {
+	int ret;
+
 	ns->msg_ctlmax = MSGMAX;
 	ns->msg_ctlmnb = MSGMNB;
 	ns->msg_ctlmni = MSGMNI;
 
-	atomic_set(&ns->msg_bytes, 0);
-	atomic_set(&ns->msg_hdrs, 0);
+	ret = percpu_counter_init(&ns->percpu_msg_bytes, 0, GFP_KERNEL);
+	if (ret)
+		goto fail_msg_bytes;
+	ret = percpu_counter_init(&ns->percpu_msg_hdrs, 0, GFP_KERNEL);
+	if (ret)
+		goto fail_msg_hdrs;
 	ipc_init_ids(&ns->ids[IPC_MSG_IDS]);
+	return 0;
+
+	fail_msg_hdrs:
+		percpu_counter_destroy(&ns->percpu_msg_bytes);
+	fail_msg_bytes:
+		return ret;
 }
 
 #ifdef CONFIG_IPC_NS
 void msg_exit_ns(struct ipc_namespace *ns)
 {
+	percpu_counter_destroy(&ns->percpu_msg_bytes);
+	percpu_counter_destroy(&ns->percpu_msg_hdrs);
 	free_ipcs(ns, &msg_ids(ns), freeque);
 	idr_destroy(&ns->ids[IPC_MSG_IDS].ipcs_idr);
 	rhashtable_destroy(&ns->ids[IPC_MSG_IDS].key_ht);
diff --git a/ipc/namespace.c b/ipc/namespace.c
index e1fcaedba4fa..8316ea585733 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -66,8 +66,11 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	if (!setup_ipc_sysctls(ns))
 		goto fail_mq;
 
+	err = msg_init_ns(ns);
+	if (err)
+		goto fail_put;
+
 	sem_init_ns(ns);
-	msg_init_ns(ns);
 	shm_init_ns(ns);
 
 	return ns;
diff --git a/ipc/util.h b/ipc/util.h
index 2dd7ce0416d8..1b0086c6346f 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -64,7 +64,7 @@ static inline void mq_put_mnt(struct ipc_namespace *ns) { }
 
 #ifdef CONFIG_SYSVIPC
 void sem_init_ns(struct ipc_namespace *ns);
-void msg_init_ns(struct ipc_namespace *ns);
+int msg_init_ns(struct ipc_namespace *ns);
 void shm_init_ns(struct ipc_namespace *ns);
 
 void sem_exit_ns(struct ipc_namespace *ns);
@@ -72,7 +72,7 @@ void msg_exit_ns(struct ipc_namespace *ns);
 void shm_exit_ns(struct ipc_namespace *ns);
 #else
 static inline void sem_init_ns(struct ipc_namespace *ns) { }
-static inline void msg_init_ns(struct ipc_namespace *ns) { }
+static inline int msg_init_ns(struct ipc_namespace *ns) { return 0;}
 static inline void shm_init_ns(struct ipc_namespace *ns) { }
 
 static inline void sem_exit_ns(struct ipc_namespace *ns) { }
-- 
2.31.1
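
For readers unfamiliar with the trade-off described in the commit message,
the following is a minimal userspace sketch in C11 of a sharded counter.
It is only an analogy, not the kernel's percpu_counter implementation:
the names sharded_counter_*, NR_SLOTS and CACHE_LINE are hypothetical,
the slot index stands in for a CPU id, and a 64-byte cache line is
assumed. Updates touch only the caller's slot, so writers on different
slots do not bounce a shared cache line, while the read side has to walk
every slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_SLOTS   8	/* hypothetical stand-in for the number of CPUs */
#define CACHE_LINE 64	/* assumed cache-line size in bytes */

/* One slot per "CPU", aligned so that two slots never share a cache line. */
struct slot {
	_Alignas(CACHE_LINE) atomic_long count;
};

struct sharded_counter {
	struct slot slots[NR_SLOTS];
};

static void sharded_counter_init(struct sharded_counter *c)
{
	for (size_t i = 0; i < NR_SLOTS; i++)
		atomic_init(&c->slots[i].count, 0);
}

/* Hot path: modifies only the slot owned by the caller. */
static void sharded_counter_add(struct sharded_counter *c, unsigned int cpu,
				long delta)
{
	assert(cpu < NR_SLOTS);
	atomic_fetch_add_explicit(&c->slots[cpu].count, delta,
				  memory_order_relaxed);
}

/* Cold path: reads every slot, so the cost grows with NR_SLOTS. */
static long sharded_counter_sum(struct sharded_counter *c)
{
	long sum = 0;

	for (size_t i = 0; i < NR_SLOTS; i++)
		sum += atomic_load_explicit(&c->slots[i].count,
					    memory_order_relaxed);
	return sum;
}
```

In terms of the patch, sharded_counter_add() plays the role of the
updates in do_msgsnd(), do_msgrcv() and freeque(), and
sharded_counter_sum() plays the role of the read in msgctl_info(). The
design only pays off when updates greatly outnumber reads, which is the
situation the commit message describes.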