From: "Huang, Ying"
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Huang Ying, Ingo Molnar, Michael Ellerman, Borislav Petkov, Thomas Gleixner, Juergen Gross, Aaron Lu
Subject: [PATCH 3/3] IPI: Avoid to use 2 cache lines for one call_single_data
Date: Wed, 2 Aug 2017 16:52:20 +0800
Message-Id: <20170802085220.4315-4-ying.huang@intel.com>
In-Reply-To: <20170802085220.4315-1-ying.huang@intel.com>
References: <20170802085220.4315-1-ying.huang@intel.com>

From: Huang Ying

struct call_single_data is used in IPIs to transfer information between
CPUs. Its size is bigger than sizeof(unsigned long) and smaller than the
cache line size. Currently, it is allocated without any alignment
requirement, so an allocated call_single_data may straddle 2 cache lines.
That doubles the number of cache lines that need to be transferred
between CPUs.

This is resolved by aligning the allocated call_single_data to the cache
line size.

To test the effect of the patch, we used the vm-scalability multi-thread
swap test case (swap-w-seq-mt). The test creates multiple threads, and
each thread eats memory until all RAM and part of swap are used, so that
a huge number of IPIs are triggered when unmapping memory. In the test,
memory-write throughput improves by ~5% compared with a misaligned
call_single_data, thanks to faster IPIs.
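The cache-line arithmetic behind the commit message can be illustrated in user-space C. This is a minimal sketch, not kernel code: it assumes C11, a 64-bit build and 64-byte cache lines, and the struct csd_like type and lines_spanned() helper are hypothetical names that only roughly mirror the layout of struct call_single_data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, common on current x86-64 CPUs. */
#define CACHE_LINE_BYTES 64u

/*
 * Hypothetical stand-in for struct call_single_data on a 64-bit build:
 * three pointer-sized fields plus an unsigned int, 32 bytes after padding.
 * Field names are illustrative only.
 */
struct csd_like {
	void *llist_next;
	void (*func)(void *info);
	void *info;
	unsigned int flags;
};

/* Same payload, but the containing type is forced onto a line boundary. */
struct csd_like_aligned {
	_Alignas(CACHE_LINE_BYTES) struct csd_like csd;
};

/* The aligned type starts on a line boundary and occupies exactly one line. */
static_assert(_Alignof(struct csd_like_aligned) == CACHE_LINE_BYTES,
	      "aligned to the cache line size");
static_assert(sizeof(struct csd_like_aligned) == CACHE_LINE_BYTES,
	      "padded to exactly one cache line");

/* Number of cache lines touched by an object of `size` (> 0) bytes at `addr`. */
size_t lines_spanned(uintptr_t addr, size_t size)
{
	uintptr_t first = addr / CACHE_LINE_BYTES;
	uintptr_t last = (addr + size - 1) / CACHE_LINE_BYTES;

	return (size_t)(last - first + 1);
}
```

For example, a 32-byte object placed at offset 48 covers bytes 48..79, crossing the boundary at 64, so lines_spanned(48, 32) is 2; the same object at offset 64 gives lines_spanned(64, 32) == 1. Forcing the alignment costs some padding, but guarantees the single-line case.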
Signed-off-by: "Huang, Ying"
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Michael Ellerman
Cc: Borislav Petkov
Cc: Thomas Gleixner
Cc: Juergen Gross
Cc: Aaron Lu
---
 kernel/smp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 3061483cb3ad..81d9ae08eb6e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -51,7 +51,7 @@ int smpcfd_prepare_cpu(unsigned int cpu)
 		free_cpumask_var(cfd->cpumask);
 		return -ENOMEM;
 	}
-	cfd->csd = alloc_percpu(struct call_single_data);
+	cfd->csd = alloc_percpu_aligned(struct call_single_data);
 	if (!cfd->csd) {
 		free_cpumask_var(cfd->cpumask);
 		free_cpumask_var(cfd->cpumask_ipi);
@@ -269,7 +269,9 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 			     int wait)
 {
 	struct call_single_data *csd;
-	struct call_single_data csd_stack = { .flags = CSD_FLAG_LOCK | CSD_FLAG_SYNCHRONOUS };
+	struct call_single_data csd_stack ____cacheline_aligned = {
+		.flags = CSD_FLAG_LOCK | CSD_FLAG_SYNCHRONOUS
+	};
 	int this_cpu;
 	int err;
 
-- 
2.13.2
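The two hunks apply the same idea to both places a call_single_data comes from: the per-CPU allocation and the on-stack csd_stack. A user-space analogue is sketched below under stated assumptions (C11, 64-byte lines). alloc_csd_slots() is a hypothetical stand-in for the aligned per-CPU allocation (alloc_percpu_aligned() itself is not defined in this patch), C11 _Alignas plays the role of the kernel's ____cacheline_aligned attribute, and the DEMO_FLAG_* values are hypothetical placeholders for CSD_FLAG_LOCK and CSD_FLAG_SYNCHRONOUS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_BYTES 64

/* Hypothetical flag values standing in for the CSD_FLAG_* constants. */
#define DEMO_FLAG_LOCK        0x01u
#define DEMO_FLAG_SYNCHRONOUS 0x02u

/* Hypothetical layout, roughly mirroring struct call_single_data. */
struct csd_like {
	void *llist_next;
	void (*func)(void *info);
	void *info;
	unsigned int flags;
};

/* One slot per CPU, padded to a full line so no slot straddles two lines. */
struct csd_slot {
	_Alignas(CACHE_LINE_BYTES) struct csd_like csd;
};

static_assert(sizeof(struct csd_slot) == CACHE_LINE_BYTES,
	      "each slot occupies exactly one cache line");

/* User-space analogue of an aligned per-CPU allocation; caller frees. */
struct csd_slot *alloc_csd_slots(size_t nr_cpus)
{
	/* C11 aligned_alloc() wants size to be a multiple of the alignment. */
	return aligned_alloc(CACHE_LINE_BYTES, nr_cpus * sizeof(struct csd_slot));
}

/* Returns 1 if an explicitly aligned on-stack csd starts on a line boundary. */
int stack_csd_is_aligned(void)
{
	_Alignas(CACHE_LINE_BYTES) struct csd_like csd_stack = {
		.flags = DEMO_FLAG_LOCK | DEMO_FLAG_SYNCHRONOUS,
	};

	return ((uintptr_t)&csd_stack % CACHE_LINE_BYTES) == 0 &&
	       csd_stack.flags == (DEMO_FLAG_LOCK | DEMO_FLAG_SYNCHRONOUS);
}
```

Padding each slot to a full line spends roughly 32 extra bytes per CPU on a 64-bit build. In exchange, each IPI moves one cache line instead of possibly two.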