From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Aaron Lu, kernel test robot,
    Johannes Weiner, Michal Hocko, Tejun Heo, Andrew Morton, Linus Torvalds,
    Sasha Levin
Subject: [PATCH 4.14 34/52] mem_cgroup: make sure moving_account, move_lock_task and stat_cpu in the same cacheline
Date: Mon, 5 Apr 2021 10:54:00 +0200
Message-Id: <20210405085023.107144420@linuxfoundation.org>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210405085021.996963957@linuxfoundation.org>
References: <20210405085021.996963957@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Aaron Lu

commit e81bf9793b1861d74953ef041b4f6c7faecc2dbd upstream.

The LKP robot found a 27% will-it-scale/page_fault3 performance regression
caused by commit e27be240df53 ("mm: memcg: make sure memory.events is
uptodate when waking pollers").

What the test does is:
 1. mkstemp() a 128M file on a tmpfs;
 2. start $nr_cpu processes, each looping the following:
    2.1 mmap() this file in shared write mode;
    2.2 write 0 to this file in PAGE_SIZE steps until the end of the file;
    2.3 munmap() this file and repeat;
 3. after 5 minutes, check how many loops they managed to complete; the
    higher, the better.

The commit itself looks innocent enough, as it merely changed some event
counting mechanism and this test does not trigger those events at all.
Perf shows increased cycles spent accessing root_mem_cgroup->stat_cpu in
count_memcg_event_mm() (called by handle_mm_fault()) and in
__mod_memcg_state() (called by page_add_file_rmap()).  So the regression
is likely due to the changed layout of 'struct mem_cgroup': either
stat_cpu now falls into a constantly modified cacheline, or some hot
fields no longer share a cacheline.
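For reference, the access pattern above boils down to roughly the
following.  This is a minimal standalone sketch, not the actual
will-it-scale/page_fault3 harness; the tmpfs path, the fixed loop count
and the single-process setup are simplifying assumptions (the real test
forks $nr_cpu workers and counts completed loops over 5 minutes).

/*
 * Standalone sketch of the test loop described in steps 1-3 above.
 * Not the will-it-scale harness; path, loop count and single-process
 * setup are illustrative assumptions.
 */
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define FILE_SIZE	(128UL << 20)	/* 128M file */

int main(void)
{
	char path[] = "/dev/shm/pf3-XXXXXX";	/* assumed tmpfs mount */
	int fd = mkstemp(path);
	long page = sysconf(_SC_PAGESIZE);

	if (fd < 0 || ftruncate(fd, FILE_SIZE) < 0)
		return 1;
	unlink(path);

	for (int loops = 0; loops < 100; loops++) {
		/* 2.1: mmap() the file in shared write mode */
		char *buf = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE,
				 MAP_SHARED, fd, 0);
		if (buf == MAP_FAILED)
			return 1;

		/*
		 * 2.2: write 0 in PAGE_SIZE steps; every store takes a
		 * shared file page fault, which is what exercises
		 * count_memcg_event_mm() and __mod_memcg_state().
		 */
		for (unsigned long off = 0; off < FILE_SIZE; off += page)
			buf[off] = 0;

		/* 2.3: unmap and repeat */
		munmap(buf, FILE_SIZE);
	}
	return 0;
}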
I verified the cacheline-layout theory by moving memory_events[] back to
where it was:

: --- a/include/linux/memcontrol.h
: +++ b/include/linux/memcontrol.h
: @@ -205,7 +205,6 @@ struct mem_cgroup {
: 	int		oom_kill_disable;
: 
: 	/* memory.events */
: -	atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
: 	struct cgroup_file events_file;
: 
: 	/* protect arrays of thresholds */
: @@ -238,6 +237,7 @@ struct mem_cgroup {
: 	struct mem_cgroup_stat_cpu __percpu *stat_cpu;
: 	atomic_long_t		stat[MEMCG_NR_STAT];
: 	atomic_long_t		events[NR_VM_EVENT_ITEMS];
: +	atomic_long_t		memory_events[MEMCG_NR_MEMORY_EVENTS];
: 
: 	unsigned long		socket_pressure;

and performance was restored.

Later investigation found that as long as the three fields
moving_account, move_lock_task and stat_cpu are in the same cacheline,
performance is good.  To avoid future performance surprises from other
commits changing the layout of 'struct mem_cgroup', this patch makes sure
the three fields stay in the same cacheline.

One concern with this approach: moving_account and move_lock_task can be
modified when a process changes memory cgroup, while stat_cpu is an
always-read field, so placing them in the same cacheline might hurt.  I
assume it is rare for a process to change memory cgroup, so this should
be OK.

Link: https://lkml.kernel.org/r/20180528114019.GF9904@yexl-desktop
Link: http://lkml.kernel.org/r/20180601071115.GA27302@intel.com
Signed-off-by: Aaron Lu
Reported-by: kernel test robot
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Sasha Levin
---
 include/linux/memcontrol.h | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 6503a9ca27c1..c7876eadd206 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -155,6 +155,15 @@ enum memcg_kmem_state {
 	KMEM_ONLINE,
 };
 
+#if defined(CONFIG_SMP)
+struct memcg_padding {
+	char x[0];
+} ____cacheline_internodealigned_in_smp;
+#define MEMCG_PADDING(name)	struct memcg_padding name;
+#else
+#define MEMCG_PADDING(name)
+#endif
+
 /*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
@@ -202,7 +211,6 @@ struct mem_cgroup {
 	int		oom_kill_disable;
 
 	/* memory.events */
-	atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
 	struct cgroup_file events_file;
 
 	/* protect arrays of thresholds */
@@ -222,19 +230,26 @@ struct mem_cgroup {
 	 * mem_cgroup ? And what type of charges should we move ?
 	 */
 	unsigned long move_charge_at_immigrate;
+	/* taken only while moving_account > 0 */
+	spinlock_t		move_lock;
+	unsigned long		move_lock_flags;
+
+	MEMCG_PADDING(_pad1_);
+
 	/*
 	 * set > 0 if pages under this cgroup are moving to other cgroup.
 	 */
 	atomic_t		moving_account;
-	/* taken only while moving_account > 0 */
-	spinlock_t		move_lock;
 	struct task_struct	*move_lock_task;
-	unsigned long		move_lock_flags;
 
 	/* memory.stat */
 	struct mem_cgroup_stat_cpu __percpu *stat_cpu;
+
+	MEMCG_PADDING(_pad2_);
+
 	atomic_long_t		stat[MEMCG_NR_STAT];
 	atomic_long_t		events[NR_VM_EVENT_ITEMS];
+	atomic_long_t		memory_events[MEMCG_NR_MEMORY_EVENTS];
 
 	unsigned long		socket_pressure;
 
-- 
2.30.2
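A note on the MEMCG_PADDING() idiom in the hunk above: a zero-size member
whose type carries a cacheline-sized alignment attribute consumes no space
but forces the next field onto a fresh cacheline, which is how the patch
keeps moving_account, move_lock_task and stat_cpu together and away from
the constantly modified counter arrays.  The sketch below is a userspace
analogue, not kernel code; the 64-byte cacheline size and the simplified
field layout are assumptions (only the hot-group names mirror the patch),
and the zero-length array is a GNU C extension, just as in the kernel.

/*
 * Userspace analogue of the MEMCG_PADDING() idiom.  A zero-size member
 * aligned to the (assumed) cacheline size pushes the next field onto a
 * fresh cacheline, so the hot read-mostly group cannot share a line with
 * frequently written neighbours.  Field layout is illustrative.
 */
#include <stdio.h>
#include <stddef.h>

#define CACHELINE_SIZE	64

struct pad {
	char x[0];			/* GNU C zero-length array */
} __attribute__((aligned(CACHELINE_SIZE)));

#define PADDING(name)	struct pad name

struct demo {
	/* illustrative fields written only on a slow path */
	unsigned long	slow_path_flags;
	unsigned long	slow_path_state;

	PADDING(_pad1_);

	/*
	 * hot, mostly-read group (moving_account, move_lock_task and
	 * stat_cpu in the patch): keep them on one cacheline
	 */
	int		moving_account;
	void		*move_lock_task;
	void		*stat_cpu;

	PADDING(_pad2_);

	/* constantly modified counters get their own cacheline(s) */
	long		stat[32];
};

int main(void)
{
	printf("moving_account: offset %zu\n", offsetof(struct demo, moving_account));
	printf("stat_cpu:       offset %zu\n", offsetof(struct demo, stat_cpu));
	printf("stat[]:         offset %zu\n", offsetof(struct demo, stat));
	return 0;
}

On a typical x86-64 gcc build this prints 64, 80 and 128: the padding
members take no space themselves, yet each group starts on its own
64-byte boundary, which is the effect the patch relies on for
'struct mem_cgroup'.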