Date: Tue, 19 Nov 2019 11:04:56 -0500
From: Johannes Weiner
To: Alex Shi
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org,
	hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com,
	yang.shi@linux.alibaba.com, willy@infradead.org, shakeelb@google.com,
	Michal Hocko, Vladimir Davydov, Roman Gushchin, Chris Down,
	Thomas Gleixner, Vlastimil Babka, Qian Cai, Andrey Ryabinin, "Kirill A.
Shutemov", Jérôme Glisse, Andrea Arcangeli, David Rientjes,
	"Aneesh Kumar K.V", swkhack, "Potyra, Stefan", Mike Rapoport,
	Stephen Rothwell, Colin Ian King, Jason Gunthorpe,
	Mauro Carvalho Chehab, Peng Fan, Nikolay Borisov, Ira Weiny,
	Kirill Tkhai, Yafang Shao
Subject: Re: [PATCH v4 3/9] mm/lru: replace pgdat lru_lock with lruvec lock
Message-ID: <20191119160456.GD382712@cmpxchg.org>
References: <1574166203-151975-1-git-send-email-alex.shi@linux.alibaba.com>
 <1574166203-151975-4-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1574166203-151975-4-git-send-email-alex.shi@linux.alibaba.com>

On Tue, Nov 19, 2019 at 08:23:17PM +0800, Alex Shi wrote:
> This patchset moves lru_lock into the lruvec, giving each lruvec its
> own lru_lock and thus one lru_lock per memcg per node.
>
> This is the main patch; it replaces the per-node lru_lock with a
> per-memcg lruvec lock.
>
> We introduce the function lock_page_lruvec. Without memcg it behaves
> the same as the vanilla pgdat lock; with memcg, the function keeps
> re-pinning the lruvec's lock to guard against page->mem_cgroup changes
> from page migration between memcgs. (Thanks to Hugh Dickins and
> Konstantin Khlebnikov for the reminder on this. The core logic is the
> same as in their previous patches.)
>
> Following Daniel Jordan's suggestion, I ran 64 'dd' tasks in 32
> containers on my 2-socket * 8-core * HT box with the modified case:
> https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice
>
> With this and the later patches, dd performance is 144MB/s, versus
> 123MB/s on the vanilla kernel: a 17% performance increase.
>
> Signed-off-by: Alex Shi
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: Vladimir Davydov
> Cc: Andrew Morton
> Cc: Roman Gushchin
> Cc: Shakeel Butt
> Cc: Chris Down
> Cc: Thomas Gleixner
> Cc: Mel Gorman
> Cc: Vlastimil Babka
> Cc: Qian Cai
> Cc: Andrey Ryabinin
> Cc: "Kirill A. Shutemov"
> Cc: "Jérôme Glisse"
> Cc: Andrea Arcangeli
> Cc: Yang Shi
> Cc: David Rientjes
> Cc: "Aneesh Kumar K.V"
> Cc: swkhack
> Cc: "Potyra, Stefan"
> Cc: Mike Rapoport
> Cc: Stephen Rothwell
> Cc: Colin Ian King
> Cc: Jason Gunthorpe
> Cc: Mauro Carvalho Chehab
> Cc: Matthew Wilcox
> Cc: Peng Fan
> Cc: Nikolay Borisov
> Cc: Ira Weiny
> Cc: Kirill Tkhai
> Cc: Yafang Shao
> Cc: Konstantin Khlebnikov
> Cc: Hugh Dickins
> Cc: Tejun Heo
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: cgroups@vger.kernel.org
> ---
>  include/linux/memcontrol.h | 24 +++++++++++++++
>  include/linux/mmzone.h     |  2 ++
>  mm/compaction.c            | 67 ++++++++++++++++++++++++++++-------------
>  mm/huge_memory.c           | 15 ++++------
>  mm/memcontrol.c            | 75 +++++++++++++++++++++++++++++++++---------
>  mm/mlock.c                 | 31 ++++++++++---------
>  mm/mmzone.c                |  1 +
>  mm/page_idle.c             |  5 ++--
>  mm/swap.c                  | 74 +++++++++++++++++++-------------------
>  mm/vmscan.c                | 58 +++++++++++++++++------------------
>  10 files changed, 214 insertions(+), 138 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 5b86287fa069..9538253998a6 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -418,6 +418,10 @@ static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
>
>  struct lruvec *mem_cgroup_page_lruvec(struct page *, struct pglist_data *);
>
> +struct lruvec *lock_page_lruvec_irq(struct page *, struct pglist_data *);
> +struct lruvec *lock_page_lruvec_irqsave(struct page *, struct pglist_data *,
> +					unsigned long*);
> +
>  struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
>
>  struct mem_cgroup *get_mem_cgroup_from_mm(struct
mm_struct *mm);
> @@ -901,6 +905,26 @@ static inline struct lruvec *mem_cgroup_page_lruvec(struct page *page,
>  	return &pgdat->__lruvec;
>  }
>
> +static inline struct lruvec *lock_page_lruvec_irq(struct page *page,
> +					struct pglist_data *pgdat)
> +{
> +	struct lruvec *lruvec = mem_cgroup_page_lruvec(page, pgdat);
> +
> +	spin_lock_irq(&lruvec->lru_lock);
> +
> +	return lruvec;

While this works in practice, it looks wrong because it doesn't follow
the mem_cgroup_page_lruvec() rules. Please open-code
spin_lock_irq(&pgdat->__lruvec->lru_lock) instead.

> @@ -1246,6 +1245,46 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgdat)
>  	return lruvec;
>  }
>
> +struct lruvec *lock_page_lruvec_irq(struct page *page,
> +					struct pglist_data *pgdat)
> +{
> +	struct lruvec *lruvec;
> +
> +again:
> +	rcu_read_lock();
> +	lruvec = mem_cgroup_page_lruvec(page, pgdat);
> +	spin_lock_irq(&lruvec->lru_lock);
> +	rcu_read_unlock();

The spinlock doesn't prevent the lruvec from being freed. You deleted
the rules from the mem_cgroup_page_lruvec() documentation, but they
still apply: if the page is already !PageLRU() by the time you get
here, it could get reclaimed or migrated to another cgroup, and that
can free the memcg/lruvec. Merely holding the lru_lock does not
prevent this.

Either the page needs to be locked, or the page needs to be PageLRU
with the lru_lock held, to prevent somebody else from isolating it.
Otherwise, the lruvec is not safe to use.