Date: Fri, 15 Nov 2019 20:38:06 -0800
From: Matthew Wilcox <willy@infradead.org>
To: Alex Shi <alex.shi@linux.alibaba.com>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org,
	hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com,
	yang.shi@linux.alibaba.com, Johannes Weiner, Michal Hocko,
	Vladimir Davydov, Roman Gushchin, Shakeel Butt, Chris Down,
	Thomas Gleixner, Vlastimil Babka, Qian Cai, Andrey Ryabinin,
	"Kirill A. Shutemov", Jérôme Glisse, Andrea Arcangeli,
	David Rientjes, "Aneesh Kumar K.V", swkhack, "Potyra, Stefan",
	Mike Rapoport, Stephen Rothwell, Colin Ian King, Jason Gunthorpe,
	Mauro Carvalho Chehab, Peng Fan, Nikolay Borisov, Ira Weiny,
	Kirill Tkhai, Yafang Shao
Subject: Re: [PATCH v3 3/7] mm/lru: replace pgdat lru_lock with lruvec lock
Message-ID: <20191116043806.GD20752@bombadil.infradead.org>
References: <1573874106-23802-1-git-send-email-alex.shi@linux.alibaba.com>
	<1573874106-23802-4-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1573874106-23802-4-git-send-email-alex.shi@linux.alibaba.com>

On Sat, Nov 16, 2019 at 11:15:02AM +0800, Alex Shi wrote:
> This is the main patch to replace per node lru_lock with per memcg
> lruvec lock. It also fold the irqsave flags into lruvec.

I have to say, I don't love the part where we fold the irqsave flags
into the lruvec.  I know it saves us an argument, but it opens up the
possibility of mismatched expectations.  eg we currently have:

static void __split_huge_page(struct page *page, struct list_head *list,
		struct lruvec *lruvec, pgoff_t end)
{
...
	spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->irqflags);

so if we introduce a new caller, we have to be certain that this caller
is also using lock_page_lruvec_irqsave() and not lock_page_lruvec_irq().
I can't think of a way to make the compiler enforce that, and if we
don't, then we can get some odd crashes with interrupts being
unexpectedly enabled or disabled, depending on how ->irqflags was used
last.

So it makes the code more subtle.  And that's not a good thing.

> +static inline struct lruvec *lock_page_lruvec_irq(struct page *page,
> +					struct pglist_data *pgdat)
> +{
> +	struct lruvec *lruvec = mem_cgroup_page_lruvec(page, pgdat);
> +
> +	spin_lock_irq(&lruvec->lru_lock);
> +
> +	return lruvec;
> +}

...

> +static struct lruvec *lock_page_lru(struct page *page, int *isolated)
>  {
>  	pg_data_t *pgdat = page_pgdat(page);
> +	struct lruvec *lruvec = lock_page_lruvec_irq(page, pgdat);
>
> -	spin_lock_irq(&pgdat->lru_lock);
>  	if (PageLRU(page)) {
> -		struct lruvec *lruvec;
>
> -		lruvec = mem_cgroup_page_lruvec(page, pgdat);
>  		ClearPageLRU(page);
>  		del_page_from_lru_list(page, lruvec, page_lru(page));
>  		*isolated = 1;
>  	} else
>  		*isolated = 0;
> +
> +	return lruvec;
>  }

But what if the page is !PageLRU?  What lruvec did we just lock?
According to the comments on mem_cgroup_page_lruvec(),

 * This function is only safe when following the LRU page isolation
 * and putback protocol: the LRU lock must be held, and the page must
 * either be PageLRU() or the caller must have isolated/allocated it.
and now it's being called in order to find out which LRU lock to take.
So either this comment needs to be updated (if it's no longer accurate),
or this patch has a race.
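
To make the interleaving I have in mind concrete (this is only
illustrative; mem_cgroup_move_account() stands in here for any path
that changes page->mem_cgroup while the page is neither PageLRU nor
isolated by the caller):

	CPU 0: lock_page_lru(page, ...)		CPU 1: memcg charge moving
	lruvec = mem_cgroup_page_lruvec(page, pgdat);
						mem_cgroup_move_account(page, ...);
						/* page->mem_cgroup now points
						   at a different memcg */
	spin_lock_irq(&lruvec->lru_lock);

If that interleaving is possible, we end up holding the old memcg's
lru_lock while the lruvec the page now belongs to is completely
unlocked, so nothing we hold serialises later LRU manipulation of
this page.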