Subject: Re: [PATCH v16 00/22] per memcg lru_lock
From: Alex Shi <alex.shi@linux.alibaba.com>
To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name
References: <1594429136-20002-1-git-send-email-alex.shi@linux.alibaba.com>
Message-ID: <04a2ad99-5d88-a300-6430-7a8da0946f04@linux.alibaba.com>
Date: Mon, 20 Jul 2020 15:30:14 +0800

I am preparing and testing patch v17 according to the comments from Hugh
Dickins and Alexander Duyck. Many thanks for the line-by-line review and
patient suggestions!

Please send me any further comments or concerns about any of the patches!

Thanks a lot!
Alex

On 2020/7/11 8:58 AM, Alex Shi wrote:
> The new version is based on v5.8-rc4. It adds 2 more patches:
> 'mm/thp: remove code path which never got into'
> 'mm/thp: add tail pages into lru anyway in split_huge_page()'
> and modifies 'mm/mlock: reorder isolation sequence during munlock'.
>
> The current lru_lock is one per node, pgdat->lru_lock, which guards the
> lru lists, but the lru lists were moved into memcg a long time ago. Still
> using a per-node lru_lock is clearly unscalable: pages in each of the
> memcgs have to compete with each other for the whole node's lru_lock.
> This patchset uses a per-lruvec/memcg lru_lock in place of the per-node
> lru lock to guard the lru lists, making it scalable across memcgs and
> gaining performance.
>
> Currently lru_lock guards both the lru list and the page's lru bit, which
> is fine; but if we want to use a specific lruvec lock for the page, we
> need to pin down the page's lruvec/memcg while locking. Just taking the
> lruvec lock first can be undermined by the page's memcg charge/migration.
> To fix this problem, we clear the page's lru bit and use that as the
> pin-down action that blocks memcg changes. That is the reason for the new
> atomic function TestClearPageLRU. So isolating a page now requires both
> actions: TestClearPageLRU and holding the lru_lock.
>
> The typical usage of this is isolate_migratepages_block() in
> compaction.c: we take the lru bit before the lru lock, which serializes
> page isolation against memcg page charge/migration, since those would
> change the page's lruvec and hence the lru_lock in it.
>
> The above solution was suggested by Johannes Weiner, and this patchset is
> based on his new memcg charge path. (Hugh Dickins tested and contributed
> much code, from the compaction fix to general code polish; thanks a lot!)
>
> The patchset consists of 3 parts:
> 1, some code cleanup and minimal optimization as preparation.
> 2, use TestClearPageLRU as the precondition for page isolation.
> 3, replace the per-node lru_lock with a per-memcg, per-node lru_lock.
>
> Following Daniel Jordan's suggestion, I ran 208 'dd' tasks in 104
> containers on a 2-socket * 26-core * HT box with a modified case:
> https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice
> With this patchset, the readtwice performance increased by about 80%
> with concurrent containers.
>
> Thanks to Hugh Dickins and Konstantin Khlebnikov, who both brought up
> this idea 8 years ago, and to the others who gave comments as well:
> Daniel Jordan, Mel Gorman, Shakeel Butt, Matthew Wilcox, etc.
>
> Thanks for the testing support from Intel 0day and from Rong Chen,
> Fengguang Wu, and Yun Wang. Hugh Dickins also shared his kbuild-swap
> case. Thanks!