From: Vlastimil Babka
To: Alex Shi, akpm@linux-foundation.org, mgorman@techsingularity.net,
 tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru,
 daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org,
 lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com,
 richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com,
 rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com,
 shy828301@gmail.com
Cc: Michal Hocko
Subject: Re: [PATCH v21 14/19] mm/lru: introduce TestClearPageLRU
Date: Wed, 11 Nov 2020 14:36:16 +0100
References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com>
 <1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com>

On 11/5/20 9:55 AM, Alex Shi wrote:
> Currently lru_lock still guards both lru list and page's lru bit, that's
> ok. but if we want to use specific lruvec lock on the page, we need to
> pin down the page's lruvec/memcg during locking. Just taking lruvec
> lock first may be undermined by the page's memcg charge/migration. To
> fix this problem, we will clear the lru bit out of locking and use
> it as pin down action to block the page isolation in memcg changing.
>
> So now a standard steps of page isolation is following:
> 	1, get_page();			#pin the page avoid to be free
> 	2, TestClearPageLRU();		#block other isolation like memcg change
> 	3, spin_lock on lru_lock;	#serialize lru list access
> 	4, delete page from lru list;
>
> This patch start with the first part: TestClearPageLRU, which combines
> PageLRU check and ClearPageLRU into a macro func TestClearPageLRU. This
> function will be used as page isolation precondition to prevent other
> isolations some where else. Then there are may !PageLRU page on lru
> list, need to remove BUG() checking accordingly.

As there now may be !PageLRU pages on lru list, we need to ...

>
> There 2 rules for lru bit now:
> 1, the lru bit still indicate if a page on lru list, just in some
>    temporary moment(isolating), the page may have no lru bit when
>    it's on lru list. but the page still must be on lru list when the
>    lru bit set.
> 2, have to remove lru bit before delete it from lru list.

2. we have to remove the lru bit before deleting page from lru list

>
> As Andrew Morton mentioned this change would dirty cacheline for page
> isn't on LRU. But the lost would be acceptable in Rong Chen
> report:
> https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/

AFAIK these places generally expect PageLRU to be true, and if it's false,
it's because of a race, so that effect should be negligible?

> Suggested-by: Johannes Weiner
> Signed-off-by: Alex Shi
> Acked-by: Hugh Dickins
> Acked-by: Johannes Weiner
> Cc: Hugh Dickins
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: Vladimir Davydov
> Cc: Andrew Morton
> Cc: linux-kernel@vger.kernel.org
> Cc: cgroups@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---

...

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1542,7 +1542,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>   */
>  int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>  {
> -	int ret = -EINVAL;
> +	int ret = -EBUSY;
>
>  	/* Only take pages on the LRU. */
>  	if (!PageLRU(page))
> @@ -1552,8 +1552,6 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>  	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
>  		return ret;
>
> -	ret = -EBUSY;

I'm not sure why this change is here, looks unrelated to the patch? Oh I see,
you want to prevent the BUG() in isolate_lru_pages(). But due to that, the
PageUnevictable check was also affected unintentionally. But I don't think
it's that important to BUG() when we run into PageUnevictable unexpectedly,
so that's probably ok.

But with that, we can just make __isolate_lru_page() a bool function and
remove the ugly switch in isolate_lru_pages()?
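
Something like the following is what I mean - only a rough, untested sketch
of the bool variant, with the middle mode checks elided (they would stay as
they are today), so don't take the details literally:

bool __isolate_lru_page(struct page *page, isolate_mode_t mode)
{
	/* Only take pages on the LRU. */
	if (!PageLRU(page))
		return false;

	/* Unevictable pages only with ISOLATE_UNEVICTABLE, no BUG() needed */
	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
		return false;

	/* ... the remaining mode checks stay unchanged ... */

	if (likely(get_page_unless_zero(page))) {
		/*
		 * Be careful not to clear PageLRU until after we're
		 * sure the page is not being freed elsewhere -- the
		 * page release code relies on it.
		 */
		if (TestClearPageLRU(page))
			return true;
		put_page(page);
	}

	return false;
}

and then the switch in isolate_lru_pages() collapses to roughly

	if (!__isolate_lru_page(page, mode)) {
		/* it is busy or being freed elsewhere, put it back */
		list_move(&page->lru, src);
		continue;
	}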
> -
>  	/*
>  	 * To minimise LRU disruption, the caller can indicate that it only
>  	 * wants to isolate pages it will be able to operate on without
> @@ -1600,8 +1598,10 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>  		 * sure the page is not being freed elsewhere -- the
>  		 * page release code relies on it.
>  		 */
> -		ClearPageLRU(page);
> -		ret = 0;
> +		if (TestClearPageLRU(page))
> +			ret = 0;
> +		else
> +			put_page(page);
>  	}
>
>  	return ret;
> @@ -1667,8 +1667,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  		page = lru_to_page(src);
>  		prefetchw_prev_lru_page(page, src, flags);
>
> -		VM_BUG_ON_PAGE(!PageLRU(page), page);
> -
>  		nr_pages = compound_nr(page);
>  		total_scan += nr_pages;
>
> @@ -1765,21 +1763,18 @@ int isolate_lru_page(struct page *page)
>  	VM_BUG_ON_PAGE(!page_count(page), page);
>  	WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
>
> -	if (PageLRU(page)) {
> +	if (TestClearPageLRU(page)) {
>  		pg_data_t *pgdat = page_pgdat(page);
>  		struct lruvec *lruvec;
>
> -		spin_lock_irq(&pgdat->lru_lock);
> +		get_page(page);
>  		lruvec = mem_cgroup_page_lruvec(page, pgdat);
> -		if (PageLRU(page)) {
> -			int lru = page_lru(page);
> -			get_page(page);
> -			ClearPageLRU(page);
> -			del_page_from_lru_list(page, lruvec, lru);
> -			ret = 0;
> -		}
> +		spin_lock_irq(&pgdat->lru_lock);
> +		del_page_from_lru_list(page, lruvec, page_lru(page));
>  		spin_unlock_irq(&pgdat->lru_lock);
> +		ret = 0;
>  	}
> +
>  	return ret;
>  }
>
> @@ -4293,6 +4288,10 @@ void check_move_unevictable_pages(struct pagevec *pvec)
>  		nr_pages = thp_nr_pages(page);
>  		pgscanned += nr_pages;
>
> +		/* block memcg migration during page moving between lru */
> +		if (!TestClearPageLRU(page))
> +			continue;
> +
>  		if (pagepgdat != pgdat) {
>  			if (pgdat)
>  				spin_unlock_irq(&pgdat->lru_lock);
> @@ -4301,10 +4300,7 @@ void check_move_unevictable_pages(struct pagevec *pvec)
>  		}
>  		lruvec = mem_cgroup_page_lruvec(page, pgdat);
>
> -		if (!PageLRU(page) || !PageUnevictable(page))
> -			continue;
> -
> -		if (page_evictable(page)) {
> +		if (page_evictable(page) && PageUnevictable(page)) {

Doing PageUnevictable() test first should be cheaper?

>  			enum lru_list lru = page_lru_base_type(page);
>
>  			VM_BUG_ON_PAGE(PageActive(page), page);
> @@ -4313,12 +4309,15 @@ void check_move_unevictable_pages(struct pagevec *pvec)
>  			add_page_to_lru_list(page, lruvec, lru);
>  			pgrescued += nr_pages;
>  		}
> +		SetPageLRU(page);
>  	}
>
>  	if (pgdat) {
>  		__count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
>  		__count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
>  		spin_unlock_irq(&pgdat->lru_lock);
> +	} else if (pgscanned) {
> +		count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(check_move_unevictable_pages);
>
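
FWIW, spelled out with the changelog's step numbers as comments, the new
isolate_lru_page() above is essentially the following (just the quoted hunk
condensed, not a further change request; get_page() coming after
TestClearPageLRU() looks fine here since the VM_BUG_ON above guarantees the
caller already holds a reference):

	if (TestClearPageLRU(page)) {			/* 2) block other isolation */
		pg_data_t *pgdat = page_pgdat(page);
		struct lruvec *lruvec;

		get_page(page);				/* 1) pin the page */
		lruvec = mem_cgroup_page_lruvec(page, pgdat);
		spin_lock_irq(&pgdat->lru_lock);	/* 3) serialize lru list access */
		del_page_from_lru_list(page, lruvec, page_lru(page));	/* 4) */
		spin_unlock_irq(&pgdat->lru_lock);
		ret = 0;
	}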