Received: by 2002:a05:6a10:16a7:0:0:0:0 with SMTP id gp39csp1515231pxb; Wed, 4 Nov 2020 11:00:43 -0800 (PST) X-Google-Smtp-Source: ABdhPJxVvZXhEjS56o/LW3msBAvley1K9mvXYEfTIv1Getbs/cH1SrbZ+KaHgOKDi8r5mOjHryPf X-Received: by 2002:a05:6402:4cb:: with SMTP id n11mr28159086edw.296.1604516443200; Wed, 04 Nov 2020 11:00:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1604516443; cv=none; d=google.com; s=arc-20160816; b=C3I5CWcPMiHXruzMJc9wbrTyMSzidtCisELE39wAnon2W2eecAJWZA0X2dEm+G17YI mhFhGUnhFoh3rjqNJ6MMnAPuWAwDb3nsb3YZ2s4qjnpkMBp4NzsFH2rr/IN3k2fL2jhA gdkiQHuqWQdPB39qCN+JqNfOkzDVkwayFGRpIiD9ZmEDhIa4PrgJHYv7TnGFRUMegeaG H32cg49Q2zsI4wBwJqAwnEkcmGeqoHoIlNeU1efmGWgOgCqba85Ew6VgLq4TAEog8BPc DWjal4bFUjO0vOD1FYYcGvFg7YrmAsaZl1niQuklYiSdcpR2682oNNNt3V1Dc5KRAdpq A3Zg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=7p9nMEZ9CzGNR2nnoc/NmQ6U82GjuP7yX7DwiANzfMc=; b=U1SNqaxUzznRcpK7kLfHK2DIpSqJ44GYWUSVX0twxEeCjhPCq7j9YlzaDGKfzDHIrg zsi4Z10jJhRQJP0DsE+xNtUMSJRDmfDpk5ntyBTdOrWLNyO1SkF99+bZqNRN/WiXHunz MNzklI/AhlB2v/juVhb6Ut8UDoCPuTxdoQyU/ilkLE9uGbkYCQocXKIYvB2TLKVu8rZh /3fIupf5zJELDTW9qfyG47uozPK+ORZfi/0CJjhMB+Lhy8oEEMsaNM8k6dBXar7fiIKi Cui4UAK1MpDOYfKleo+lagq0ex+dtw/6c9FqvMnEDo99TaMvYanCSYnCfDwPTLCxV0+M dc9w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=rli6FwsW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bz26si1926644ejc.635.2020.11.04.11.00.19; Wed, 04 Nov 2020 11:00:43 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=rli6FwsW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732118AbgKDRru (ORCPT + 99 others); Wed, 4 Nov 2020 12:47:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48862 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727851AbgKDRru (ORCPT ); Wed, 4 Nov 2020 12:47:50 -0500 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D0638C0613D4 for ; Wed, 4 Nov 2020 09:47:49 -0800 (PST) Received: by mail-qk1-x742.google.com with SMTP id b18so20072673qkc.9 for ; Wed, 04 Nov 2020 09:47:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=7p9nMEZ9CzGNR2nnoc/NmQ6U82GjuP7yX7DwiANzfMc=; b=rli6FwsWHB6cip/0jF9V7BsRWZHrrjIg7apvuKsE1Kge0E1dkDcIwIFohwKtLUNDAk wz/XglDjWFEulLaAfpTOkI/0Uc4V5g+ashXdnCpUU5Y67NkbPnjYcHowLSGV3qpV1sm8 PXa4wmOaVMSbRNKZO03cdu+YlTVXPEjInI04LrO59b8qeBsErr+oo5tT/Kqu6AQo2ZRk 65sCqy2Xavh+oMTSK1yv/QuMP9HCT8mS1vrN5D4TUL9Ved1CVkPbN6xpb49UcVVNb2VC YNORrzogpHxam/vez2HbOF4GRctTlLzbSWFf3SULmRcwcG/9bFvtEV/CyaxRuImJqqNG bpXA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=7p9nMEZ9CzGNR2nnoc/NmQ6U82GjuP7yX7DwiANzfMc=; b=NnLl6OU9etMCTaCDBTOkbm7MgfwJTtvu/yz0Lk6PTizrSV86y/Xm7a+hiDjbA2J4f6 h+14iHwoatD2fx8lDsh2GPhzUkW90tKCW1Bm4W5GV95IerlVrrlVTVQlOh31l82N3GMC YiubJoH/BkkJghgV4VHjM0hK+oaRjlMXoEMMUBADTekupopipH06xwbpOeOotL8nT7Ko PjaqpmCmy+atLSfIHLCItTPhK1QJhDVI/vUnthofXOh1hxcViyMiqK7MzPyeEd1L/S93 MlYCUQPIvUmpZE7a3LutEr3nuOYznwqFQoFgreVmU4eeiX0Ki7cJiFOHkITsF2S0OQEr sMgw== X-Gm-Message-State: AOAM530S+o7M4axlmIX/gR90Dz9RsYs3IDjYhqenRQ9I7jwa/sYa+PUs kZr0z9SDOZOmYCzWCjbCbYBs+g== X-Received: by 2002:a37:8542:: with SMTP id h63mr27249718qkd.102.1604512068975; Wed, 04 Nov 2020 09:47:48 -0800 (PST) Received: from localhost ([2620:10d:c091:480::1:9e9e]) by smtp.gmail.com with ESMTPSA id d145sm2908052qke.83.2020.11.04.09.47.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 Nov 2020 09:47:48 -0800 (PST) Date: Wed, 4 Nov 2020 12:46:03 -0500 From: Johannes Weiner To: Alex Shi Cc: Matthew Wilcox , akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com, Vlastimil Babka , Minchan Kim Subject: Re: [PATCH v20 08/20] mm: page_idle_get_page() does not need lru_lock Message-ID: <20201104174603.GB744831@cmpxchg.org> References: <1603968305-8026-1-git-send-email-alex.shi@linux.alibaba.com> <1603968305-8026-9-git-send-email-alex.shi@linux.alibaba.com> <20201102144110.GB724984@cmpxchg.org> <20201102144927.GN27442@casper.infradead.org> <20201102202003.GA740958@cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 04, 2020 at 07:27:21PM +0800, Alex Shi wrote: > 在 2020/11/3 上午4:20, Johannes Weiner 写道: > > On Mon, Nov 02, 2020 at 02:49:27PM +0000, Matthew Wilcox wrote: > >> On Mon, Nov 02, 2020 at 09:41:10AM -0500, Johannes Weiner wrote: > >>> On Thu, Oct 29, 2020 at 06:44:53PM +0800, Alex Shi wrote: > >>>> From: Hugh Dickins > >>>> > >>>> It is necessary for page_idle_get_page() to recheck PageLRU() after > >>>> get_page_unless_zero(), but holding lru_lock around that serves no > >>>> useful purpose, and adds to lru_lock contention: delete it. > >>>> > >>>> See https://lore.kernel.org/lkml/20150504031722.GA2768@blaptop for the > >>>> discussion that led to lru_lock there; but __page_set_anon_rmap() now > >>>> uses WRITE_ONCE(), > >>> > >>> That doesn't seem to be the case in Linus's or Andrew's tree. Am I > >>> missing a dependent patch series? > >>> > >>>> and I see no other risk in page_idle_clear_pte_refs() using > >>>> rmap_walk() (beyond the risk of racing PageAnon->PageKsm, mostly but > >>>> not entirely prevented by page_count() check in ksm.c's > >>>> write_protect_page(): that risk being shared with page_referenced() > >>>> and not helped by lru_lock). > >>> > >>> Isn't it possible, as per Minchan's description, for page->mapping to > >>> point to a struct anon_vma without PAGE_MAPPING_ANON set, and rmap > >>> thinking it's looking at a struct address_space? > >> > >> I don't think it can point to an anon_vma without the ANON bit set. > >> Minchan's concern in that email was that it might still be NULL. > > > > Hm, no, the thread is a lengthy discussion about whether the store > > could be split such that page->mapping is actually pointing to > > something invalid (anon_vma without the PageAnon bit). > > > > From his email: > > > > CPU 0 CPU 1 > > > > do_anonymous_page > > __page_set_anon_rmap > > /* out of order happened so SetPageLRU is done ahead */ > > SetPageLRU(page) > > This SetPageLRU done in __pagevec_lru_add_fn() which under the lru_lock > protection, so the original memory barrier or lock concern isn't > correct. that means, the SetPageLRU isn't possible to be here. > And then no warry on right side 'CPU 1' problem. The SetPageLRU is done under lru_lock, but the store to page->mapping is not, so what ensures ordering between them? And what prevents the compiler from tearing the store to page->mapping? The writer does this: CPU 0 page_add_new_anon_rmap() page->mapping = anon_vma + PAGE_MAPPING_ANON lru_cache_add_inactive_or_unevictable() spin_lock(lruvec->lock) SetPageLRU() spin_unlock(lruvec->lock) The concern is what CPU 1 will observe at page->mapping after observing PageLRU set on the page. 1. anon_vma + PAGE_MAPPING_ANON That's the in-order scenario and is fine. 2. NULL That's possible if the page->mapping store gets reordered to occur after SetPageLRU. That's fine too because we check for it. 3. anon_vma without the PAGE_MAPPING_ANON bit That would be a problem and could lead to all kinds of undesirable behavior including crashes and data corruption. Is it possible? AFAICT the compiler is allowed to tear the store to page->mapping and I don't see anything that would prevent it. That said, I also don't see how the reader testing PageLRU under the lru_lock would prevent that in the first place. AFAICT we need that WRITE_ONCE() around the page->mapping assignment that's referenced in the changelog of this patch but I cannot find in any tree or patch.