Message-ID: <48150a28-ed48-49ff-9432-9cd30cda4da4@linux.alibaba.com>
Date: Wed, 5 Jun 2024 19:37:55 +0800
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH] mm/gup: don't check page lru flag before draining it
To: David Hildenbrand, yangge1116, akpm@linux-foundation.org, Matthew Wilcox
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, liuzixing@hygon.cn
References: <1717498121-20926-1-git-send-email-yangge1116@126.com>
 <0d7a4405-9a2e-4bd1-ba89-a31486155233@redhat.com>
 <776de760-e817-43b2-bd00-8ce96f4e37a8@redhat.com>
 <7063920f-963a-4b3e-a3f3-c5cc227bc877@redhat.com>
From: Baolin Wang
In-Reply-To: <7063920f-963a-4b3e-a3f3-c5cc227bc877@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2024/6/5 17:53, David Hildenbrand wrote:
> On 05.06.24 11:41, David Hildenbrand wrote:
>> On 05.06.24 03:18, yangge1116 wrote:
>>>
>>>
>>> On 2024/6/4 9:47 PM, David Hildenbrand wrote:
>>>> On 04.06.24 12:48, yangge1116@126.com wrote:
>>>>> From: yangge
>>>>>
>>>>> If a page is added to a pagevec, its refcount increases by one;
>>>>> removing the page from the pagevec decreases it by one. Page
>>>>> migration requires that the page not be referenced by anything
>>>>> except the page mapping. Before migrating a page, we should try
>>>>> to drain the page from the pagevec in case the page is in it.
>>>>> However, folio_test_lru() is not sufficient to tell whether the
>>>>> page is in a pagevec or not; if the page is in a pagevec, the
>>>>> migration will fail.
>>>>>
>>>>> Remove the condition and drain the lru once to ensure the page
>>>>> is not referenced by a pagevec.
>>>>
>>>> What you are saying is that we might have a page on which
>>>> folio_test_lru() succeeds, that was added to one of the
>>>> cpu_fbatches, correct?
>>>
>>> Yes
>>>
>>>>
>>>> Can you describe under which circumstances that happens?
>>>>
>>>
>>> If we call folio_activate() to move a page from the inactive LRU
>>> list to the active LRU list, the page is not only in the LRU list,
>>> but also in one of the cpu_fbatches:
>>>
>>> void folio_activate(struct folio *folio)
>>> {
>>>         if (folio_test_lru(folio) && !folio_test_active(folio) &&
>>>             !folio_test_unevictable(folio)) {
>>>                 struct folio_batch *fbatch;
>>>
>>>                 folio_get(folio);
>>>                 // After this, the folio is in the LRU list, and
>>>                 // its refcount has increased by one.
>>>
>>>                 local_lock(&cpu_fbatches.lock);
>>>                 fbatch = this_cpu_ptr(&cpu_fbatches.activate);
>>>                 folio_batch_add_and_move(fbatch, folio, folio_activate_fn);
>>>                 local_unlock(&cpu_fbatches.lock);
>>>         }
>>> }
>>
>> Interesting, the !SMP variant does the folio_test_clear_lru().
>>
>> It would be really helpful if we could reliably identify whether the
>> LRU batching code has a raised reference on a folio.
>>
>> We have the same scenario in
>> * folio_deactivate()
>> * folio_mark_lazyfree()
>>
>> In folio_batch_move_lru() we do the folio_test_clear_lru(folio).
>>
>> No expert on that code; I'm wondering if we could move the
>> folio_test_clear_lru() out, such that we can more reliably identify
>> whether a folio is on the LRU batch or not.
>
> I'm sure there would be something extremely broken with the following
> (I don't know what I'm doing ;) ), but I wonder if there would be a
> way to make something like that work (and perform well enough?).
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 67786cb771305..642e471c3ec5a 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -212,10 +212,6 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
>          for (i = 0; i < folio_batch_count(fbatch); i++) {
>                  struct folio *folio = fbatch->folios[i];
>
> -                /* block memcg migration while the folio moves between lru */
> -                if (move_fn != lru_add_fn && !folio_test_clear_lru(folio))
> -                        continue;
> -
>                  folio_lruvec_relock_irqsave(folio, &lruvec, &flags);
>                  move_fn(lruvec, folio);
>
> @@ -255,8 +251,9 @@ static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
>   */
>  void folio_rotate_reclaimable(struct folio *folio)
>  {
> -        if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
> -            !folio_test_unevictable(folio) && folio_test_lru(folio)) {
> +        if (folio_test_lru(folio) && !folio_test_locked(folio) &&
> +            !folio_test_dirty(folio) && !folio_test_unevictable(folio) &&
> +            folio_test_clear_lru(folio)) {
>                  struct folio_batch *fbatch;
>                  unsigned long flags;
>
> @@ -354,7 +351,7 @@ static void folio_activate_drain(int cpu)
>  void folio_activate(struct folio *folio)
>  {
>          if (folio_test_lru(folio) && !folio_test_active(folio) &&
> -            !folio_test_unevictable(folio)) {
> +            !folio_test_unevictable(folio) && folio_test_clear_lru(folio)) {

IMO, this seems to violate the semantics of the LRU flag, since the
folio is clearly still on the LRU list. With your changes, I think we
should drain the page vectors before calling folio_test_lru();
otherwise some cases will fail the folio_test_lru() check if the folio
remains in the page vectors for an extended period.
>                  struct folio_batch *fbatch;
>
>                  folio_get(folio);
> @@ -699,6 +696,8 @@ void deactivate_file_folio(struct folio *folio)
>          /* Deactivating an unevictable folio will not accelerate reclaim */
>          if (folio_test_unevictable(folio))
>                  return;
> +        if (!folio_test_clear_lru(folio))
> +                return;
>
>          folio_get(folio);
>          local_lock(&cpu_fbatches.lock);
> @@ -718,7 +717,8 @@ void deactivate_file_folio(struct folio *folio)
>  void folio_deactivate(struct folio *folio)
>  {
>          if (folio_test_lru(folio) && !folio_test_unevictable(folio) &&
> -            (folio_test_active(folio) || lru_gen_enabled())) {
> +            (folio_test_active(folio) || lru_gen_enabled()) &&
> +            folio_test_clear_lru(folio)) {
>                  struct folio_batch *fbatch;
>
>                  folio_get(folio);
> @@ -740,7 +740,8 @@ void folio_mark_lazyfree(struct folio *folio)
>  {
>          if (folio_test_lru(folio) && folio_test_anon(folio) &&
>              folio_test_swapbacked(folio) && !folio_test_swapcache(folio) &&
> -            !folio_test_unevictable(folio)) {
> +            !folio_test_unevictable(folio) &&
> +            folio_test_clear_lru(folio)) {
>                  struct folio_batch *fbatch;
>
>                  folio_get(folio);