Date:   Tue, 28 Feb 2023 22:46:47 -0800 (PST)
From:   Hugh Dickins <hughd@google.com>
To:     "Huang, Ying" <ying.huang@intel.com>
cc:     Hugh Dickins <hughd@google.com>,
        Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, "Xu, Pengfei" <pengfei.xu@intel.com>,
        Christoph Hellwig <hch@lst.de>,
        Stefan Roesch <shr@devkernel.io>, Tejun Heo <tj@kernel.org>,
        Xin Hao <xhao@linux.alibaba.com>, Zi Yan <ziy@nvidia.com>,
        Yang Shi <shy828301@gmail.com>,
        Baolin Wang <baolin.wang@linux.alibaba.com>,
        Matthew Wilcox <willy@infradead.org>,
        Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: [PATCH 3/3] migrate_pages: try migrate in batch asynchronously
 firstly
In-Reply-To: <874jr5atqf.fsf@yhuang6-desk2.ccr.corp.intel.com>
Message-ID: <c9de353-2420-d076-9fff-d6011611c2b@google.com>
References: <20230224141145.96814-1-ying.huang@intel.com> <20230224141145.96814-4-ying.huang@intel.com> <bdc873-3367-9aa7-79c6-91c68fecac41@google.com> <87cz5ub5dr.fsf@yhuang6-desk2.ccr.corp.intel.com> <070f71-9af-c29a-30b9-758b5cdf6766@google.com>
 <874jr5atqf.fsf@yhuang6-desk2.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Precedence: bulk

On Wed, 1 Mar 2023, Huang, Ying wrote:
> Hugh Dickins <hughd@google.com> writes:
> > On Tue, 28 Feb 2023, Huang, Ying wrote:
> >> Hugh Dickins <hughd@google.com> writes:
> >> > On Fri, 24 Feb 2023, Huang Ying wrote:
> >> >> 
> >> >> diff --git a/mm/migrate.c b/mm/migrate.c
> >> >> index 91198b487e49..c17ce5ee8d92 100644
> >> >> --- a/mm/migrate.c
> >> >> +++ b/mm/migrate.c
> >> >> @@ -1843,6 +1843,51 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
> >> >>  	return rc;
> >> >>  }
> >> >>  
> >> >> +static int migrate_pages_sync(struct list_head *from, new_page_t get_new_page,
> >> >> +		free_page_t put_new_page, unsigned long private,
> >> >> +		enum migrate_mode mode, int reason, struct list_head *ret_folios,
> >> >> +		struct list_head *split_folios, struct migrate_pages_stats *stats)
> >> >> +{
> >> >> +	int rc, nr_failed = 0;
> >> >> +	LIST_HEAD(folios);
> >> >> +	struct migrate_pages_stats astats;
> >> >> +
> >> >> +	memset(&astats, 0, sizeof(astats));
> >> >> +	/* Try to migrate in batch with MIGRATE_ASYNC mode firstly */
> >> >> +	rc = migrate_pages_batch(from, get_new_page, put_new_page, private, MIGRATE_ASYNC,
> >> >> +				 reason, &folios, split_folios, &astats,
> >> >> +				 NR_MAX_MIGRATE_PAGES_RETRY);
> >> >
> >> > I wonder if that and the call below would be better as NR_MAX_MIGRATE_PAGES_RETRY / 2.
> >> >
> >> > Though I've never got down to adjusting that number (and it's not a job
> >> > to be done in this set of patches), those 10 retries sometimes terrify
> >> > me, from a latency point of view.  They can have such different weights:
> >> > in the unmapped case, 10 retries is okay; but when a pinned page is mapped
> >> > into 1000 processes, the thought of all that unmapping and TLB flushing
> >> > and remapping is terrifying.
> >> >
> >> > Since you're retrying below, halve both numbers of retries for now?
> >> 
> >> Yes.  These are reasonable concerns.
> >> 
> >> And in the original implementation, we only wait for the page lock and
> >> for writeback to complete if pass > 2.  This amounts to trying to
> >> migrate asynchronously 3 times before the real synchronous
> >> migration.  So, should we delete the "force" logic (in
> >> migrate_folio_unmap()), and try to migrate asynchronously 3 times in
> >> batch before migrating synchronously 7 times one by one?
> >
> > Oh, that's a good idea (but please don't imagine I've thought it through):
> > I hadn't realized the way in which your migrate_pages_sync() addition is
> > kind of duplicating the way that the "force" argument conditions behaviour.
> > It would be very appealing to delete the "force" argument now if you can.
> 
> Sure.  Will do that in the next version.
> 
> > But aside from that, you've also made me wonder (again, please remember I
> > don't have a good picture of the new migrate_pages() sequence in my head)
> > whether you have already made a *great* strike against my 10 retries
> > terror.  Am I reading it right, that the unmapping is now done on the
> > first try, and the remove_migration_ptes after the last try (all the
> > pages involved having remained locked throughout)?
> 
> Yes.  You are right.  Now, unmapping and moving are two separate steps,
> and they are retried separately.  After a folio has been unmapped
> successfully, we will not remap/unmap it 10 times if the folio is pinned
> and therefore fails to move (migrate_folio_move()).  So the latency caused
> by retrying is much better now.  But I would still prefer to keep the total
> retry number as before.  Do you agree?

Yes, I agree, keep the total retry number 10 as before: maybe someone in
future will show that more than 5 is a waste of time, but there's little
need to get into that now: if you've put an end to unmapping and remapping
10 times over, that's a great step forward, quite apart from the TLB flush
batching itself.

(I did change "no need" to "little need" above: I do have some
anxiety about the increased latencies from keeping folios locked and
migration entries in place for significantly longer than before your
batching: I won't be surprised if the maximum batch size has to be
lowered if reports of latency spikes come in; and that might extend
to the retry count too.)

Hugh