Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
MIME-Version: 1.0
References: <20230413182313.RFC.1.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid>
 <87v8hz17o9.fsf@yhuang6-desk2.ccr.corp.intel.com> <CAD=FV=XPBaGGLJVG9UGoJss6EU5=esqyt=aWsp2nOm2YcVOc8g@mail.gmail.com>
 <87ildvwbr5.fsf@yhuang6-desk2.ccr.corp.intel.com> <CAD=FV=WCWWuGO7D9X6By-fQ0ZB63iDsAvcPwza-F6tbA-Z_M6w@mail.gmail.com>
 <87edohvpzk.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87edohvpzk.fsf@yhuang6-desk2.ccr.corp.intel.com>
From:   Doug Anderson <dianders@chromium.org>
Date:   Tue, 18 Apr 2023 09:19:21 -0700
Message-ID: <CAD=FV=V1SAufJUT-d5praYszUN1ssdT61WS7iB-c62R4M6Lsmw@mail.gmail.com>
Subject: Re: [RFC PATCH] migrate_pages: Never block waiting for the page lock
To:     "Huang, Ying" <ying.huang@intel.com>
Cc:     Andrew Morton <akpm@linux-foundation.org>,
        Yu Zhao <yuzhao@google.com>, linux-kernel@vger.kernel.org,
        linux-mm@kvack.org, Vlastimil Babka <vbabka@suse.cz>,
        Mel Gorman <mgorman@techsingularity.net>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Precedence: bulk

Hi,

On Mon, Apr 17, 2023 at 8:18=E2=80=AFPM Huang, Ying <ying.huang@intel.com> =
wrote:
>
> Doug Anderson <dianders@chromium.org> writes:
>
> > Hi,
> >
> > On Sun, Apr 16, 2023 at 6:15=E2=80=AFPM Huang, Ying <ying.huang@intel.c=
om> wrote:
> >>
> >> Doug Anderson <dianders@chromium.org> writes:
> >>
> >> > Hi,
> >> >
> >> > On Thu, Apr 13, 2023 at 8:10=E2=80=AFPM Huang, Ying <ying.huang@inte=
l.com> wrote:
> >> >>
> >> >> Douglas Anderson <dianders@chromium.org> writes:
> >> >>
> >> >> > Currently when we try to do page migration and we're in "synchron=
ous"
> >> >> > mode (and not doing direct compaction) then we'll wait an infinit=
e
> >> >> > amount of time for a page lock. This does not appear to be a grea=
t
> >> >> > idea.
> >> >> >
> >> >> > One issue can be seen when I put a device under extreme memory
> >> >> > pressure. I took a sc7180-trogdor Chromebook (4GB RAM, 8GB zram
> >> >> > swap). I ran the browser along with Android (which runs from a
> >> >> > loopback mounted 128K block-size squashfs "disk"). I then manuall=
y ran
> >> >> > the mmm_donut memory pressure tool [1]. The system is completely
> >> >> > unusable both with and without this patch since there are 8 proce=
sses
> >> >> > completely thrashing memory, but it was still interesting to look=
 at
> >> >> > how migration was behaving. I put some timing code in and I could=
 see
> >> >> > that we sometimes waited over 25 seconds (in the context of
> >> >> > kcompactd0) for a page lock to become available. Although the 25
> >> >> > seconds was the high mark, it was easy to see tens, hundreds, or
> >> >> > thousands of milliseconds spent waiting on the lock.
> >> >> >
> >> >> > Instead of waiting, if I bailed out right away (as this patch doe=
s), I
> >> >> > could see kcompactd0 move forward to successfully to migrate othe=
r
> >> >> > pages instead. This seems like a better use of kcompactd's time.
> >> >> >
> >> >> > Thus, even though this didn't make the system any more usable in =
my
> >> >> > absurd test case, it still seemed to make migration behave better=
 and
> >> >> > that feels like a win. It also makes the code simpler since we ha=
ve
> >> >> > one fewer special case.
> >> >>
> >> >> TBH, the test case is too extreme for me.
> >> >
> >> > That's fair. That being said, I guess the point I was trying to make
> >> > is that waiting for this lock could take an unbounded amount of time=
.
> >> > Other parts of the system sometimes hold a page lock and then do a
> >> > blocking operation. At least in the case of kcompactd there are bett=
er
> >> > uses of its time than waiting for any given page.
> >> >
> >> >> And, we have multiple "sync" mode to deal with latency requirement,=
 for
> >> >> example, we use MIGRATE_SYNC_LIGHT for compaction to avoid too long
> >> >> latency.  If you have latency requirement for some users, you may
> >> >> consider to add new "sync" mode.
> >> >
> >> > Sure. kcompactd_do_work() is currently using MIGRATE_SYNC_LIGHT. I
> >> > guess my first thought would be to avoid adding a new mode and make
> >> > MIGRATE_SYNC_LIGHT not block here. Then anyone that truly needs to
> >> > wait for all the pages to be migrated can use the heavier sync modes=
.
> >> > It seems to me like the current users of MIGRATE_SYNC_LIGHT would no=
t
> >> > want to block for an unbounded amount of time here. What do you thin=
k?
> >>
> >> It appears that you can just use MIGRATE_ASYNC if you think the correc=
t
> >> behavior is "NOT block at all".  I found that there are more
> >> fine-grained controls on this in compaction code, please take a look a=
t
> >> "enum compact_priority" and its comments.
> >
> > Actually, the more I think about it the more I think the right answer
> > is to keep kcompactd as using MIGRATE_SYNC_LIGHT and make
> > MIGRATE_SYNC_LIGHT not block on the folio lock.
>
> Then, what is the difference between MIGRATE_SYNC_LIGHT and
> MIGRATE_ASYNC?

Aren't there still some differences even if we remove blocking this
one lock? ...or maybe your point is that maybe the other differences
have similar properties?

OK, so let's think about just using MIGRATE_ASYNC and either leaving
MIGRATE_SYNC_LIGHT alone or deleting it (if there are no users left).
The nice thing is that the only users of MIGRATE_SYNC_LIGHT are in
"compaction.c" and there are only 3 places where it's specified.

1. kcompactd_do_work() - This is what I was analyzing and where I
argued that indefinite blocking is less useful than simply trying to
compact a different page. So sure, moving this to MIGRATE_ASYNC seems
like it would be OK?

2. proactive_compact_node() - Just like kcompactd_do_work(), this is
called from kcompactd and thus probably should have the same mode.

3. compact_zone_order() - This explicitly chooses between
MIGRATE_SYNC_LIGHT and MIGRATE_ASYNC, so I guess I'd keep
MIGRATE_SYNC_LIGHT just for this use case. It looks as if
compact_zone_order() is called for direct compaction and thus making
it synchronous can make sense, especially since it seems to go to the
synchronous case after it failed with the async case. Ironically,
though, the exact lock I was proposing to not wait on _isn't_ ever
waited on in direct reclaim (see the comment in migrate_folio_unmap()
about deadlock), but the other differences between SYNC_LIGHT and
ASYNC come into play.

If the above sounds correct then I'm OK w/ moving #1 and #2 to
MIGRATE_ASYNC and leaving #3 as the sole user or MIGRATE_SYNC_LIGHT.

> > kcompactd can accept some blocking but we don't want long / unbounded
> > blocking. Reading the comments for MIGRATE_SYNC_LIGHT, this also seems
> > like it fits pretty well. MIGRATE_SYNC_LIGHT says that the stall time
> > of writepage() is too much. It's entirely plausible that someone else
> > holding the lock is doing something as slow as writepage() and thus
> > waiting on the lock can be just as bad for latency.
>
> IIUC, during writepage(), the page/folio will be unlocked.
>
> But, during page reading, the page/folio will be locked.  I don't really
> understand why we can wait for page reading but cannot wait for page
> writeback.

I'm not sure I totally got your point here. It sorta sounds as if
you're making the same point that I was? IIUC by waiting on the lock
we may be implicitly waiting for someone to finish reading which seems
as bad as waiting for writing. That was why I was arguing that with
MIGRATE_SYNC_LIGHT (which says that waiting for the write was too
slow) that we shouldn't wait for the lock (which may be blocking on a
read).

-Doug