MIME-Version: 1.0
References: <20211124151915.GA6163@haolee.io> <YZ5o/VmU59evp65J@dhcp22.suse.cz>
In-Reply-To: <YZ5o/VmU59evp65J@dhcp22.suse.cz>
From:   Hao Lee <haolee.swjtu@gmail.com>
Date:   Thu, 25 Nov 2021 11:24:02 +0800
Message-ID: <CA+PpKPmy-u_BxYMCQOFyz78t2+3uM6nR9mQeX+MPyH6H2tOOHA@mail.gmail.com>
Subject: Re: [PATCH] mm: reduce spinlock contention in release_pages()
To:     Michal Hocko <mhocko@suse.com>
Cc:     Linux MM <linux-mm@kvack.org>,
        Johannes Weiner <hannes@cmpxchg.org>, vdavydov.dev@gmail.com,
        Shakeel Butt <shakeelb@google.com>, cgroups@vger.kernel.org,
        LKML <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk

On Thu, Nov 25, 2021 at 12:31 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 24-11-21 15:19:15, Hao Lee wrote:
> > When several tasks are terminated simultaneously, lots of pages will be
> > released, which can cause severe spinlock contention. Other tasks which
> > are running on the same core will be seriously affected. We can yield
> > cpu to fix this problem.
>
> How does this actually address the problem? You are effectivelly losing
> fairness completely.

Got it. Thanks!

> We do batch currently so no single task should be
> able to monopolize the cpu for too long. Why this is not sufficient?

uncharge and unref indeed take advantage of the batch process, but
del_from_lru needs more time to complete. Several tasks will contend
spinlock in the loop if nr is very large. We can notice a transient peak
of sys% reflecting this, and perf can also report spinlock slowpath takes
too much time. This scenario is not rare, especially when containers are
destroyed simultaneously and other latency critical tasks may be affected
by this problem. So I want to figure out a way to deal with it.

Thanks.

>
> > diff --git a/mm/swap.c b/mm/swap.c
> > index e8c9dc6d0377..91850d51a5a5 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -960,8 +960,14 @@ void release_pages(struct page **pages, int nr)
> >               if (PageLRU(page)) {
> >                       struct lruvec *prev_lruvec = lruvec;
> >
> > -                     lruvec = folio_lruvec_relock_irqsave(folio, lruvec,
> > +retry:
> > +                     lruvec = folio_lruvec_tryrelock_irqsave(folio, lruvec,
> >                                                                       &flags);
> > +                     if (!lruvec) {
> > +                             cond_resched();
> > +                             goto retry;
> > +                     }
> > +
> >                       if (prev_lruvec != lruvec)
> >                               lock_batch = 0;
> >
> > --
> > 2.31.1
>
> --
> Michal Hocko
> SUSE Labs