Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18;
MIME-Version: 1.0
References: <20210219224405.1544597-1-shakeelb@google.com> <YDBZFY8WnLewRqLg@cmpxchg.org>
In-Reply-To: <YDBZFY8WnLewRqLg@cmpxchg.org>
From:   Shakeel Butt <shakeelb@google.com>
Date:   Mon, 22 Feb 2021 10:45:54 -0800
Message-ID: <CALvZod6Uf4w1xTrM6wSuGyefqWSfauXiJbCex_OnhY2g-Kadhw@mail.gmail.com>
Subject: Re: [PATCH] memcg: charge before adding to swapcache on swapin
To:     Johannes Weiner <hannes@cmpxchg.org>
Cc:     Hugh Dickins <hughd@google.com>, Roman Gushchin <guro@fb.com>,
        Michal Hocko <mhocko@kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Cgroups <cgroups@vger.kernel.org>, Linux MM <linux-mm@kvack.org>,
        LKML <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk

On Fri, Feb 19, 2021 at 4:34 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, Feb 19, 2021 at 02:44:05PM -0800, Shakeel Butt wrote:
> > Currently the kernel adds the page, allocated for swapin, to the
> > swapcache before charging the page. This is fine but now we want a
> > per-memcg swapcache stat which is essential for folks who wants to
> > transparently migrate from cgroup v1's memsw to cgroup v2's memory and
> > swap counters.
> >
> > To correctly maintain the per-memcg swapcache stat, one option which
> > this patch has adopted is to charge the page before adding it to
> > swapcache. One challenge in this option is the failure case of
> > add_to_swap_cache() on which we need to undo the mem_cgroup_charge().
> > Specifically undoing mem_cgroup_uncharge_swap() is not simple.
> >
> > This patch circumvent this specific issue by removing the failure path
> > of  add_to_swap_cache() by providing __GFP_NOFAIL. Please note that in
> > this specific situation ENOMEM was the only possible failure of
> > add_to_swap_cache() which is removed by using __GFP_NOFAIL.
> >
> > Another option was to use __mod_memcg_lruvec_state(NR_SWAPCACHE) in
> > mem_cgroup_charge() but then we need to take of the do_swap_page() case
> > where synchronous swap devices bypass the swapcache. The do_swap_page()
> > already does hackery to set and reset PageSwapCache bit to make
> > mem_cgroup_charge() execute the swap accounting code and then we would
> > need to add additional parameter to tell to not touch NR_SWAPCACHE stat
> > as that code patch bypass swapcache.
> >
> > This patch added memcg charging API explicitly foe swapin pages and
> > cleaned up do_swap_page() to not set and reset PageSwapCache bit.
> >
> > Signed-off-by: Shakeel Butt <shakeelb@google.com>
>
> The patch makes sense to me. While it extends the charge interface, I
> actually quite like that it charges the page earlier - before putting
> it into wider circulation. It's a step in the right direction.
>
> But IMO the semantics of mem_cgroup_charge_swapin_page() are a bit too
> fickle: the __GFP_NOFAIL in add_to_swap_cache() works around it, but
> having a must-not-fail-after-this line makes the code tricky to work
> on and error prone.
>
> It would be nicer to do a proper transaction sequence.
>
> > @@ -497,16 +497,15 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> >       __SetPageLocked(page);
> >       __SetPageSwapBacked(page);
> >
> > -     /* May fail (-ENOMEM) if XArray node allocation failed. */
> > -     if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK, &shadow)) {
> > -             put_swap_page(page, entry);
> > +     if (mem_cgroup_charge_swapin_page(page, NULL, gfp_mask, entry))
> >               goto fail_unlock;
> > -     }
> >
> > -     if (mem_cgroup_charge(page, NULL, gfp_mask)) {
> > -             delete_from_swap_cache(page);
> > -             goto fail_unlock;
> > -     }
> > +     /*
> > +      * Use __GFP_NOFAIL to not worry about undoing the changes done by
> > +      * mem_cgroup_charge_swapin_page() on failure of add_to_swap_cache().
> > +      */
> > +     add_to_swap_cache(page, entry,
> > +                       (gfp_mask|__GFP_NOFAIL) & GFP_RECLAIM_MASK, &shadow);
>
> How about:
>
>         mem_cgroup_charge_swapin_page()
>         add_to_swap_cache()
>         mem_cgroup_finish_swapin_page()
>
> where finish_swapin_page() only uncharges the swap entry (on cgroup1)
> once the swap->memory transition is complete?
>
> Otherwise the patch looks good to me.

Thanks for the review and yes this makes the code much more clear and
maintainable.