Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp2154690pxb; Fri, 5 Mar 2021 08:27:02 -0800 (PST) X-Google-Smtp-Source: ABdhPJzWQkwDcOTa+C8HlQXLbKErqpi4uxnjUBiAMriAPzcrsnYmlnpR4SWCGzmnFuiLQkmp0cMr X-Received: by 2002:a05:6402:1113:: with SMTP id u19mr9539316edv.205.1614961622041; Fri, 05 Mar 2021 08:27:02 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1614961622; cv=none; d=google.com; s=arc-20160816; b=pNRrGyctugJLIl5iFHF5MphCf+3cdDR7//Aa4jGnobx7cIdo8tvO1stUGejB1qwDGx nLlVwug3sx8AiqeRYd93w4EiSlSIqWwVgS1k9g5h+PyaSeLPqunLnN0Xx7G8hgFBwV0R xiuEFI8kkVlNP+60wp1ltr2TsN8i8rjKLQOETKGHjxqTCGZn6+hpfSbwbTQCW9MQA11Z L5diZARAuOyNjRB22QYDixMal8QzoEO2wsG58Fq7lb8ynCA517ATNiVoRoVkp4RemIjT tKrtK+Rk3n1e+BgGhT72GqkP/rn5BgHH0cmBuqQsu/3NiwXCy3WvhMnJAQppxbph/jSA f33w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=5GBpYtsOToNXALvaP0yVwWut+4euuVYAomBALxtd1po=; b=CUnC9FyUvtH9/nedxSwbqmtNtEtudJCulDHM7pN9/QSMWIUH4sCraZsT5izF7ipqe0 sLUHTUGMeeiZ+F1vlTnIIITqKMMGtkSHCq5qQ3QFxnbF9+xzvh3j+AWf4NUrAJpxRlbu ny48DgqjWBO3Vo3/JM2qveogV48UJCoRcGbAPgF1iEtKRdaJz9CBQOYxUnL2F7D9sJaT mJu1yx0Z5ns4GJf07AgBXqdUOGSmPfvnx9txh8k8+cfHZvufne3jyQvEb9yTRRTapX7h ONcH/yAO/Va+umTx0lFK38joaYxoWvZ/r2HAEnbVt4+TNubjub94l8I+cnJnPVEMOyOr EuUA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=l5GR7801; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h20si1637183ejc.687.2021.03.05.08.26.38; Fri, 05 Mar 2021 08:27:02 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=l5GR7801; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230430AbhCEQZm (ORCPT + 99 others); Fri, 5 Mar 2021 11:25:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59256 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229682AbhCEQZN (ORCPT ); Fri, 5 Mar 2021 11:25:13 -0500 Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 22237C06175F for ; Fri, 5 Mar 2021 08:25:13 -0800 (PST) Received: by mail-qt1-x831.google.com with SMTP id d11so2116711qtx.9 for ; Fri, 05 Mar 2021 08:25:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=5GBpYtsOToNXALvaP0yVwWut+4euuVYAomBALxtd1po=; b=l5GR7801lJ+iexrPZFrMv2GyBBGWIkMKJrSkR2MfNTsFIDXHUpy3cm/SE8mt8iA8UW tX/fInL970tsjCxnzQF+FRUNwkBYl8ssYwqC0WnbEYVWqVCGsXbQPeWx+VWBu956APq5 mGp7vqfZN32cBBvXjmDpf/1oIPIh2R8ci632Envp0BMPk+oYYwrezZz94yY0QUPCCM5C 0MdWXR5gQwacJqCZHp30/0E3L+uYsM+aSN4osjQSY2P/XcQXN0bZ63NuW5fVyTMeSvMY +wgAS8a9Hke9BxQO5OP3oLzb5rE1rG9b5YRUFu7ogbuZnpaXfViZtSf1xkH9CLZTA469 WDKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=5GBpYtsOToNXALvaP0yVwWut+4euuVYAomBALxtd1po=; b=dLWlvlRJt7XuZgJmkhkUPu1CDE5USXz7StAgirgmpOutDEBqW8vfC1RXzJjIfp0zP7 aE/hKISFo5W6EknQQaokBESBi4q8rRNxJYrohzu6C6j583R6slih25rvY4dTqUR64rXC VjKXk/twD4TMppuK1XCuNxabvJNgLFT0dWYHev67K4lujMZBX8I+fQJpBRVFNykC/285 SLIUKaDC4H11KZU/uaa8pGRCcFbca9vRIN/tgZzuP9RV6pcSG/shqBjhVxmGK5VTN8iT h77gIhA1GdJ+u3NFOtj0Kzxvi4z/S3U9A8NbVNqt0pGiPTeNoJM1bkNdQ79VnbHDY//n dlvQ== X-Gm-Message-State: AOAM530U02dVj3wMrpqXNB7RYombr7KaVW+u6BhpTevMcQOI/uXVZAUH 0NEC/baXcqLb4cY1pQ2Zzh+PPw== X-Received: by 2002:a05:622a:486:: with SMTP id p6mr9500637qtx.81.1614961512027; Fri, 05 Mar 2021 08:25:12 -0800 (PST) Received: from localhost (70.44.39.90.res-cmts.bus.ptd.net. [70.44.39.90]) by smtp.gmail.com with ESMTPSA id a34sm2185343qtc.97.2021.03.05.08.25.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Mar 2021 08:25:11 -0800 (PST) Date: Fri, 5 Mar 2021 11:25:10 -0500 From: Johannes Weiner To: Hugh Dickins Cc: Shakeel Butt , Roman Gushchin , Michal Hocko , Andrew Morton , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v3] memcg: charge before adding to swapcache on swapin Message-ID: References: <20210304014229.521351-1-shakeelb@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Mar 05, 2021 at 12:06:31AM -0800, Hugh Dickins wrote: > On Wed, 3 Mar 2021, Shakeel Butt wrote: > > > Currently the kernel adds the page, allocated for swapin, to the > > swapcache before charging the page. This is fine but now we want a > > per-memcg swapcache stat which is essential for folks who wants to > > transparently migrate from cgroup v1's memsw to cgroup v2's memory and > > swap counters. In addition charging a page before exposing it to other > > parts of the kernel is a step in the right direction. > > > > To correctly maintain the per-memcg swapcache stat, this patch has > > adopted to charge the page before adding it to swapcache. One > > challenge in this option is the failure case of add_to_swap_cache() on > > which we need to undo the mem_cgroup_charge(). Specifically undoing > > mem_cgroup_uncharge_swap() is not simple. > > > > To resolve the issue, this patch introduces transaction like interface > > to charge a page for swapin. The function mem_cgroup_charge_swapin_page() > > initiates the charging of the page and mem_cgroup_finish_swapin_page() > > completes the charging process. So, the kernel starts the charging > > process of the page for swapin with mem_cgroup_charge_swapin_page(), > > adds the page to the swapcache and on success completes the charging > > process with mem_cgroup_finish_swapin_page(). > > > > Signed-off-by: Shakeel Butt > > Quite apart from helping with the stat you want, what you've ended > up with here is a nice cleanup in several different ways (and I'm > glad Johannes talked you out of __GFP_NOFAIL: much better like this). > I'll say > > Acked-by: Hugh Dickins > > but I am quite unhappy with the name mem_cgroup_finish_swapin_page(): > it doesn't finish the swapin, it doesn't finish the page, and I'm > not persuaded by your paragraph above that there's any "transaction" > here (if there were, I'd suggest "commit" instead of "finish"'; and > I'd get worried by the css_put before it's called - but no, that's > fine, it's independent). > > How about complementing mem_cgroup_charge_swapin_page() with > mem_cgroup_uncharge_swapin_swap()? I think that describes well > what it does, at least in the do_memsw_account() case, and I hope > we can overlook that it does nothing at all in the other case. Yes, that's better. The sequence is still somewhat mysterious for people not overly familiar with memcg swap internals, but it's much clearer for people who are. I mildly prefer swapping the swapin bit: mem_cgroup_swapin_charge_page() mem_cgroup_swapin_uncharge_swap() But either way works for me. > And it really doesn't need a page argument: both places it's called > have just allocated an order-0 page, there's no chance of a THP here; > but you might have some idea of future expansion, or matching > put_swap_page() - I won't object if you prefer to pass in the page. Agreed. > > + * mem_cgroup_finish_swapin_page - complete the swapin page charge transaction > > + * @page: page charged for swapin > > + * @entry: swap entry for which the page is charged > > + * > > + * This function completes the transaction of charging the page allocated for > > + * swapin. > > + */ > > +void mem_cgroup_finish_swapin_page(struct page *page, swp_entry_t entry) > > +{ > > /* > > * Cgroup1's unified memory+swap counter has been charged with the > > * new swapcache page, finish the transfer by uncharging the swap > > @@ -6760,20 +6796,14 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > > * correspond 1:1 to page and swap slot lifetimes: we charge the > > * page to memory here, and uncharge swap when the slot is freed. > > */ > > - if (do_memsw_account() && PageSwapCache(page)) { > > - swp_entry_t entry = { .val = page_private(page) }; > > + if (!mem_cgroup_disabled() && do_memsw_account()) { > > I understand why you put that !mem_cgroup_disabled() check in there, > but I have a series of observations on that. > > First I was going to say that it would be better left to > mem_cgroup_uncharge_swap() itself. > > Then I was going to say that I think it's already covered here > by the cgroup_memory_noswap check inside do_memsw_account(). > > Then, going back to mem_cgroup_uncharge_swap(), I realized that 5.8's > 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of > memory control") removed the do_swap_account or cgroup_memory_noswap > checks from mem_cgroup_uncharge_swap() and swap_cgroup_swapon() and > swap_cgroup_swapoff() - so since then we have been allocating totally > unnecessary swap_cgroup arrays when mem_cgroup_disabled() (and > mem_cgroup_uncharge_swap() has worked by reading the zalloced array). > > I think, or am I confused? If I'm right on that, one of us ought to > send another patch putting back, either cgroup_memory_noswap checks > or mem_cgroup_disabled() checks in those three places - I suspect the > static key mem_cgroup_disabled() is preferable, but I'm getting dozy. You're right, that patch was overzealous. The point behind it was to ALWAYS track swap ownership when memcg is enabled, so that even if swap space COUNTING has been disabled, we'll charge pages back to their original owners during swap readahead and on swapoff. Therefor, we have to allocate the arrays even if the user requested cgroup_memory_noswap. But we certainly don't need to allocate and maintain the array when mem_cgroup_disabled() altogether. I'll send a patch to add those back. I'd also rename cgroup_memory_noswap to cgroup_swapaccount - to match the commandline and (hopefully) make a bit clearer what it effects. > Whatever we do with that - and it's really not any business for this > patch - I think you can drop the mem_cgroup_disabled() check from > mem_cgroup_uncharge_swapin_swap(). Yes. do_memsw_account() implies !mem_cgroup_disabled(), as disabling memcg sets cgroup_memory_noswap.