Received: by 2002:a25:868d:0:0:0:0:0 with SMTP id z13csp111599ybk; Tue, 12 May 2020 17:02:14 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyC4cVpfxHjhfwrgmRo2dr/Sy9JnD3JYtIVOEn2Ur825Voev3Ono+wSio9wjKAh3tI2MgLN X-Received: by 2002:a05:6402:68f:: with SMTP id f15mr3450411edy.89.1589328134327; Tue, 12 May 2020 17:02:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1589328134; cv=none; d=google.com; s=arc-20160816; b=022ldXkcI8LwGCh8o8l0zbGWN57Iox/6V+6YaHShLtcoGpGe+D095lwW3VuW1N2JZ1 21D/7/1UMMyu2oTdqJmKB2w6FWMTE9kU/zOFqAM23QpIy+dyNHZcK0hz1MfvyJXGjDBB lQpdYMily/6zZPLP10zK9KKAAVkViQ4yTbRPHV3kEVZNzWFnHOBq23aOSvAhdBaYfaXn nblp5BTOKXFwmBnn3FXC1Q4UhbkH07Ol49Pwxdw2UnitFKdEs7o9NtJroVtNSk4RLAlq J5nG8uX0so6sfT1WU0dnk/4MP3QiluX/1PA30rgPmVX2d+gquXj+3QzTsJCnYfitCJBo NaMw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:to:references:message-id :content-transfer-encoding:cc:date:in-reply-to:from:subject :mime-version:dkim-signature; bh=yD3a12+euav5lEzQ2eD5g0W71Fdr+4HWLSF5wAq+Di4=; b=twmj6HL4zHapPAkif0sFgo9B2NyRruGC6aQ/0+SfjaknLf67ekgEOO8ZtnjcAH/Z1Y l67l8WDwGnExFSC2sfyWiv401GID9KnFy3g7TGEp5o/GK3g0gkReytThm01EQQrVA7o0 JqMOhy+SYFs5GTpPiqGgO7gUrRkyTyAj8q1/QNVvqVr8Yozaq6YVx0vO8xU2U1sNK5nr 81anUtmvttf56ZuSvaIpvDSbLezYZ8CQiuinZJRPYk2n0UV+sZmfVsEoYY44fMvlTyTG gvbzRgA5Vg1/8p0L5il//Kff6umEo+wMQ8amDOLncx7RdkbVnWv0RlRm5jCclpwzwaoK T8Qg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@lca.pw header.s=google header.b=NCtjCiXd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id u13si8302133edb.6.2020.05.12.17.01.51; Tue, 12 May 2020 17:02:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@lca.pw header.s=google header.b=NCtjCiXd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731744AbgELX6T (ORCPT + 99 others); Tue, 12 May 2020 19:58:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725938AbgELX6T (ORCPT ); Tue, 12 May 2020 19:58:19 -0400 Received: from mail-qv1-xf44.google.com (mail-qv1-xf44.google.com [IPv6:2607:f8b0:4864:20::f44]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B82D8C061A0C for ; Tue, 12 May 2020 16:58:17 -0700 (PDT) Received: by mail-qv1-xf44.google.com with SMTP id t8so6912957qvw.5 for ; Tue, 12 May 2020 16:58:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lca.pw; s=google; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=yD3a12+euav5lEzQ2eD5g0W71Fdr+4HWLSF5wAq+Di4=; b=NCtjCiXdpUxrD3bLzlXKjxlXGtB+pIewyDmhQ6NgTrKvEBiiAO3l064DUq0chV8UqU 3wB2OBgRQ9Iht75ymMDaei3StnLtfND5OXchaakbk/r7q/N8ILNzEior35O7K8va4vj+ BMOiB9k3rO9/RspupD8JjBww3kCaa3IQrVCaRw2F0BIkSgdnctCS9PhcekAiHzOzBLkr nEry2gYpIjfwV5M6ajTLYZLme6Jsn+kCR4KMcN0MXVdjQ+LvwvTXQ0r55weM0847xzkz Mheh2jC4iUU31tpYcrbMBSe8SsSJO3QT5fBSmT+HjN6JfSS/rHlRsDS7n8JNyjTZLX8L ZRdg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=yD3a12+euav5lEzQ2eD5g0W71Fdr+4HWLSF5wAq+Di4=; b=TpFYT8HCbDtgC2uPSgARjSVcDjiw+QPRhYtj0Dzf8CLZ+4lZwSa9Q816kxs7tEgP6v b8nY0kWALn3zf6wLXKOfXw8SaBNpLzkirfc2jeaITqEzKuwM6ZcyVzxg5aPRf3o9ZkXc Qw5nDZuOE+aZh2gx4ifuzjTOlG2gKcnb3iOIo/KoorqGBjCeqJ+AEKIscahBFnyf6okH yuiEYP9t/Ad4qs8hviogVhaKTZUA9L3Lwi1gsxH1Sn+7pdzoh9Z6zdsfLwfQG3TNmMbC AUX9FPh0cYw48rPhdCAtyJvcizSotCLgHXi9PQjRmpIGkqjDIMbA2Vk9rsFQG+T3pqrC dGDQ== X-Gm-Message-State: AGi0PuY6acj71nbxq0+vSs5djz4VCvW0hptMDhhVb16T+QPPS/GkOZxS +6zXKirTDP5gXKqGeGpX+0g5mg== X-Received: by 2002:ad4:42b1:: with SMTP id e17mr23319979qvr.149.1589327896644; Tue, 12 May 2020 16:58:16 -0700 (PDT) Received: from [192.168.1.153] (pool-71-184-117-43.bstnma.fios.verizon.net. [71.184.117.43]) by smtp.gmail.com with ESMTPSA id j11sm11834028qkk.33.2020.05.12.16.58.15 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 12 May 2020 16:58:15 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.80.23.2.2\)) Subject: Re: [PATCH 12/19] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API From: Qian Cai In-Reply-To: <20200512215813.GA487759@cmpxchg.org> Date: Tue, 12 May 2020 19:58:14 -0400 Cc: Andrew Morton , Alex Shi , Joonsoo Kim , Shakeel Butt , Hugh Dickins , Michal Hocko , "Kirill A. Shutemov" , Roman Gushchin , Linux-MM , cgroups@vger.kernel.org, LKML , kernel-team@fb.com, Johannes Weiner Content-Transfer-Encoding: quoted-printable Message-Id: <25270BAD-CB7E-4E92-96BA-740690744DBA@lca.pw> References: <20200508183105.225460-1-hannes@cmpxchg.org> <20200508183105.225460-13-hannes@cmpxchg.org> <45AA36A9-0C4D-49C2-BA3C-08753BBC30FB@lca.pw> <20200512215813.GA487759@cmpxchg.org> To: Stephen Rothwell X-Mailer: Apple Mail (2.3608.80.23.2.2) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On May 12, 2020, at 5:58 PM, Johannes Weiner = wrote: >=20 > On Tue, May 12, 2020 at 10:38:54AM -0400, Qian Cai wrote: >>> On May 8, 2020, at 2:30 PM, Johannes Weiner = wrote: >>>=20 >>> With the page->mapping requirement gone from memcg, we can charge = anon >>> and file-thp pages in one single step, right after they're = allocated. >>>=20 >>> This removes two out of three API calls - especially the tricky = commit >>> step that needed to happen at just the right time between when the >>> page is "set up" and when it's "published" - somewhat vague and = fluid >>> concepts that varied by page type. All we need is a freshly = allocated >>> page and a memcg context to charge. >>>=20 >>> v2: prevent double charges on pre-allocated hugepages in khugepaged >>>=20 >>> Signed-off-by: Johannes Weiner >>> Reviewed-by: Joonsoo Kim >>> --- >>> include/linux/mm.h | 4 +--- >>> kernel/events/uprobes.c | 11 +++-------- >>> mm/filemap.c | 2 +- >>> mm/huge_memory.c | 9 +++------ >>> mm/khugepaged.c | 35 ++++++++++------------------------- >>> mm/memory.c | 36 ++++++++++-------------------------- >>> mm/migrate.c | 5 +---- >>> mm/swapfile.c | 6 +----- >>> mm/userfaultfd.c | 5 +---- >>> 9 files changed, 31 insertions(+), 82 deletions(-) >> [] >>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >>>=20 >>> @@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct = mm_struct *mm, >>> out_up_write: >>> up_write(&mm->mmap_sem); >>> out_nolock: >>> + if (*hpage) >>> + mem_cgroup_uncharge(*hpage); >>> trace_mm_collapse_huge_page(mm, isolated, result); >>> return; >>> out: >>> - mem_cgroup_cancel_charge(new_page, memcg); >>> goto out_up_write; >>> } >> [] >>=20 >> Some memory pressure will crash this new code. It looks like somewhat = racy. >>=20 >> if (!page->mem_cgroup) >>=20 >> where page =3D=3D NULL in mem_cgroup_uncharge(). >=20 > Thanks for the report, sorry about the inconvenience. >=20 > Hm, the page is exclusive at this point, nobody else should be > touching it. After all, khugepaged might reuse the preallocated page > for another pmd if this one fails to collapse. >=20 > Looking at the code, I think it's page itself that's garbage, not > page->mem_cgroup changing. If you have CONFIG_NUMA and the allocation > fails, *hpage could contain an ERR_PTR instead of being NULL. >=20 > I think we need the following fixlet: Yes, I have NUMA here. Stephen, can you pick this up for first before Andrew has a chance to = push out the next mmotm hopefully contain this fix? https://lore.kernel.org/lkml/20200512215813.GA487759@cmpxchg.org/ >=20 > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index f2e0a5e5cfbb..f6161e17da26 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -1193,7 +1193,7 @@ static void collapse_huge_page(struct mm_struct = *mm, > out_up_write: > up_write(&mm->mmap_sem); > out_nolock: > - if (*hpage) > + if (!IS_ERR_OR_NULL(*hpage)) > mem_cgroup_uncharge(*hpage); > trace_mm_collapse_huge_page(mm, isolated, result); > return; > @@ -1928,7 +1928,7 @@ static void collapse_file(struct mm_struct *mm, > unlock_page(new_page); > out: > VM_BUG_ON(!list_empty(&pagelist)); > - if (*hpage) > + if (!IS_ERR_OR_NULL(*hpage)) > mem_cgroup_uncharge(*hpage); > /* TODO: tracepoints */ > }