From: Yang Shi
Date: Tue, 29 Nov 2022 09:49:25 -0800
Subject: Re: [QUESTION] memcg page_counter seems broken in MADV_DONTNEED with THP enabled
To: Michal Hocko
Cc: Yongqiang Liu, "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", "akpm@linux-foundation.org", aarcange@redhat.com, hughd@google.com, mgorman@suse.de, cl@gentwo.org, zokeefe@google.com, rientjes@google.com, Matthew Wilcox, peterx@redhat.com, "Wangkefeng (OS Kernel Lab)", "zhangxiaoxu (A)", kirill.shutemov@linux.intel.com, Lu Jialin
References: <8a2f2644-71d0-05d7-49d8-878aafa99652@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 29, 2022 at 12:10 AM Michal Hocko wrote:
>
> On Mon 28-11-22 12:01:37, Yang Shi wrote:
> > On Sat, Nov 26, 2022 at 5:10 AM Yongqiang Liu wrote:
> > >
> > > Hi,
> > >
> > > We use mm_counter to how much a process physical memory used. Meanwhile,
> > > page_counter of a memcg is used to count how much a cgroup physical
> > > memory used.
> > > If a cgroup only contains a process, they looks almost the same. But with
> > > THP enabled, sometimes memory.usage_in_bytes in memcg may be twice or
> > > more than rss
> > > in proc/[pid]/smaps_rollup as follow:
> [...]
> > > node_page_stat which shows in meminfo was also decreased. the
> > > __split_huge_pmd
> > > seems free no physical memory unless the total THP was free.I am
> > > confused which
> > > one is the true physical memory used of a process.
> >
> > This should be caused by the deferred split of THP. When MADV_DONTNEED
> > is called on the partial of the map, the huge PMD is split, but the
> > THP itself will not be split until the memory pressure is hit (global
> > or memcg limit). So the unmapped sub pages are actually not freed
> > until that point. So the mm counter is decreased due to the zapping
> > but the physical pages are not actually freed then uncharged from
> > memcg.
>
> Yes, and this is not really bound to THP. Consider a page cache. It can
> be accessed via syscalls when it doesn't correspondent to rss at all
> while it is still charged to a memcg. Or it can be mapped and then later
> unmapped so it disappear from rss while it is still charged until it
> gets reclaimed by the memory pressure. Or it can be an in-memory object
> that is not bound to any process life time (e.g. tmpfs). Or it can be a
> kernel memory charged to a memcg which is not covered by rss because it
> is either not mapped or it is unknown to rss counters.

Yes, good points. Thanks, Michal. And one more thing worth mentioning
is that the RSS shown by ps or smaps is different from the RSS shown
by memcg.

> --
> Michal Hocko
> SUSE Labs
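
For illustration only, the deferred-split accounting described in this thread can be sketched as a toy Python model. This is not kernel code: the class ToyTHP, its fields, and the constants (4 KiB base pages, 512 subpages per THP) are assumptions made up for the example. It only shows why rss can drop immediately on a partial MADV_DONTNEED while the memcg charge stays until memory pressure splits the THP.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096   # assumed base page size in bytes
THP_PAGES = 512    # assumed subpages per THP (2 MiB / 4 KiB)


@dataclass
class ToyTHP:
    """Toy model of one THP; hypothetical, for illustration only."""
    mapped: int = THP_PAGES         # subpages still mapped (counted in rss)
    charged: int = THP_PAGES        # subpages still charged to the memcg
    on_deferred_list: bool = False  # waiting for a deferred split

    def madvise_dontneed(self, npages: int) -> None:
        """Zap npages subpages: rss drops now, the charge may not."""
        npages = min(npages, self.mapped)
        self.mapped -= npages
        if self.mapped == 0:
            # The whole THP is unmapped, so it is freed and uncharged.
            self.charged = 0
            self.on_deferred_list = False
        elif npages > 0:
            # Partial unmap: only the huge PMD is split; the THP is
            # queued and its unmapped subpages stay charged.
            self.on_deferred_list = True

    def reclaim(self) -> None:
        """Memory pressure: split the THP, free the unmapped subpages."""
        if self.on_deferred_list:
            self.charged = self.mapped
            self.on_deferred_list = False

    @property
    def rss_bytes(self) -> int:
        return self.mapped * PAGE_SIZE

    @property
    def memcg_bytes(self) -> int:
        return self.charged * PAGE_SIZE


thp = ToyTHP()
thp.madvise_dontneed(256)              # drop half of the THP
print(thp.rss_bytes, thp.memcg_bytes)  # 1048576 2097152
thp.reclaim()                          # simulate memory pressure
print(thp.rss_bytes, thp.memcg_bytes)  # 1048576 1048576
```

In this sketch the memcg figure is twice the rss figure between the partial MADV_DONTNEED and the reclaim step, which matches the "twice or more" gap reported in the original question.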