Date: Tue, 29 Nov 2022 09:10:54 +0100
From: Michal Hocko
To: Yang Shi
Cc: Yongqiang Liu, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    akpm@linux-foundation.org, aarcange@redhat.com, hughd@google.com,
    mgorman@suse.de, cl@gentwo.org, n-horiguchi@ah.jp.nec.com,
    zokeefe@google.com, rientjes@google.com, Matthew Wilcox,
    peterx@redhat.com, "Wangkefeng (OS Kernel Lab)", "zhangxiaoxu (A)",
    kirill.shutemov@linux.intel.com, Lu Jialin
Subject: Re: [QUESTION] memcg page_counter seems broken in MADV_DONTNEED with THP enabled
References: <8a2f2644-71d0-05d7-49d8-878aafa99652@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 28-11-22 12:01:37, Yang Shi wrote:
> On Sat, Nov 26, 2022 at 5:10 AM Yongqiang Liu wrote:
> >
> > Hi,
> >
> > We use mm_counter to how much a process physical memory used. Meanwhile,
> > page_counter of a memcg is used to count how much a cgroup physical
> > memory used.
> > If a cgroup only contains a process, they looks almost the same. But with
> > THP enabled, sometimes memory.usage_in_bytes in memcg may be twice or
> > more than rss in proc/[pid]/smaps_rollup as follow:
[...]
> > node_page_stat which shows in meminfo was also decreased. the
> > __split_huge_pmd seems free no physical memory unless the total THP
> > was free.I am confused which one is the true physical memory used of
> > a process.
>
> This should be caused by the deferred split of THP. When MADV_DONTNEED
> is called on the partial of the map, the huge PMD is split, but the
> THP itself will not be split until the memory pressure is hit (global
> or memcg limit). So the unmapped sub pages are actually not freed
> until that point. So the mm counter is decreased due to the zapping
> but the physical pages are not actually freed then uncharged from
> memcg.

Yes, and this is not really bound to THP. Consider the page cache. It
can be accessed via syscalls, in which case it does not correspond to
rss at all while it is still charged to a memcg. Or it can be mapped
and then later unmapped, so it disappears from rss while it is still
charged until it gets reclaimed under memory pressure. Or it can be an
in-memory object that is not bound to any process lifetime (e.g.
tmpfs). Or it can be kernel memory charged to a memcg which is not
covered by rss because it is either not mapped or it is unknown to the
rss counters.
-- 
Michal Hocko
SUSE Labs
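
For anyone who wants to observe the gap described in this thread, here is a
minimal sketch, not taken from the original report. It assumes Linux, Python
3.8 or later, a 2 MiB THP size, and a caller-supplied path to the memcg usage
file (for example a cgroup v1 memory.usage_in_bytes file; the exact path is
an assumption and depends on the cgroup setup). The helper names rss_kb,
self_rss_kb, read_int and demo are made up for illustration. The script
faults in an anonymous region, zaps all but one base page of every 2 MiB
stride with MADV_DONTNEED, and samples rss and the memcg charge before and
after.

```python
import mmap
import re

PAGE = mmap.PAGESIZE
HPAGE = 2 * 1024 * 1024  # assumption: 2 MiB THP size (x86-64 default)


def rss_kb(smaps_rollup_text):
    """Extract the Rss value (in kB) from smaps_rollup-style text."""
    match = re.search(r"^Rss:\s+(\d+)\s+kB", smaps_rollup_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Rss line found")
    return int(match.group(1))


def self_rss_kb():
    """Read this process's rss from /proc/self/smaps_rollup."""
    with open("/proc/self/smaps_rollup") as f:
        return rss_kb(f.read())


def read_int(path):
    """Read a single integer from a cgroup control file."""
    with open(path) as f:
        return int(f.read().strip())


def demo(memcg_usage_path, nr_hpages=64):
    """Return ((rss_kb, memcg_kb) before, (rss_kb, memcg_kb) after)."""
    length = nr_hpages * HPAGE
    buf = mmap.mmap(-1, length,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE)
    try:
        buf.madvise(mmap.MADV_HUGEPAGE)  # ask for THP backing
    except OSError:
        pass  # THP not available; the demo still runs with base pages

    # Touch every base page so the whole region is faulted in.
    for off in range(0, length, PAGE):
        buf[off] = 1
    before = (self_rss_kb(), read_int(memcg_usage_path) // 1024)

    # Zap everything except the first base page of each 2 MiB stride.
    # Every huge page is then only partially mapped, so its huge PMD is
    # split while the THP itself is left for deferred split.
    for off in range(0, length, HPAGE):
        buf.madvise(mmap.MADV_DONTNEED, off + PAGE, HPAGE - PAGE)
    after = (self_rss_kb(), read_int(memcg_usage_path) // 1024)

    buf.close()
    return before, after
```

Calling demo() with the usage file of the cgroup the process runs in returns
two (rss, memcg usage) pairs in kB. If the region was actually backed by THP,
the expectation from the explanation above is that rss drops right away while
the memcg charge stays high until reclaim splits the huge pages; the exact
numbers depend on the kernel version and THP settings.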