From: Waiman Long
To: Michal Hocko, Roman Gushchin
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Petr Mladek,
 Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko, Rasmus Villemoes,
 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
 Ira Weiny, Rafael Aquini
Subject: Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
Date: Mon, 31 Jan 2022 13:38:28 -0500
Message-ID: <12686956-612d-d89b-5641-470d5e913090@redhat.com>
References: <20220129205315.478628-1-longman@redhat.com>
 <20220129205315.478628-4-longman@redhat.com>

On 1/31/22 13:25, Michal Hocko wrote:
> On Mon 31-01-22 10:15:45, Roman Gushchin wrote:
>> On Mon, Jan 31, 2022 at 11:53:19AM -0500, Johannes Weiner wrote:
>>> On Mon, Jan 31, 2022 at 10:38:51AM +0100, Michal Hocko wrote:
>>>> On Sat 29-01-22 15:53:15, Waiman Long wrote:
>>>>> It was found that a number of offlined memcgs were not freed because
>>>>> they were pinned by some charged pages that were present. Even "echo
>>>>> 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
>>>>> offlined but not freed memcgs tend to increase in number over time,
>>>>> with the side effect that percpu memory consumption, as shown in
>>>>> /proc/meminfo, also increases over time.
>>>>>
>>>>> In order to find out more information about those pages that pin
>>>>> offlined memcgs, the page_owner feature is extended to dump memory
>>>>> cgroup information, especially whether the cgroup is offlined or not.
>>>> It is not really clear to me how this is supposed to be used. Are you
>>>> really dumping all the pages in the system to find offline memcgs?
>>>> That looks rather clumsy to me. I am not against adding memcg
>>>> information to the page_owner output; that can be useful in other
>>>> contexts.
>>> We've sometimes done exactly that in production, but with drgn
>>> scripts. It's not very common, so it doesn't need to be very efficient
>>> either. Typically, we'd encounter a host with an unusual number of
>>> dying cgroups, ssh in, and poke around with drgn to figure out what
>>> kind of objects are still pinning the cgroups in question.
>>>
>>> This patch would make that process a little easier, I suppose.
>> Right. Over the last few years I've spent an enormous amount of time
>> digging into various aspects of this problem, and in my experience the
>> combination of drgn for inspecting the current state and bpf for
>> following various decisions on the reclaim path was the most useful.
>>
>> I really appreciate the effort to put useful tools for tracking memcg
>> references into the kernel tree; however, the page_owner infrastructure
>> has limited usefulness, as it has to be enabled at boot. But because it
>> doesn't add any overhead, I also don't see any reason not to add it.
> Would it be feasible to add a debugfs interface to display dead memcg
> information?

Originally, I added some debug code to keep track of the list of memcgs
that had been offlined but not yet freed. After some more testing, I
figured out that the memcgs were not freed because they were pinned by
references in the page structs. At that point, I realized that the
existing page_owner debugging tool would be useful for tracking this
kind of problem, since it already has all the infrastructure to list
where the pages were allocated as well as various fields in the page
structures. Of course, it is also possible to have a debugfs interface
to list the dead memcg information, but displaying more information
about the pages that pin the memcgs would be hard without using the
page_owner tool. Keeping track of the list of dead memcgs may also add
some runtime overhead.

Cheers,
Longman
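
For illustration, the kind of drgn inspection Johannes and Roman
describe can be approximated with a short script. This is only a
minimal sketch, not the scripts they actually used: it assumes a kernel
built with CONFIG_MEMCG with matching debug info loaded, drgn's cgroup
helpers (css_for_each_descendant_pre, cgroup_path), and that the
CSS_ONLINE bit still mirrors the enum in include/linux/cgroup-defs.h;
"prog" is supplied by the drgn CLI.

#!/usr/bin/env drgn
# Walk the memory-controller css tree and print cgroups that have been
# offlined (CSS_ONLINE clear) but whose css has not been released yet,
# i.e. the "dying" memcgs discussed in this thread.
from drgn import container_of
from drgn.helpers.linux.cgroup import cgroup_path, css_for_each_descendant_pre

CSS_ONLINE = 1 << 1  # assumed to match include/linux/cgroup-defs.h

root_css = prog["root_mem_cgroup"].css.address_of_()
for css in css_for_each_descendant_pre(root_css):
    if not (css.flags & CSS_ONLINE):
        memcg = container_of(css, "struct mem_cgroup", "css")
        print(hex(memcg.value_()), cgroup_path(css.cgroup).decode())

An offlined css is only unlinked from its parent's children list when
it is finally released, so the pre-order walk above still reaches the
pinned ones; anything it prints is a dying memcg of the kind the
page_owner extension is meant to help diagnose.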