Date: Fri, 1 Dec 2023 12:18:44 +1100
From: Dave Chinner
To: Roman Gushchin
Cc: Kent Overstreet, Qi Zheng, Michal Hocko, Muchun Song, Linux-MM,
    linux-kernel@vger.kernel.org, Andrew Morton
Subject: Re: [PATCH 2/7] mm: shrinker: Add a .to_text() method for shrinkers

On Thu, Nov 30, 2023 at 11:01:23AM -0800, Roman Gushchin wrote:
> On Wed, Nov 29, 2023 at 10:21:49PM -0500, Kent Overstreet wrote:
> > On Thu, Nov 30, 2023 at 11:09:42AM +0800, Qi Zheng wrote:
> > > For non-bcachefs developers, who knows what those statistics mean?

For non-mm developers, who knows what those internal mm state
statistics mean?
IOWs, a non-mm developer goes and asks a mm developer to help them
decipher the output to determine what to do next. So why can't a mm
developer go and ask a subsystem developer to tell them what the
shrinker oom-kill output means?

Such a question is a demonstration of an unconscious bias that
prioritises internal mm stuff as far more important than what anyone
else outside core-mm might ever need...

> > > You can use BPF or drgn to traverse in advance to get the address of the
> > > bcachefs shrinker structure, and then during OOM, find the bcachefs
> > > private structure through the shrinker->private_data member, and then
> > > dump the bcachefs private data. Is there any problem with this?
> >
> > No, BPF is not an excuse for improving our OOM/allocation failure
> > reports. BPF/tracing are secondary tools; whenever we're logging
> > information about a problem we should strive to log enough information
> > to debug the issue.
>
> Ok, a simple question then:
> why can't you dump /proc/slabinfo after the OOM?

Taken to its logical conclusion, we arrive at:

	OOM-kill doesn't need to output anything at all except for
	what it killed because we can dump
	/proc/{mem,zone,vmalloc,buddy,slab}info after the OOM....

As it is, even asking such a question shows that you haven't looked
at the OOM kill output for a long time - it already reports the slab
cache usage information for caches that are reclaimable.

That is, if too much accounted slab cache based memory consumption
is detected at OOM-kill, it will call dump_unreclaimable_slab() to
dump all the SLAB_RECLAIM_ACCOUNT caches (i.e. those with shrinkers)
to the console as part of the OOM-kill output.

The problem Kent is trying to address is that this output *isn't
sufficient to debug shrinker based memory reclaim issues*. It hasn't
been for a long time, and so we've all got our own special debug
patches and methods for checking that shrinkers are doing what they
are supposed to.
Kent is trying to formalise one of the more useful general methods
for exposing that internal information when OOM occurs...

Indeed, I can think of several uses for a shrinker->to_text() output
that we simply cannot do right now.

Any shrinker that does garbage collection on something that is not a
pure slab cache (e.g. xfs buffer cache, xfs inode gc subsystem,
graphics memory allocators, binder, etc) has no visibility of the
actual memory being used by the subsystem in the OOM-kill output.
This information isn't in /proc/slabinfo, it's not accounted by a
SLAB_RECLAIM_ACCOUNT cache, and it's not accounted by anything in
the core mm statistics.

e.g. How does anyone other than an XFS expert know that the 500k of
active xfs_buf handles in the slab cache actually pins 15GB of cached
metadata allocated directly from the page allocator, not just the
150MB of slab cache the handles take up?

Another example is that an inode can pin lots of heap memory (e.g.
for in-memory extent lists) and that may not be freeable until the
inode is reclaimed. So while the slab cache might not be excessively
large, we might have a million inodes with a billion cumulative
extents cached in memory and it is the heap memory consumed by the
cached extents that is consuming the 30GB of "missing" kernel memory
that is causing OOM-kills to occur.

How is a user or developer supposed to know when one of these
situations has occurred given the current lack of memory usage
introspection into subsystems?

These are the sorts of situations that shrinker->to_text() would
allow us to enumerate when it is necessary (i.e. at OOM-kill). At any
other time, it just doesn't matter, but when we're at OOM having a
mechanism to report somewhat accurate subsystem memory consumption
would be very useful indeed.

> Unlike anon memory, slab memory (fs caches in particular) should not be heavily
> affected by killing some userspace task.

Whether tasks get killed or not is completely irrelevant.
The issue is that not all memory that is reclaimed by shrinkers is
either pure slab cache memory or directly accounted as reclaimable
to the mm subsystem....

-Dave.
-- 
Dave Chinner
david@fromorbit.com
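
Illustration of the xfs_buf example above: a minimal userspace sketch
in C of a shrinker with an optional to_text-style callback. It is not
the interface from the patch under discussion. struct shrinker_model,
struct bufcache_model, bufcache_to_text() and report_shrinker() are
hypothetical names, and the 300-byte handle size is only an assumption
derived from the 150MB / 500k figures quoted in the mail.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct shrinker: only the
 * fields this sketch needs, not the real layout. */
struct shrinker_model {
	const char *name;
	void *private_data;
	/* Optional: describe subsystem memory use; returns snprintf() result. */
	int (*to_text)(char *buf, size_t len, const struct shrinker_model *s);
};

/* Hypothetical buffer cache state: the handles live in a slab cache,
 * but each handle also pins pages taken from the page allocator. */
struct bufcache_model {
	unsigned long long nr_handles;
	unsigned long long handle_size;   /* assumed slab bytes per handle */
	unsigned long long pinned_bytes;  /* page allocator memory pinned */
};

/* The subsystem knows what its handles pin, so it does the reporting. */
static int bufcache_to_text(char *buf, size_t len,
			    const struct shrinker_model *s)
{
	const struct bufcache_model *bc = s->private_data;
	unsigned long long slab_bytes = bc->nr_handles * bc->handle_size;

	return snprintf(buf, len,
			"%s: handles=%llu slab=%lluMiB pinned=%lluMiB\n",
			s->name, bc->nr_handles,
			slab_bytes >> 20, bc->pinned_bytes >> 20);
}

/* What a report path could do for each registered shrinker: use the
 * callback when the subsystem provides one, otherwise print the name. */
static int report_shrinker(char *buf, size_t len,
			   const struct shrinker_model *s)
{
	if (!s->to_text)
		return snprintf(buf, len, "%s: no extra state\n", s->name);
	return s->to_text(buf, len, s);
}
```

With 500000 handles of 300 bytes each and 15GiB pinned, the slab side
of the report is small while the pinned figure carries the memory that
slab accounting alone would not show.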