Date: Wed, 29 Nov 2023 22:21:49 -0500
From: Kent Overstreet
To: Qi Zheng
Cc: Michal Hocko, Roman Gushchin, Muchun Song, Linux-MM,
	linux-kernel@vger.kernel.org, Andrew Morton, Dave Chinner
Subject: Re: [PATCH 2/7] mm: shrinker: Add a .to_text() method for shrinkers
Message-ID: <20231130032149.ynap4ai47dj62fy3@moria.home.lan>
References: <20231123212411.s6r5ekvkklvhwfra@moria.home.lan>
	<4caadff7-1df0-45cc-9d43-e616f9e4ddb3@bytedance.com>
	<20231125003009.tbaxuquny43uwei3@moria.home.lan>
	<76A1EE85-B62C-49B3-889C-80F9A2A88040@linux.dev>
	<20231128035345.5c7yc7jnautjpfoc@moria.home.lan>
	<20231129231147.7msiocerq7phxnyu@moria.home.lan>
	<04f63966-af72-43ef-a65c-ff927064a3e4@bytedance.com>
In-Reply-To: <04f63966-af72-43ef-a65c-ff927064a3e4@bytedance.com>

On Thu, Nov 30, 2023 at 11:09:42AM +0800, Qi Zheng wrote:
>
>
> On 2023/11/30 07:11, Kent Overstreet wrote:
> > On Wed, Nov 29, 2023 at 10:14:54AM +0100, Michal Hocko wrote:
> > > On Tue 28-11-23 16:34:35, Roman Gushchin wrote:
> > > > On Tue, Nov 28, 2023 at 02:23:36PM +0800, Qi Zheng wrote:
> > > [...]
> > > > > Now I think adding this method might not be a good idea.
> > > > > If we allow shrinkers to report their own private information, OOM
> > > > > logs may become cluttered. Most people only care about some general
> > > > > information when troubleshooting an OOM problem, but not the private
> > > > > information of a shrinker.
> > > >
> > > > I agree with that.
> > > >
> > > > It seems that the feature is mostly useful for kernel developers and it's easily
> > > > achievable by attaching a bpf program to the oom handler. If it requires a bit
> > > > of work on the bpf side, we can do that instead, but probably not. And this
> > > > solution can potentially provide way more information in a more flexible way.
> > > >
> > > > So I'm not convinced it's a good idea to make the generic oom handling code
> > > > more complicated and fragile for everybody, as well as making oom reports differ
> > > > more between kernel versions and configurations.
> > >
> > > Completely agreed! From my many years of experience analysing oom
> > > reports from production systems I would conclude the following categories
> > >     - clear runaways (and/or memory leaks)
> > >         - userspace consumers - either shmem or anonymous memory
> > >           predominantly consumes the memory, swap is either depleted
> > >           or not configured.
> > >           OOM report is usually useful to pinpoint those as we
> > >           have required counters available
> > >         - kernel memory consumers - if we are lucky they are
> > >           using slab allocator and unreclaimable slab is a huge
> > >           part of the memory consumption. If this is a page
> > >           allocator user the oom report only helps to deduce
> > >           the fact by looking at how much user + slab + page
> > >           table etc. form. But identifying the root cause is
> > >           close to impossible without something like page_owner
> > >           or a crash dump.
> > >     - misbehaving memory reclaim
> > >         - minority of issues and the oom report is usually
> > >           insufficient to drill down to the root cause.
> > >           If the problem is reproducible then collecting vmstat
> > >           data can give a much better clue.
> > >         - high number of slab reclaimable objects or free swap
> > >           are good indicators. Shrinkers data could be
> > >           potentially helpful in the slab case but I really have
> > >           a hard time remembering any such situation.
> > > On non-production systems the situation is quite different. I can see
> > > how it could be very beneficial to add very specific debugging data
> > > for a subsystem/shrinker which is developed and could cause the OOM. For
> > > that purpose the proposed scheme is rather inflexible AFAICS.
> >
> > Considering that you're an MM guy, and that shrinkers are pretty much
> > universally used by _filesystem_ people - I'm not sure your experience
> > is the most relevant here?
> >
> > The general attitude I've been seeing in this thread has been one of
> > dismissiveness towards filesystem people. Roman too; back when he was
>
> Oh, please don't say that, it seems like you are the only one causing
> the fight. We deeply respect the opinions of file system developers, so
> I invited Dave to this thread from the beginning. And you didn’t CC
> linux-fsdevel@vger.kernel.org yourself.
>
> > working on his shrinker debug feature I reached out to him, explained
> > that I was working on my own, and asked about collaborating - got
> > crickets in response...
> >
> > Hmm..
> >
> > Besides that, I haven't seen anything whatsoever out of you guys to
> > make our lives easier, regarding OOM debugging, nor do you guys even
> > seem interested in the needs and perspectives of the filesystem people.
> > Roman, your feature didn't help one bit for OOM debugging - didn't even
> > come with documentation or hints as to what it's for.
> >
> > BPF? Please.
>
> (Disclaimer, no intention to start a fight, here are some objective
> views.)
>
> Why not? In addition to printk, there are many good debugging tools
> worth trying, such as BPF related tools, drgn, etc.
>
> For non-bcachefs developers, who knows what those statistics mean?
>
> You can use BPF or drgn to traverse in advance to get the address of the
> bcachefs shrinker structure, and then during OOM, find the bcachefs
> private structure through the shrinker->private_data member, and then
> dump the bcachefs private data. Is there any problem with this?

No, BPF is not an excuse for not improving our OOM/allocation failure
reports. BPF/tracing are secondary tools; whenever we're logging
information about a problem we should strive to log enough information
to debug the issue.

We've got junk in there we don't need: as mentioned before, there's no
need to be dumping information on _every_ slab, we can pick the ones
using the most memory and show those.

Similarly for shrinkers, we're not going to be printing all of them -
the patchset picks the top 10 by objects and prints those. Could
probably be ~4, there are fewer shrinkers than slabs; also if we can get
shrinkers to report on memory owned in bytes, that will help too with
deciding what information is pertinent.

That's not a huge amount of information to be dumping, and it makes it
easier to debug something that has historically been a major pain point.

There's a lot more that could be done to make our OOM reports more
readable and useful to non-mm developers. Unfortunately, any time
changing the show_mem report comes up, the immediate reaction seems to
be "but that will break my log parsing/change what I'm used to!"...
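
For anyone following along, the "top N by objects" selection mentioned
above can be sketched roughly as below. This is a userspace illustration
only, not the code from the patchset: struct shrinker_stat,
top_shrinkers() and print_top_shrinkers() are hypothetical names, and the
real kernel code walks the shrinker list under its own locking. The idea
is just to keep a small sorted array, so nothing has to be allocated
while reporting an allocation failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for per-shrinker data; not the kernel's struct shrinker. */
struct shrinker_stat {
	const char *name;
	unsigned long nr_objects;
};

/*
 * Copy the (at most) max_top entries with the largest nr_objects into
 * top[], sorted in descending order. Returns the number of entries
 * written. Insertion into a small sorted array: O(n * max_top), no
 * allocation.
 */
size_t top_shrinkers(const struct shrinker_stat *all, size_t n,
		     struct shrinker_stat *top, size_t max_top)
{
	size_t nr = 0;

	assert(all != NULL && top != NULL);

	for (size_t i = 0; i < n; i++) {
		size_t pos = nr;

		/* Find where all[i] belongs among the entries kept so far. */
		while (pos > 0 && top[pos - 1].nr_objects < all[i].nr_objects)
			pos--;

		if (pos >= max_top)
			continue;	/* smaller than everything we keep */

		/* Grow the result until it is full, then drop the smallest. */
		if (nr < max_top)
			nr++;

		/* Shift smaller entries down by one slot. */
		for (size_t j = nr - 1; j > pos; j--)
			top[j] = top[j - 1];

		top[pos] = all[i];
	}

	return nr;
}

/* Print one line per selected entry, largest first. */
void print_top_shrinkers(const struct shrinker_stat *top, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		printf("%-16s %lu objects\n", top[i].name, top[i].nr_objects);
}
```

A caller would fill an array of shrinker_stat, call top_shrinkers() with
max_top set to 10 (or ~4), and pass the result to print_top_shrinkers().
Sorting by bytes owned instead of object count would only change the
field being compared.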