Message-ID: <6f56c8f4-77e3-4ad7-a5f8-a6235b047137@bytedance.com>
Date: Thu, 30 Nov 2023 11:42:45 +0800
Subject: Re: [PATCH 2/7] mm: shrinker: Add a .to_text() method for shrinkers
To: Kent Overstreet
Cc: Michal Hocko, Roman Gushchin, Muchun Song, Linux-MM,
    linux-kernel@vger.kernel.org, Andrew Morton, Dave Chinner
References: <20231123212411.s6r5ekvkklvhwfra@moria.home.lan>
    <4caadff7-1df0-45cc-9d43-e616f9e4ddb3@bytedance.com>
    <20231125003009.tbaxuquny43uwei3@moria.home.lan>
    <76A1EE85-B62C-49B3-889C-80F9A2A88040@linux.dev>
    <20231128035345.5c7yc7jnautjpfoc@moria.home.lan>
    <20231129231147.7msiocerq7phxnyu@moria.home.lan>
    <04f63966-af72-43ef-a65c-ff927064a3e4@bytedance.com>
    <20231130032149.ynap4ai47dj62fy3@moria.home.lan>
From: Qi Zheng
In-Reply-To: <20231130032149.ynap4ai47dj62fy3@moria.home.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023/11/30 11:21, Kent Overstreet wrote:
> On Thu, Nov 30, 2023 at 11:09:42AM +0800, Qi Zheng wrote:
>>
>>
>> On 2023/11/30 07:11, Kent Overstreet wrote:
>>> On Wed, Nov 29, 2023 at 10:14:54AM +0100, Michal Hocko wrote:
>>>> On Tue 28-11-23 16:34:35, Roman Gushchin wrote:
>>>>> On Tue, Nov 28, 2023 at 02:23:36PM +0800, Qi Zheng wrote:
>>>> [...]
>>>>>> Now I think adding this method might not be a good idea. If we allow
>>>>>> shrinkers to report their own private information, OOM logs may become
>>>>>> cluttered. Most people only care about some general information when
>>>>>> troubleshooting OOM problem, but not the private information of a
>>>>>> shrinker.
>>>>>
>>>>> I agree with that.
>>>>>
>>>>> It seems that the feature is mostly useful for kernel developers and it's easily
>>>>> achievable by attaching a bpf program to the oom handler. If it requires a bit
>>>>> of work on the bpf side, we can do that instead, but probably not. And this
>>>>> solution can potentially provide way more information in a more flexible way.
>>>>>
>>>>> So I'm not convinced it's a good idea to make the generic oom handling code
>>>>> more complicated and fragile for everybody, as well as making oom reports differ
>>>>> more between kernel versions and configurations.
>>>>
>>>> Completely agreed! From my many years of experience of oom reports
>>>> analysing from production systems I would conclude the following categories
>>>>   - clear runaways (and/or memory leaks)
>>>>       - userspace consumers - either shmem or anonymous memory
>>>>         predominantly consumes the memory, swap is either depleted
>>>>         or not configured.
>>>>         OOM report is usually useful to pinpoint those as we
>>>>         have required counters available
>>>>       - kernel memory consumers - if we are lucky they are
>>>>         using slab allocator and unreclaimable slab is a huge
>>>>         part of the memory consumption. If this is a page
>>>>         allocator user the oom report only helps to deduce
>>>>         the fact by looking at how much user + slab + page
>>>>         table etc. form. But identifying the root cause is
>>>>         close to impossible without something like page_owner
>>>>         or a crash dump.
>>>>   - misbehaving memory reclaim
>>>>       - minority of issues and the oom report is usually
>>>>         insufficient to drill down to the root cause. If the
>>>>         problem is reproducible then collecting vmstat data
>>>>         can give a much better clue.
>>>>       - high number of slab reclaimable objects or free swap
>>>>         are good indicators. Shrinkers data could be
>>>>         potentially helpful in the slab case but I really have
>>>>         hard time to remember any such situation.
>>>> On non-production systems the situation is quite different.
>>>> I can see how it could be very beneficial to add a very specific
>>>> debugging data for subsystem/shrinker which is developed and could
>>>> cause the OOM. For that purpose the proposed scheme is rather
>>>> inflexible AFAICS.
>>>
>>> Considering that you're an MM guy, and that shrinkers are pretty much
>>> universally used by _filesystem_ people - I'm not sure your experience
>>> is the most relevant here?
>>>
>>> The general attitude I've been seeing in this thread has been one of
>>> dismissiveness towards filesystem people. Roman too; back when he was
>>
>> Oh, please don't say that, it seems like you are the only one causing
>> the fight. We deeply respect the opinions of file system developers, so
>> I invited Dave to this thread from the beginning. And you didn’t CC
>> linux-fsdevel@vger.kernel.org yourself.
>>
>>> working on his shrinker debug feature I reached out to him, explained
>>> that I was working on my own, and asked about collaborating - got
>>> crickets in response...
>>>
>>> Hmm..
>>>
>>> Besides that, I haven't seen anything what-so-ever out of you guys to
>>> make our lives easier, regarding OOM debugging, nor do you guys even
>>> seem interested in the needs and perspectives of the filesystem people.
>>> Roman, your feature didn't help one bit for OOM debugging - didn't even
>>> come with documentation or hints as to what it's for.
>>>
>>> BPF? Please.
>>
>> (Disclaimer, no intention to start a fight, here are some objective
>> views.)
>>
>> Why not? In addition to printk, there are many good debugging tools
>> worth trying, such as BPF related tools, drgn, etc.
>>
>> For non-bcachefs developers, who knows what those statistics mean?
>>
>> You can use BPF or drgn to traverse in advance to get the address of the
>> bcachefs shrinker structure, and then during OOM, find the bcachefs
>> private structure through the shrinker->private_data member, and then
>> dump the bcachefs private data. Is there any problem with this?
>
> No, BPF is not an excuse for improving our OOM/allocation failure
> reports. BPF/tracing are secondary tools; whenever we're logging
> information about a problem we should strive to log enough information
> to debug the issue.
>
> We've got junk in there we don't need: as mentioned before, there's no
> need to be dumping information on _every_ slab, we can pick the ones
> using the most memory and show those.
>
> Similarly for shrinkers, we're not going to be printing all of them -
> the patchset picks the top 10 by objects and prints those. Could
> probably be ~4, there's fewer shrinkers than slabs; also if we can get
> shrinkers to report on memory owned in bytes, that will help too with
> deciding what information is pertinent.

I'm not worried about the shrinker's general data. What I'm worried
about is the shrinker's private data. Apart from the developers of the
corresponding subsystem, nobody knows what the private statistics mean,
and we have no control over how much private data is printed or in what
form. This may well clutter the OOM log and break automated parsing.
Any thoughts on this?

>
> That's not a huge amount of information to be dumping, and to make it
> easier to debug something that has historically been a major pain point.
>
> There's a lot more that could be done to make our OOM reports more
> readable and useful to non-mm developers. Unfortunately, any time
> changing the show_mem report the immediate reaction seems to be "but
> that will break my log parsing/change what I'm used to!"...
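
To make the "top 10 by objects" selection quoted above concrete, here is
a minimal user-space sketch in Python. It is not the patchset's code:
the ShrinkerStat type, the top_shrinkers() and format_report() helpers,
and the output format are all assumptions made up for illustration. It
covers only the general per-shrinker object count, not any private data.

```python
import heapq
from dataclasses import dataclass


@dataclass
class ShrinkerStat:
    """Hypothetical stand-in for one shrinker's general statistics."""
    name: str      # illustrative shrinker name, not a real kernel identifier
    objects: int   # freeable-object count reported by the shrinker


def top_shrinkers(stats, limit=10):
    """Return at most `limit` shrinkers, largest object count first."""
    return heapq.nlargest(limit, stats, key=lambda s: s.objects)


def format_report(stats, limit=10):
    """Format one line per selected shrinker (format is an assumption)."""
    return "\n".join(
        f"{s.name}: objects: {s.objects}" for s in top_shrinkers(stats, limit)
    )


if __name__ == "__main__":
    sample = [
        ShrinkerStat("example-dcache", 120000),
        ShrinkerStat("example-inode", 45000),
        ShrinkerStat("example-btree-cache", 300000),
    ]
    # Print only the two largest consumers instead of every shrinker.
    print(format_report(sample, limit=2))
```

Lowering `limit` (for example to 4, as suggested in the quoted text)
only changes how many lines are printed; the selection logic stays the
same.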