Date: Tue, 26 Apr 2022 19:18:53 -0700
From: Roman Gushchin
To: Dave Chinner
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Yang Shi, Kent Overstreet, Hillf Danton
Subject: Re: [PATCH v2 0/7] mm: introduce shrinker debugfs interface
References: <20220422202644.799732-1-roman.gushchin@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 27, 2022 at 11:22:55AM +1000, Dave Chinner wrote:
> On Tue, Apr 26, 2022 at 09:41:34AM -0700, Roman Gushchin wrote:
> > Can you, please, summarize your position, because it's a bit unclear.
> > You made a lot of good points about some details (e.g. shrinkers naming,
> > and I totally agree there; machines with hundreds of nodes etc), then
> > you said the active scanning is useless and then said the whole thing
> > is useless and we're fine with what we have regarding shrinkers debugging.
>
> Better introspection the first thing we need. Work on improving
> that. I've been making suggestions to help improve introspection
> infrastructure.
>
> Before anything else, we need to improve introspection so we can
> gain better insight into the problems we have. Once we understand
> the problems better and have evidence to back up where the problems
> lie and we have a plan to solve them, then we can talk about whether
> we need other user accessible shrinker APIs.

Ok, at least we do agree here. This is exactly why I've started with this
debugfs stuff.

>
> For the moment, exposing shrinker control interfaces to userspace
> could potentially be very bad because it exposes internal
> architectural and implementation details to a user API. Just
> because it is in /sys/kernel/debug it doesn't mean applications
> won't start to use it and build dependencies on it.
>
> That doesn't mean I'm opposed to exposing a shrinker control
> mechanism to debugfs - I'm still on the fence on that one. However,
> I definitely think that an API that directly exposes the internal
> implementation to userspace is the wrong way to go about this.

Ok, if it's about having memcg-aware and other interfaces, I can agree here
as well. I actually made an attempt to unify memcg-aware and system-wide
shrinker scanning; it hasn't been very successful yet, but it's definitely on
my todo list. I'm pretty sure we're iterating over and over some empty
root-level shrinkers without benefiting from the bitmap infrastructure which
works for memory cgroups.

>
> Fine grained shrinker control is not necessary to improve shrinker
> introspection and OOM debugging capability, so if you want/need
> control interfaces then I think you should separate those out into a
> separate line of development where it doesn't derail the discussion
> on how to improve shrinker/OOM introspection.

Ok, no problems here. Btw, the OOM debugging is a separate topic brought in
by Kent; I'd keep it separate too, as it comes with many OOM-specific
complications.

From another email of yours:
> So, yeah, you need to think about how to do fine-grained access to
> shrinker stats effectively. That might require a complete change of
> presentation API. For example, changing the filesystem layout to be
> memcg centric rather than shrinker instance centric would make an
> awful lot of this file parsing problem go away.
>
> e.g:
>
> /sys/kernel/debug/mm/memcg//shrinker//stats

The problem with this approach (I thought about it) is that it comes with a
high memory overhead, especially on a machine with thousands of cgroups and
mount points. And besides the memory overhead, it's really expensive to
collect system-wide data and get a big picture, as it requires opening and
reading thousands of files.

Actually, you wrote recently:

"I've thought about it, too, and can see where it could be useful.
However, when I consider the list_lru memcg integration, I suspect
it becomes a "can't see the forest for the trees" problem. We're
going to end up with millions of sysfs objects with no obvious way
to navigate, iterate or search them if we just take the naive "sysfs
object + stats per list_lru instance" approach."

It all makes me think we need both: a way to iterate over all memcgs and dump
all the numbers at once, and a way to get a specific per-memcg (per-node)
count.

Thanks!