Date: Wed, 14 Feb 2024 01:20:20 -0500
From: Johannes Weiner
To: Suren Baghdasaryan
Cc: akpm@linux-foundation.org, kent.overstreet@linux.dev, mhocko@suse.com,
    vbabka@suse.cz, roman.gushchin@linux.dev, mgorman@suse.de,
    dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com,
    corbet@lwn.net, void@manifault.com, peterz@infradead.org,
    juri.lelli@redhat.com, catalin.marinas@arm.com, will@kernel.org,
    arnd@arndb.de, tglx@linutronix.de, mingo@redhat.com,
    dave.hansen@linux.intel.com, x86@kernel.org, peterx@redhat.com,
    david@redhat.com, axboe@kernel.dk, mcgrof@kernel.org, masahiroy@kernel.org,
    nathan@kernel.org, dennis@kernel.org, tj@kernel.org, muchun.song@linux.dev,
    rppt@kernel.org, paulmck@kernel.org, pasha.tatashin@soleen.com,
    yosryahmed@google.com, yuzhao@google.com, dhowells@redhat.com,
    hughd@google.com, andreyknvl@gmail.com, keescook@chromium.org,
    ndesaulniers@google.com, vvvvvv@google.com, gregkh@linuxfoundation.org,
    ebiggers@google.com, ytcoode@gmail.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    bristot@redhat.com, vschneid@redhat.com, cl@linux.com, penberg@kernel.org,
    iamjoonsoo.kim@lge.com, 42.hyeyoo@gmail.com, glider@google.com,
    elver@google.com, dvyukov@google.com, shakeelb@google.com,
    songmuchun@bytedance.com, jbaron@akamai.com, rientjes@google.com,
    minchan@google.com, kaleshsingh@google.com, kernel-team@android.com,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    iommu@lists.linux.dev, linux-arch@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-modules@vger.kernel.org,
    kasan-dev@googlegroups.com, cgroups@vger.kernel.org
Subject: Re: [PATCH v3 00/35] Memory allocation profiling
Message-ID: <20240214062020.GA989328@cmpxchg.org>
References: <20240212213922.783301-1-surenb@google.com>
In-Reply-To: <20240212213922.783301-1-surenb@google.com>

I'll do a more thorough code review, but before the discussion gets too sidetracked, I wanted to add my POV on the overall merit of the direction being proposed here.

I have backported and used this code for debugging production issues before. Logging into a random host with an unfamiliar workload and being able to get a reliable, comprehensive list of kernel memory consumers is one of the coolest things I have seen in a long time. This is a huge improvement to sysadmin quality of life.

It's also a huge improvement for MM developers. We're the first point of contact for memory regressions, which can be caused by pretty much any driver or subsystem in the kernel.

I encourage anybody who is undecided on whether this is worth doing to build a kernel with these patches applied and run it on their own machine. I think you'll be surprised by what you find, and by how myopic and uninformative /proc/meminfo feels in comparison. Did you know there is a lot more to modern filesystems than the VFS objects we are currently tracking? :)

Then imagine what this looks like on a production host running a complex mix of filesystems, enterprise networking, BPF programs, GPUs and accelerators, etc.

Backporting the code to a slightly older production kernel wasn't too difficult. The instrumentation layering is explicit, clean, and fairly centralized, so resolving minor conflicts around the _noprof renames and the wrappers was pretty straightforward.
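To make the "centralized wrapper" layering concrete, here is a small userspace analogue. It is a sketch under stated assumptions, not the series' code: `my_malloc`, `my_malloc_noprof`, `my_malloc_tagged`, `my_free`, `struct alloc_tag_sketch`, `tag_list` and `dump_tags` are all hypothetical names. The idea it tries to show is that the untracked allocator keeps its body under a `_noprof` name, while a thin macro at the public entry point gives every call site its own static tag, so individual callers need no hand-written annotations. The per-block header and the runtime linked list are simplifications chosen for this sketch, and it is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* One static instance per call site (hypothetical, not the kernel's type). */
struct alloc_tag_sketch {
	const char *file;
	int line;
	size_t live_bytes;	/* bytes from this site not yet freed */
	size_t live_allocs;	/* allocations from this site not yet freed */
	struct alloc_tag_sketch *next;
	int registered;
};

/* Simplified registry: each tag links itself in on first use. */
struct alloc_tag_sketch *tag_list;

/* Header in front of every block so that my_free() can find the tag;
 * the max_align_t member keeps the returned pointer suitably aligned. */
union alloc_hdr {
	struct {
		struct alloc_tag_sketch *tag;
		size_t size;
	} h;
	max_align_t align;
};

/* Untracked path: the real work lives under a _noprof name. */
void *my_malloc_noprof(size_t size)
{
	union alloc_hdr *hdr = malloc(sizeof(*hdr) + size);

	if (!hdr)
		return NULL;
	hdr->h.tag = NULL;
	hdr->h.size = size;
	return hdr + 1;
}

/* Charge a new allocation to the tag of the site that requested it. */
void *my_malloc_tagged(struct alloc_tag_sketch *tag, size_t size)
{
	void *p = my_malloc_noprof(size);

	if (!p)
		return NULL;
	if (!tag->registered) {
		tag->next = tag_list;
		tag_list = tag;
		tag->registered = 1;
	}
	((union alloc_hdr *)p - 1)->h.tag = tag;
	tag->live_bytes += size;
	tag->live_allocs++;
	return p;
}

/* Free a block and uncharge it from the site that allocated it. */
void my_free(void *p)
{
	union alloc_hdr *hdr;

	if (!p)
		return;
	hdr = (union alloc_hdr *)p - 1;
	if (hdr->h.tag) {
		assert(hdr->h.tag->live_allocs > 0);
		hdr->h.tag->live_bytes -= hdr->h.size;
		hdr->h.tag->live_allocs--;
	}
	free(hdr);
}

/* Public entry point: every expansion declares its own static tag
 * (a GNU C statement expression, an idiom kernel macros use widely). */
#define my_malloc(size) ({						\
	static struct alloc_tag_sketch site_tag = { __FILE__, __LINE__ }; \
	my_malloc_tagged(&site_tag, (size));				\
})

/* Report every call site that still holds memory. */
void dump_tags(void)
{
	for (struct alloc_tag_sketch *t = tag_list; t; t = t->next)
		if (t->live_allocs)
			printf("%10zu bytes %6zu allocs  %s:%d\n",
			       t->live_bytes, t->live_allocs, t->file, t->line);
}
```

Because the counters track live bytes rather than allocation events, a consumer that allocated once early on and then stopped growing still shows up in `dump_tags()`; that is exactly the case that is hard to spot with tracing.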
When we talk about maintenance cost, a fair shake would be to weigh it against the cost and reliability of our current method: evaluating consumers in the kernel on a case-by-case basis and annotating the alloc/free sites by hand, then quibbling with the MM community about whether that consumer is indeed significant enough to warrant an entry in /proc/meminfo, and what the catchiest name for the stat would be.

I think we can agree that this is vastly less scalable and more burdensome than central annotations around a handful of mostly static allocator entry points, especially considering the rate of change in the kernel as a whole, and that not everybody will think of the comprehensive MM picture when writing a random driver. And I think that's generous: we don't even have the network stack in meminfo.

So I think what we do now isn't working. In the Meta fleet, at any given time, the p50 for unaccounted kernel memory is several gigabytes per host, and the p99 is between 15% and 30% of total memory. That's a looot of opaque resource usage we have to accept on faith.

For hunting down regressions, all it takes is one untracked consumer in the kernel to really throw a wrench into things. It's difficult to find in the noise with tracing, and if it's not growing after an initial allocation spike, you're pretty much out of luck finding it at all. Raise your hand if you've written a drgn script to walk pfns and try to guess consumers from the state of struct page :)

I agree we should discuss how the annotations are implemented on a technical basis, but my take is that we need something like this. In a codebase of our size, I don't think the allocator should be handing out memory without some basic implied tracking of where it's going. It's a liability for production environments, and it can hide bad memory management decisions in drivers and other subsystems for a very long time.
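For a rough sense of what "unaccounted kernel memory" means in the fleet numbers above, one crude estimate is MemTotal minus the consumers that /proc/meminfo does break out. The sketch below assumes that MemFree, Buffers, Cached, AnonPages, Slab, KernelStack and PageTables are the "accounted" fields. That choice is an assumption made for illustration, not the definition behind the quoted p50/p99 figures, and several meminfo fields overlap, so treat the result as a ballpark. The function names `meminfo_kb`, `unaccounted_kb` and `read_meminfo` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields this sketch treats as "accounted" (an assumption; some overlap). */
static const char *const accounted_fields[] = {
	"MemFree", "Buffers", "Cached", "AnonPages",
	"Slab", "KernelStack", "PageTables",
};

/* Value in kB of `field` in meminfo-formatted text, or 0 if absent. */
unsigned long long meminfo_kb(const char *text, const char *field)
{
	char name[64];
	unsigned long long val;

	assert(text && field);
	for (const char *line = text; line && *line; ) {
		if (sscanf(line, "%63[^:]: %llu", name, &val) == 2 &&
		    strcmp(name, field) == 0)
			return val;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return 0;
}

/* MemTotal minus the fields above, clamped at zero. */
unsigned long long unaccounted_kb(const char *text)
{
	unsigned long long total = meminfo_kb(text, "MemTotal");
	unsigned long long known = 0;
	size_t n = sizeof(accounted_fields) / sizeof(accounted_fields[0]);

	for (size_t i = 0; i < n; i++)
		known += meminfo_kb(text, accounted_fields[i]);
	return known < total ? total - known : 0;
}

/* Read /proc/meminfo into buf (NUL-terminated). Returns 0 on success. */
int read_meminfo(char *buf, size_t len)
{
	FILE *f;
	size_t n;

	if (len == 0 || !(f = fopen("/proc/meminfo", "r")))
		return -1;
	n = fread(buf, 1, len - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}
```

On a live system, a buffer filled by `read_meminfo()` can be passed to `unaccounted_kb()`. Whatever remains is memory the kernel handed out without any meminfo line explaining where it went, and that remainder is what per-callsite allocation tags would attribute.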