Received: by 2002:a05:7412:5112:b0:fa:6e18:a558 with SMTP id fm18csp18646rdb; Mon, 22 Jan 2024 10:28:28 -0800 (PST) X-Google-Smtp-Source: AGHT+IF+aXCxrHtsMRVNWHJaoufOe+JemLudf/Qcn7OO1tqzXoeHWhR8plDxqg/pYe1gByXByXY+ X-Received: by 2002:a2e:9144:0:b0:2cd:e94e:e4c8 with SMTP id q4-20020a2e9144000000b002cde94ee4c8mr2025047ljg.83.1705948108188; Mon, 22 Jan 2024 10:28:28 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705948108; cv=pass; d=google.com; s=arc-20160816; b=T4pXqZEgb7CeizWbvgj3B/Wo4u0MjcSlF9UW/TPsuJdUl7p9u5aDoyjXMKO7q0lBTR bqy03WCLGU3u+fPR2+scGuaEpAPAMlB4Bhpw2jD5HdzbCIf4vhklEC/J0fdAEkfd7JFQ hjWQuzDvOzxGsVbDMVrTx5HuhEu+WFaczD7s8QTNXFuF1B4OgQxjHVJIat0jpEiaboxI dROHdeFF0jXoqJzGW+k0eM8xmhOeOYqZk70LcoqsQxVeWdEnHupOe49ZeY6+/SxG1/bq 0js45FALFwtwbudSthdnCtvFsdJ2QarS51B7pMF8LUSd3cikeqGzKftLdtEbyo0QViKN miPA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:list-unsubscribe:list-subscribe :list-id:precedence:dkim-signature; bh=5UuQsVtmQQmr5HjqyMtyAk8R8Y84z4aC4k61fXm7dcc=; fh=mmrERxe3uUVtrRLGlGdBKxMy9rXBHe1VctMtdVsk0Co=; b=VKdH4TwwKcdVuZpF+uJE5bLpLjRZp9FUBESFcAzgAcmqOAe5B/HxTWUll5LltStv0S B6B+Jz4/ju46JCzII305mxuDDI+ero9+cdtijNH9yCKe8JQmuny3k0EljK4MJUjbxF4X zlKoO3n8aLcfs8kVBWJkrelPT4FCmtdItI40AITv5eT0BdbrTZlL4p6PZ7Z4eWHhgqcf 0JzC15RpjyNk+W5KSbe9IWj0gPk4k9vicD3u0L5iMDWI1fyhAqOT+iUnZyzlpdJYsUDK mhajMkvOQPM/sMA14CNdkntan02qiaVub5gzjSKGaAy5SKutwbzcPLM3USp6Fuuq1zdO ssyA== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=yPCF1fbJ; arc=pass (i=1 spf=pass spfdomain=google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-33841-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-33841-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id a98-20020a509eeb000000b0055974151337si7322354edf.11.2024.01.22.10.28.28 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Jan 2024 10:28:28 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-33841-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=yPCF1fbJ; arc=pass (i=1 spf=pass spfdomain=google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-33841-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-33841-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 4759A1F2DAED for ; Mon, 22 Jan 2024 18:14:04 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 931EC40BF9; Mon, 22 Jan 2024 17:38:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="yPCF1fbJ" Received: from mail-pl1-f174.google.com (mail-pl1-f174.google.com [209.85.214.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 368913E47B for ; Mon, 22 Jan 2024 17:38:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.174 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705945119; cv=none; b=rj9TscLYE4DhreSFghG057ohTzLZSzApOytn5LM4S6SdSXvDvlHQ7pYbK1oj6+f1FjGp6xW2chQUX0SULS0hpvR0U5suzg8tziTUGRMqfEDnhMvpFazRuYDX51ToqviwpZSqyIXM73WbRWV09tIL6nYQNVp6rj1H9dKRVQgZ1gg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705945119; c=relaxed/simple; bh=lx13kHG0tJJSdUreOeJdloyx12tNITbn9wmXA6Zx684=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=KWCgUIgmyYyKtCX25E42TnkD4XKSGT06QuuLPegrsxUM+2++QblGyUJ3TTc9Z2AXbD7rz/AK1jYbkVIBjKKYXf0Xu4wGPLdx5O5P/RgwaxijZMo+3mDsfHi6AII+TambC5GGknBIX8S9WLyPd8Nit7iG2X1Sb9ONTKdtGCTPHPs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=yPCF1fbJ; arc=none smtp.client-ip=209.85.214.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Received: by mail-pl1-f174.google.com with SMTP id d9443c01a7336-1d72d240c69so253815ad.1 for ; Mon, 22 Jan 2024 09:38:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1705945117; x=1706549917; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=5UuQsVtmQQmr5HjqyMtyAk8R8Y84z4aC4k61fXm7dcc=; b=yPCF1fbJt88udfi4IbYvyaXYX1RwZVyFK5gY8d4VgFcIHQfE23M7XwhosOmHMl7BSw 1f2ch0AVVFdaRvBeDfwm1Dx3LPBiol/n1wR7FJJuAo5L6FDFzvUlHz1gV6X5bPTy1Kmx RKhjIETiDmyPys3CDehoYWfEpHESDMTCapSKiGdGka7ryTwq/zKsrJqHCsMbu+yVxXlA 49lYvZyMLr7jiTPTi/pmV/HGwBC+sOJv9BVuM2L3UTGKoH1ffNCyTzQfpoVteKAulTit kiPRTYXLk7smW4XpMxCIX25uXX/RsApCHtPRHd7o4ihYa9OBLGk9EX7cdiupmcQmgTiK DKJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705945117; x=1706549917; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5UuQsVtmQQmr5HjqyMtyAk8R8Y84z4aC4k61fXm7dcc=; b=Il7LklUAX+JQLY/P7fxLO9K/DhZBfwTEynKUtTMubVAqCouk5GTOkzDARrXorC8hk6 ur55utvOR+JuT8Bnkj622PKjvbq/7DcNwRk4PGCKu2UWvah4rkMExgWTnwgM/wdZMrqG rVT2rY5+bvB2AKJQcFF/j3kFxQkxlV2QpR8VawE22RfGDLxz/zU2OQB/Wlz0MUlxaa5/ +5yORuMLNIqABcp4RKQLBLi8tVII4d1o1MCU+FP5TpDhRp06HD4zWgVSlnk+GQ8ktiKo fC35Dsk4HDlLpwtBSMTtQno0m+myrIG1FiZuUOtHf49+A2ll8ra9JdtYNPA8BCdNfx1+ Ravg== X-Gm-Message-State: AOJu0YxIk8lllqEZizqmv1P48rgxKkVlbEuGx/FBhzMMjI3g/mfhN+0O xxVzYdHkSnTOcAZ7nEVEbDwM1u/+hRS6NpN7zFyfiXqASgchrJhLRp+/Iau7ZvLIkmLBAr59DaL cHvYTExW45Z/aB9ohgFJNK61Re2MjEphu26pz X-Received: by 2002:a17:902:e747:b0:1d6:f66b:1057 with SMTP id p7-20020a170902e74700b001d6f66b1057mr2985plf.28.1705945117286; Mon, 22 Jan 2024 09:38:37 -0800 (PST) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <6667b799702e1815bd4e4f7744eddbc0bd042bb7.camel@kernel.org> <20240117193915.urwueineol7p4hg7@treble> In-Reply-To: From: Shakeel Butt Date: Mon, 22 Jan 2024 09:38:23 -0800 Message-ID: Subject: Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again To: Linus Torvalds Cc: Roman Gushchin , Josh Poimboeuf , Vlastimil Babka , Jeff Layton , Chuck Lever , Johannes Weiner , Michal Hocko , linux-kernel@vger.kernel.org, Jens Axboe , Tejun Heo , Vasily Averin , Michal Koutny , Waiman Long , Muchun Song , Jiri Kosina , cgroups@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Linus, On Sun, Jan 21, 2024 at 9:16=E2=80=AFPM Linus Torvalds wrote: > > On Wed, 17 Jan 2024 at 14:56, Shakeel Butt wrote: > > > > > > So I don't see how we can make it really cheap (say, less than 5% ove= rhead) > > > without caching pre-accounted objects. > > > > Maybe this is what we want. Now we are down to just SLUB, maybe such > > caching of pre-accounted objects can be in SLUB layer and we can > > decide to keep this caching per-kmem-cache opt-in or always on. > > So it turns out that we have another case of SLAB_ACCOUNT being quite > a big expense, and it's actually the normal - but failed - open() or > execve() case. > > See the thread at > > https://lore.kernel.org/all/CAHk-=3Dwhw936qzDLBQdUz-He5WK_0fRSWwKAjtb= VsMGfX70Nf_Q@mail.gmail.com/ > > and to see the effect in profiles, you can use this EXTREMELY stupid > test program: > > #include > > int main(int argc, char **argv) > { > for (int i =3D 0; i < 10000000; i++) > open("nonexistent", O_RDONLY); > } > > where the point of course is that the "nonexistent" pathname doesn't > actually exist (so don't create a file called that for the test). > > What happens is that open() allocates a 'struct file *' early from the > filp kmem_cache, which has SLAB_ACCOUNT set. So we'll do accounting > for it, failt the pathname open, and free it again, which uncharges > the accounting. > > Now, in this case, I actually have a suggestion: could we please just > make SLAB_ACCOUNT be something that we do *after* the allocation, kind > of the same way the zeroing works? > > IOW, I'd love to get rid of slab_pre_alloc_hook() entirely, and make > slab_post_alloc_hook() do all the "charge the memcg if required". > > Obviously that means that now a failure to charge the memcg would have > to then de-allocate things, but that's an uncommon path and would be > marked unlikely and not be in the hot path at all. > > Now, the reason I would prefer that is that the *second* step would be to > > (a) expose a "kmem_cache_charge()" function that takes a > *non*-accounted slab allocation, and turns it into an accounted one > (and obviously this is why you want to do everything in the post-alloc > hook: just try to share this code) > > (b) remote the SLAB_ACCOUNT from the filp_cachep, making all file > allocations start out unaccounted. > > (c) when we have *actually* looked up the pathname and open the file > successfully, at *that* point we'd do a > > error =3D kmem_cache_charge(filp_cachep, file); > > in do_dentry_open() to turn the unaccounted file pointer into an > accounted one (and if that fails, we do the cleanup and free it, of > course, exactly like we do when file_get_write_access() fails) > > which means that now the failure case doesn't unnecessarily charge the > allocation that never ends up being finalized. > > NOTE! I think this would clean up mm/slub.c too, simply because it > would get rid of that memcg_slab_pre_alloc_hook() entirely, and get > rid of the need to carry the "struct obj_cgroup **objcgp" pointer > along until the post-alloc hook: everything would be done post-alloc. > > The actual kmem_cache_free() code already deals with "this slab hasn't > been accounted" because it obviously has to deal with allocations that > were done without __GFP_ACCOUNT anyway. So there's no change needed on > the freeing path, it already has to handle this all gracefully. > > I may be missing something, but it would seem to have very little > downside, and fix a case that actually is visible right now. > Thanks for the idea. Actually I had a discussion with Vlastimil at LPC '22 on a similar idea but for a different problem i.e. allocation in irq context or more specifically charging sockets for incoming TCP connections. If you see inet_csk_accept() we solved this for TCP memory by charging at accept() syscall but the kernel memory holding socket is still uncharged. So, I think having the framework you suggested would help more use-cases. I will take a stab at this in the next couple of days (sorry stuck in some other stuff atm). thanks, Shakeel