Date: Sun, 24 Mar 2024 02:27:31 +0000
From: Al Viro
To: Linus Torvalds
Cc: Vlastimil Babka, Josh Poimboeuf, Jeff Layton, Chuck Lever, Kees Cook,
    Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
    Andrew Morton, Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
    Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song,
    Christian Brauner, Jan Kara, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH RFC 4/4] UNFINISHED mm, fs: use kmem_cache_charge() in path_openat()
Message-ID: <20240324022731.GR538574@ZenIV>
References: <20240301-slab-memcg-v1-0-359328a46596@suse.cz>
    <20240301-slab-memcg-v1-4-359328a46596@suse.cz>

On Fri, Mar 01, 2024 at 09:51:18AM -0800, Linus Torvalds wrote:

> Right. I think the natural and logical way to deal with this is to
> just say "we account when we add the file to the fdtable".
>
> IOW, just have fd_install() do it. That's the really natural point,
> and also makes it very logical why alloc_empty_file_noaccount()
> wouldn't need to do the GFP_KERNEL_ACCOUNT.

We can have the same file occurring in many slots of many descriptor
tables, obviously.  So it would have to be a flag (in ->f_mode?) set by
it, for "someone's already charged for it", or you'll end up with really
insane crap on each fork(), dup(), etc.
But there's also MAP_ANON with its setup_shmem_file(), with the
resulting file not going into descriptor tables at all, and that's not
a rare thing.

> > - I don't know how to properly unwind the accounting failure case. It
> > seems like a new case because when we succeed the open, there's no
> > further error path at least in path_openat().
>
> Yeah, let me think about this part. Because fd_install() is the right
> point, but that too does not really allow for error handling.
>
> Yes, we could close things and fail it, but it really is much too late
> at this point.

That as well.  For things like O_CREAT even do_dentry_open() would be
too late for unrolls.

> What I *think* I'd want for this case is
>
>  (a) allow the accounting to go over by a bit
>
>  (b) make sure there's a cheap way to ask (before) about "did we go
> over the limit"
>
> IOW, the accounting never needed to be byte-accurate to begin with,
> and making it fail (cheaply and early) on the next file allocation is
> fine.
>
> Just make it really cheap. Can we do that?

That might be reasonable, but TBH I would rather combine that with
do_dentry_open()/alloc_file() (i.e. the places where we set
FMODE_OPENED) as places to do that, rather than messing with
fd_install().

How does the following sound?

* those who allocate empty files mark them if they are intended to be
  kernel-internal (see below for how to get the information there)
* memcg charge happens when we set FMODE_OPENED, provided that struct
  file instance is not marked kernel-internal.
* exceeding the limit => pretend we'd succeeded and fail the next
  allocation.

As for how to get the information down there...  We have 6 functions
where "allocate" and "mark it opened" callchains converge - alloc_file()
(pipe(2) et al., mostly), path_openat() (normal opens, but also
filp_open() et al.), dentry_open(), kernel_file_open(),
kernel_tmpfile_open(), dentry_create().  The last 3 are all
kernel-internal; dentry_open() might or might not be.

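A rough userspace model of the three points above, purely for
illustration.  None of this is kernel code: struct toy_file,
struct toy_memcg, the toy_*() helpers, the TOY_FMODE_* bits and the
limit are all made-up names.  Only the policy is taken from the text:
skip kernel-internal files, charge at the point where FMODE_OPENED would
be set, let that charge overshoot, and refuse the next allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag bits, stand-ins for real FMODE_* bits. */
#define TOY_FMODE_OPENED   0x1u
#define TOY_FMODE_INTERNAL 0x2u	/* "kernel-internal, do not charge" */

struct toy_memcg {
	size_t charged;		/* bytes currently charged */
	size_t limit;		/* may be exceeded by the last charge */
};

struct toy_file {
	unsigned int f_mode;
	struct toy_memcg *memcg;	/* NULL unless this file was charged */
};

/* Cheap check, asked before allocating: did an earlier charge go over? */
static bool toy_over_limit(const struct toy_memcg *cg)
{
	return cg->charged > cg->limit;
}

/* The allocator marks internal files; userland ones fail early if over. */
static struct toy_file *toy_alloc_empty_file(struct toy_memcg *cg,
					     bool internal)
{
	struct toy_file *f;

	if (!internal && toy_over_limit(cg))
		return NULL;
	f = calloc(1, sizeof(*f));
	if (f && internal)
		f->f_mode |= TOY_FMODE_INTERNAL;
	return f;
}

/* The charge happens where the file becomes opened and never fails. */
static void toy_mark_opened(struct toy_file *f, struct toy_memcg *cg)
{
	assert(f != NULL);
	f->f_mode |= TOY_FMODE_OPENED;
	if (f->f_mode & TOY_FMODE_INTERNAL)
		return;
	cg->charged += sizeof(*f);
	f->memcg = cg;
}

/* Uncharge only what was charged, then free. */
static void toy_release(struct toy_file *f)
{
	if (f->memcg)
		f->memcg->charged -= sizeof(*f);
	free(f);
}
```

Since the charge is tied to the file becoming opened rather than to a
descriptor table slot, the same file showing up in many slots is charged
once in this model, and files that never reach a descriptor table are
still covered.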
For path_openat() we can add a bit somewhere in struct open_flags; the
places where we set struct open_flags up would be the ones that might
need to be annotated.  That's
	file_open_name()
	file_open_root()
	do_sys_openat2() (definitely userland)
	io_openat2() (ditto)
	sys_uselib() (ditto)
	do_open_execat() (IMO can be considered userland in all cases)

For alloc_file() it's almost always userland.  IMO things like
dma_buf_export() and setup_shmem_file() should be charged.

So it's a matter of propagating the information to dentry_open(),
file_open_name() and file_open_root().  That's about 70 callers to
annotate, with filp_open() and file_open_root_mnt() thrown into the
mix.  61, actually, and from the quick look it seems that most of them
are really obvious...

Comments?
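
For the path_openat() side, the open_flags annotation could look
roughly like the toy below.  It is a sketch under assumptions, not a
patch: struct toy_open_flags, its kernel_internal member and the three
helpers are hypothetical names, and the real struct open_flags has more
in it than is shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct open_flags; kernel_internal is the new bit. */
struct toy_open_flags {
	int open_flag;
	bool kernel_internal;	/* hypothetical annotation */
};

/* Userland entry points (the do_sys_openat2() kind) never set the bit. */
static struct toy_open_flags toy_user_open_flags(int flags)
{
	struct toy_open_flags op = {
		.open_flag = flags,
		.kernel_internal = false,
	};
	return op;
}

/*
 * In-kernel helpers (the file_open_name() kind) take the answer from
 * their caller; that argument is what each caller would have to supply.
 */
static struct toy_open_flags toy_kernel_open_flags(int flags, bool internal)
{
	struct toy_open_flags op = {
		.open_flag = flags,
		.kernel_internal = internal,
	};
	return op;
}

/* What the path_openat()-like code would consult before charging. */
static bool toy_should_charge(const struct toy_open_flags *op)
{
	assert(op != NULL);
	return !op->kernel_internal;
}
```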