S_KERNEL_FILE is grossly misnamed. We have plenty of files hold open by
the kernel kernel using filp_open. This flag OTOH signals the inode as
being a special snowflake that cachefiles holds onto that can't be
unlinked because of ..., erm, pixie dust. So clearly mark it as such.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Amir Goldstein <[email protected]>
---
Changes since v1:
- fix commit message typos
fs/cachefiles/namei.c | 12 ++++++------
fs/namei.c | 2 +-
include/linux/fs.h | 2 +-
3 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 9bd692870617c..599dc13a7d9ab 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -20,8 +20,8 @@ static bool __cachefiles_mark_inode_in_use(struct cachefiles_object *object,
struct inode *inode = d_backing_inode(dentry);
bool can_use = false;
- if (!(inode->i_flags & S_KERNEL_FILE)) {
- inode->i_flags |= S_KERNEL_FILE;
+ if (!(inode->i_flags & S_CACHEFILE)) {
+ inode->i_flags |= S_CACHEFILE;
trace_cachefiles_mark_active(object, inode);
can_use = true;
} else {
@@ -51,7 +51,7 @@ static void __cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
{
struct inode *inode = d_backing_inode(dentry);
- inode->i_flags &= ~S_KERNEL_FILE;
+ inode->i_flags &= ~S_CACHEFILE;
trace_cachefiles_mark_inactive(object, inode);
}
@@ -742,7 +742,7 @@ static struct dentry *cachefiles_lookup_for_cull(struct cachefiles_cache *cache,
goto lookup_error;
if (d_is_negative(victim))
goto lookup_put;
- if (d_inode(victim)->i_flags & S_KERNEL_FILE)
+ if (d_inode(victim)->i_flags & S_CACHEFILE)
goto lookup_busy;
return victim;
@@ -789,11 +789,11 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
/* check to see if someone is using this object */
inode = d_inode(victim);
inode_lock(inode);
- if (inode->i_flags & S_KERNEL_FILE) {
+ if (inode->i_flags & S_CACHEFILE) {
ret = -EBUSY;
} else {
/* Stop the cache from picking it back up */
- inode->i_flags |= S_KERNEL_FILE;
+ inode->i_flags |= S_CACHEFILE;
ret = 0;
}
inode_unlock(inode);
diff --git a/fs/namei.c b/fs/namei.c
index d81f04f8d8188..7402277ecc1f5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3959,7 +3959,7 @@ int vfs_rmdir(struct user_namespace *mnt_userns, struct inode *dir,
error = -EBUSY;
if (is_local_mountpoint(dentry) ||
- (dentry->d_inode->i_flags & S_KERNEL_FILE))
+ (dentry->d_inode->i_flags & S_CACHEFILE))
goto out;
error = security_inode_rmdir(dir, dentry);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c8510da6cc6dc..099d7e03d46e6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2173,7 +2173,7 @@ struct super_operations {
#define S_ENCRYPTED (1 << 14) /* Encrypted file (using fs/crypto/) */
#define S_CASEFOLD (1 << 15) /* Casefolded file */
#define S_VERITY (1 << 16) /* Verity file (using fs/verity/) */
-#define S_KERNEL_FILE (1 << 17) /* File is in use by the kernel (eg. fs/cachefiles) */
+#define S_CACHEFILE (1 << 17) /* In use as cachefile, can't be unlinked */
/*
* Note that nosuid etc flags are inode-specific: setting some file-system
--
2.30.2
On 1/27/22 11:47 PM, Christoph Hellwig wrote:
> S_KERNEL_FILE is grossly misnamed. We have plenty of files hold open by
> the kernel kernel using filp_open. This flag OTOH signals the inode as
> being a special snowflake that cachefiles holds onto that can't be
> unlinked because of ..., erm, pixie dust. So clearly mark it as such.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> Reviewed-by: Amir Goldstein <[email protected]>
> ---
>
Looks good.
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Christoph Hellwig <[email protected]> wrote:
> S_KERNEL_FILE is grossly misnamed. We have plenty of files hold
"held".
> open by the kernel kernel using filp_open.
You said "kernel" twice.
And so what? Cachefiles holds files open with filp_open - but it can only do
so temporarily; otherwise EMFILE/ENFILE errors and OOMs become a serious
problem, because it could end up holding thousands of files open (one or two
of the xfstests cause this to happen).
Further, holding the file open *doesn't* prevent cachefilesd from trying to
cull the file to make space whilst it's "in use".
Yet further, I'm not holding the directories that form the cache layout open
with filp_open at all - I'm not reading them, so that would just be a waste of
resources - but I really don't want cachefilesd culling them just because it
sees they're empty whilst cachefiles is holding them ready.
> This flag OTOH signals the inode as being a special snowflake that
> cachefiles holds onto that can't be unlinked because of ..., erm, pixie
> dust.
Really? I presume you read the explanation I gave of the races that are a
consequence of splitting the driver between the kernel and userspace? I could
avoid them - or at least mitigate them - by keeping my own list of all the
inodes in use by cachefiles so that cachefilesd can query it. I did, in fact,
use to have such a list, but the core kernel already has such lists and the
facilities to translate pathnames into internal objects, so my stuff ought to
be redundant - all I need is one inode flag.
Further, that inode flag can be shared with anyone else who wants to do
something similar. It's just an "I'm using this" lock. There's no particular
reason to limit its use to cachefiles. In fact, it is better if it is shared,
so that in the unlikely event that two drivers try to use the same file, an
error occurs.
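
To illustrate - purely a sketch, with invented helper names and nothing that
actually exists in the tree - the claim/release pair might look like:

/* Hypothetical helpers: claim/release an inode for exclusive kernel use
 * via an i_flags bit.  Assumes i_flags is serialised by inode_lock()
 * here, which is how cachefiles_cull() flips the bit in the patch above.
 */
static bool inode_claim_kernel_use(struct inode *inode)
{
	bool claimed = false;

	inode_lock(inode);
	if (!(inode->i_flags & S_KERNEL_FILE)) {
		inode->i_flags |= S_KERNEL_FILE;
		claimed = true;
	}
	inode_unlock(inode);

	/* false means some other driver got there first */
	return claimed;
}

static void inode_release_kernel_use(struct inode *inode)
{
	inode_lock(inode);
	inode->i_flags &= ~S_KERNEL_FILE;
	inode_unlock(inode);
}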
I also use it to defend cachefiles against itself. In the event that there's
a bug or a race and it tries to reuse its own cache - particularly with
something like NFS, which can have multiple superblocks for the same source,
just with different I/O parameters - this can lead to data corruption. I try
to defend against that in fscache too, but there can be delayed effects
because object finalisation is done in the background after fscache has
returned to the netfs.
Now, I do think there's an argument to be made for splitting the flag into
two, as I advanced in a proposed patch. One piece would just be an "I'm using
this" flag, the other a "don't delete this" flag. Both are of potential use
to other drivers.
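
As a rough sketch of that split (bit values and names invented here, not
necessarily what the proposed patch used):

/* Hypothetical two-flag split: */
#define S_INUSE_EXCL	(1 << 17)	/* "I'm using this": exclusive driver claim */
#define S_NO_REMOVE	(1 << 18)	/* "don't delete this": unlink/rmdir get -EBUSY */

vfs_rmdir() and vfs_unlink() would then test only the deletion bit:

	if (is_local_mountpoint(dentry) ||
	    (dentry->d_inode->i_flags & S_NO_REMOVE))
		goto out;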
I take it you'd prefer this to be done a different way?
David
On Fri, Jan 28, 2022 at 12:12 PM David Howells <[email protected]> wrote:
>
> [...]
>
> Further, that inode flag can be shared with anyone else who wants to do
> something similar. It's just an "I'm using this" lock. There's no particular
> reason to limit its use to cachefiles. In fact, it is better if it is shared,
> so that in the unlikely event that two drivers try to use the same file, an
> error occurs.
>
>
Good idea, but then the helpers to set the flag should not be internal
to cachefiles and the locking semantics should be clear.
FYI, overlayfs already takes an "exclusive lock" on upper/work dir
among all ovl instances.
How do you feel about hoisting the ovl_inuse_* helpers to fs.h
and renaming s/I_OVL_INUSE/I_EXCL_INUSE?
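
For reference, those helpers are roughly this shape (simplified from
fs/overlayfs/util.c - note that the bit lives in i_state, serialised by
i_lock, so the hoisted variant would mostly be a rename):

bool ovl_inuse_trylock(struct dentry *dentry)
{
	struct inode *inode = d_inode(dentry);
	bool locked = false;

	spin_lock(&inode->i_lock);
	if (!(inode->i_state & I_OVL_INUSE)) {
		inode->i_state |= I_OVL_INUSE;
		locked = true;
	}
	spin_unlock(&inode->i_lock);

	return locked;
}

void ovl_inuse_unlock(struct dentry *dentry)
{
	struct inode *inode = d_inode(dentry);

	spin_lock(&inode->i_lock);
	WARN_ON(!(inode->i_state & I_OVL_INUSE));
	inode->i_state &= ~I_OVL_INUSE;
	spin_unlock(&inode->i_lock);
}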
Whether deny rmdir should have its own flag or not I don't know,
but from ovl POV I *think* it should not be a problem to deny rmdir
for the ovl upper/work dirs as long as ovl is mounted(?).
From our experience, adding the exclusive lock caused regressions
in setups with container runtimes that had mount leak bugs.
I am hoping that all those mount leak bugs have been fixed, but one never
knows what sort of regressions denying rmdir of upper/work may cause.
Another problem with a generic denial of rmdir is that users getting
EBUSY have no way to figure out the reason.
At least with a specific subsystem (e.g. cachefiles), users should be able
to check whether the denied dir is in the subsystem's inventory(?)
Thanks,
Amir.
Amir Goldstein <[email protected]> wrote:
> Good idea, but then the helpers to set the flag should not be internal
> to cachefiles and the locking semantics should be clear.
I could move them out, at least partially. They do log some information
that's private to cachefiles through the tracepoint, but it's just one number
and could be passed in as a parameter. I could move the tracepoint to
somewhere more generic.
> FYI, overlayfs already takes an "exclusive lock" on upper/work dir
> among all ovl instances.
>
> How do you feel about hoisting the ovl_inuse_* helpers to fs.h
> and renaming s/I_OVL_INUSE/I_EXCL_INUSE?
Fine by me. Sharing a cache with or through an overlay would make for very
fun coherency management.
> Whether deny rmdir should have its own flag or not I don't know,
> but from ovl POV I *think* it should not be a problem to deny rmdir
> for the ovl upper/work dirs as long as ovl is mounted(?).
What's the consequence of someone rearranging the directories directly in the
contributing dirs whilst there's an overlay over them?
> Another problem with a generic denial of rmdir is that users getting
> EBUSY have no way to figure out the reason.
> At least with a specific subsystem (e.g. cachefiles), users should be able
> to check whether the denied dir is in the subsystem's inventory(?)
I could add a tracepoint for that. It could form a set with the cachefiles
tracepoints if I move those out too.
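
Something like this, perhaps - a hypothetical tracepoint, just to show the
shape; the name and fields are invented:

/* Emitted wherever rmdir/unlink is denied because the inode is in use: */
TRACE_EVENT(vfs_remove_denied_inuse,
	    TP_PROTO(struct inode *inode),
	    TP_ARGS(inode),
	    TP_STRUCT__entry(
		    __field(dev_t,		dev)
		    __field(unsigned long,	ino)
	    ),
	    TP_fast_assign(
		    __entry->dev = inode->i_sb->s_dev;
		    __entry->ino = inode->i_ino;
	    ),
	    TP_printk("dev=%u:%u ino=%lx",
		      MAJOR(__entry->dev), MINOR(__entry->dev),
		      __entry->ino)
);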
David
On Fri, Jan 28, 2022 at 11:35:59AM +0000, David Howells wrote:
> > Whether deny rmdir should have its own flag or not I don't know,
> > but from ovl POV I *think* it should not be a problem to deny rmdir
> > for the ovl upper/work dirs as long as ovl is mounted(?).
>
> What's the consequence of someone rearranging the directories directly in the
> contributing dirs whilst there's an overlay over them?
"Don't do it, then - presumably the kernel won't panic, but don't expect it to
try and invent nice semantics for the crap you are trying to pull"
On Fri, Jan 28, 2022 at 3:17 PM Al Viro <[email protected]> wrote:
>
> On Fri, Jan 28, 2022 at 11:35:59AM +0000, David Howells wrote:
>
> > > Whether deny rmdir should have its own flag or not I don't know,
> > > but from ovl POV I *think* it should not be a problem to deny rmdir
> > > for the ovl upper/work dirs as long as ovl is mounted(?).
> >
> > What's the consequence of someone rearranging the directories directly in the
> > contributing dirs whilst there's an overlay over them?
>
> "Don't do it, then - presumably the kernel won't panic, but don't expect it to
> try and invent nice semantics for the crap you are trying to pull"
IIUC, that is the point Dave was trying to make.
Nothing good can come out of allowing users to manipulate the overlay
upper/work dirs, so denying rmdir on those dirs - which are already marked
with the OVL_INUSE flag - is probably not a bad idea anyway; ovl and
cachefiles could potentially use the same flag with the same semantics.
Thanks,
Amir.