2009-10-14 09:59:14

by Joel Becker

Subject: [PATCH 0/2] [RFC] Adding the MAY_CREATE flag to ->permission()

Hey,

Ran into a fun problem in ocfs2. ocfs2, being a cluster
filesystem, has cluster locks. Being nice to our users, we allow
signals to interrupt the cluster locking layer if it hasn't gotten too
far yet (sleeping on local locking rather than the cluster).

Now, system calls are only allowed to return -ERESTARTSYS if
they can be safely restarted. In ocfs2_mknod(), which underlies
mkdir(2), mknod(2), and creat(2), we allow signals to interrupt us while
we gather our locks, but once we start changing things, there's no going
back. Everyone else does the same thing.

The problem is open(O_CREAT|O_EXCL). See, ocfs2_mknod() will
successfully create the file. Then we get back to
__open_namei_create(), which promptly calls may_open(). This is
backed by ocfs2_permission(), which needs the cluster lock to
check the new inode's permissions. Send a signal here, and the ocfs2
code will return -ERESTARTSYS. (This is easily verified via
'git-checkout'.) When entry.S restarts the open(O_CREAT|O_EXCL), it
finds the file it created on the first pass and gets -EEXIST. Ouch!
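
The failure mode can be seen from userspace without any cluster filesystem. The following is a minimal sketch, not the ocfs2 reproduction: it issues open(O_CREAT|O_EXCL) twice on the same path to imitate a restart after the first call already created the file. The helper name and the scratch path are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: run open(O_CREAT|O_EXCL) twice on the same path,
 * imitating a syscall restart after the first attempt created the file.
 * Returns the errno of the second attempt (0 if it unexpectedly
 * succeeded), or -1 if the first attempt itself failed.
 */
int errno_of_restarted_exclusive_create(const char *path)
{
	int fd;
	int err = 0;

	unlink(path);	/* start clean; failure is fine if the file is absent */

	/* First attempt: creates the file, like ->create() succeeding. */
	fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
	if (fd < 0)
		return -1;
	close(fd);

	/* "Restarted" attempt: the file now exists, so O_EXCL has to fail. */
	fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
	if (fd < 0)
		err = errno;
	else
		close(fd);

	unlink(path);	/* clean up the scratch file */
	return err;
}
```

Calling errno_of_restarted_exclusive_create() on a writable scratch path should return EEXIST, which is the error the restarted open() hands back to the caller even though the caller's own request created the file.
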

We can't naively block signals in ocfs2_permission(). The
majority of calls are not for O_CREAT|O_EXCL. So how do we let
ocfs2_permission() know about this case?

Christoph's suggestion was a new flag to ->permission(). I've
picked MAY_CREATE, but I'm totally open to a better name. I'm open to a
better solution too.

Following this are the MAY_CREATE patch and the ocfs2 patch to
make use of it.

Joel




2009-10-14 09:59:15

by Joel Becker

Subject: [PATCH 1/2] vfs: Add MAY_CREATE to the permission() flags.

A simple rule of system calls is that you cannot return -ERESTARTSYS
after you've made non-idempotent changes. ocfs2 has run into this with
open(O_CREAT|O_EXCL). Once you've created the file, you can't restart
the open(), because O_CREAT|O_EXCL will trigger -EEXIST.

The problem is that ocfs2 is catching the signal in ->permission(), called
by may_open(). This happens after ->create() has successfully created
the file. ocfs2_permission() has to get a cluster lock, and this is
what can be interrupted by a signal. Now, obviously we want to block
signals in the O_CREAT|O_EXCL case, but ocfs2_permission() has no way of
knowing it just got called from __open_namei_create().

So we add the MAY_CREATE flag to ->permission(). __open_namei_create()
will pass it to may_open(), and then ocfs2 can block signals in
ocfs2_permission() as appropriate. The same applies to any other
filesystem whose ->permission() has to do interruptible work when called
from may_open().
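
As a rough illustration of how the new bit sits beside the existing ones, here is a small standalone sketch. MAY_APPEND, MAY_ACCESS, MAY_OPEN and MAY_CREATE use the values from the fs.h hunk in this patch; MAY_EXEC, MAY_WRITE and MAY_READ are the kernel's lower three bits, which the hunk does not show. The helper name is hypothetical.

```c
#include <stdbool.h>

/* Permission mask bits; MAY_CREATE is the addition proposed here. */
#define MAY_EXEC	1
#define MAY_WRITE	2
#define MAY_READ	4
#define MAY_APPEND	8
#define MAY_ACCESS	16
#define MAY_OPEN	32
#define MAY_CREATE	64

/*
 * Hypothetical helper: true when the permission check follows a create,
 * i.e. when the filesystem must not let a signal restart the syscall.
 */
bool check_follows_create(int mask)
{
	return (mask & MAY_CREATE) != 0;
}
```

Because 64 is the next unused power of two, MAY_CREATE can be OR-ed with any of the existing bits, and a ->permission() implementation that does not care about it sees the remaining bits unchanged.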

Signed-off-by: Joel Becker <[email protected]>
---
fs/namei.c | 2 +-
include/linux/fs.h | 1 +
2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d11f404..d54cb98 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1623,7 +1623,7 @@ out_unlock:
 	if (error)
 		return error;
 	/* Don't check for write permission, don't truncate */
-	return may_open(&nd->path, 0, flag & ~O_TRUNC);
+	return may_open(&nd->path, MAY_CREATE, flag & ~O_TRUNC);
 }

/*
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2620a8c..b1a454c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -53,6 +53,7 @@ struct inodes_stat_t {
 #define MAY_APPEND 8
 #define MAY_ACCESS 16
 #define MAY_OPEN 32
+#define MAY_CREATE 64

 /*
  * flags in file.f_mode. Note that FMODE_READ and FMODE_WRITE must correspond
--
1.6.3.3

2009-10-14 09:59:17

by Joel Becker

Subject: [PATCH 2/2] ocfs2: Use MAY_CREATE in ocfs2_permission()

ocfs2 has a problem with open(O_CREAT|O_EXCL). Once you've created the
file, you can't restart the open(), because O_CREAT|O_EXCL will trigger
-EEXIST.

The problem is that ocfs2 is catching the signal in ->permission(), called
by may_open(). This happens after ->create() has successfully created
the file. ocfs2_permission() has to get a cluster lock, and this is
what can be interrupted by a signal. Now, obviously we want to block
signals in the O_CREAT|O_EXCL case, but ocfs2_permission() has no way of
knowing it just got called from __open_namei_create().

We key on the MAY_CREATE flag passed to ->permission() to block signals
around the cluster lock.
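
The block/unblock pattern used in the patch has a direct userspace analogue in sigprocmask(). The sketch below is only that analogue, under stated assumptions: block_all_signals(), restore_signals() and guarded_check() are hypothetical stand-ins, not the ocfs2 helpers, and sigusr1_is_blocked() exists only so the effect can be observed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in: block every signal, saving the previous mask. */
int block_all_signals(sigset_t *oldset)
{
	sigset_t all;

	sigfillset(&all);
	return sigprocmask(SIG_BLOCK, &all, oldset);
}

/* Hypothetical stand-in: restore the mask saved by block_all_signals(). */
int restore_signals(const sigset_t *oldset)
{
	return sigprocmask(SIG_SETMASK, oldset, NULL);
}

/* Observation helper: 1 if SIGUSR1 is blocked right now, 0 if not. */
int sigusr1_is_blocked(void)
{
	sigset_t cur;

	/* A NULL set leaves the mask untouched and just reports it. */
	if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)
		return -1;
	return sigismember(&cur, SIGUSR1);
}

/*
 * Run check() with signals blocked only when must_not_restart is set,
 * mirroring the "if (mask & MAY_CREATE)" tests around the cluster lock.
 */
int guarded_check(int must_not_restart, int (*check)(void))
{
	sigset_t oldset;
	int ret;

	if (must_not_restart && block_all_signals(&oldset) != 0)
		return -1;

	ret = check();

	if (must_not_restart)
		restore_signals(&oldset);
	return ret;
}
```

With SIGUSR1 initially unblocked, guarded_check(1, sigusr1_is_blocked) observes the signal as blocked while the check runs, and the original mask is back in place once it returns. The unconditional path, guarded_check(0, ...), leaves signals deliverable, which is the behaviour the common, non-create callers keep.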

Signed-off-by: Joel Becker <[email protected]>
---
fs/ocfs2/file.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 89fc8ee..b8749fa 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1141,9 +1141,18 @@ bail:
 int ocfs2_permission(struct inode *inode, int mask)
 {
 	int ret;
+	sigset_t oldset;

 	mlog_entry_void();

+	/*
+	 * If this inode was just created by open(O_CREAT|O_EXCL), we
+	 * can't allow signal restarting. So we need to block signals
+	 * around the cluster locking.
+	 */
+	if (mask & MAY_CREATE)
+		ocfs2_block_signals(&oldset);
+
 	ret = ocfs2_inode_lock(inode, NULL, 0);
 	if (ret) {
 		if (ret != -ENOENT)
@@ -1154,7 +1163,11 @@ int ocfs2_permission(struct inode *inode, int mask)
 	ret = generic_permission(inode, mask, ocfs2_check_acl);

 	ocfs2_inode_unlock(inode, 0);
+
 out:
+	if (mask & MAY_CREATE)
+		ocfs2_unblock_signals(&oldset);
+
 	mlog_exit(ret);
 	return ret;
 }
--
1.6.3.3