2014-01-17 10:07:32

by Pavel Shilovsky

[permalink] [raw]
Subject: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

This patchset adds support of O_DENY* flags for Linux fs layer. These flags can be used by any application that needs share reservations to organize a file access. VFS already has some sort of this capability - now it's done through flock/LOCK_MAND mechanis, but that approach is non-atomic. This patchset build new capabilities on top of the existing one but doesn't bring any changes into the flock call semantic.

These flags can be used by NFS (built-in-kernel) and CIFS (Samba) servers and Wine applications through VFS (for local filesystems) or CIFS/NFS modules. This will help when e.g. Samba and NFS server share the same directory for Windows and Linux users or Wine applications use Samba/NFS share to access the same data from different clients.

According to the previous discussions the most problematic question is how to prevent situations like DoS attacks where e.g /lib/liba.so file can be open with DENYREAD, or smth like this. That's why extra mount option 'sharelock' is added for switching on/off O_DENY* flags processing. It allows to avoid use of these flags on critical places (like /, /lib) and turn them on if we really want it proccessed that way.

So, we have 3 new flags:
O_DENYREAD - to prevent other opens with read access,
O_DENYWRITE - to prevent other opens with write access,
O_DENYDELETE - to prevent delete operations

The patchset avoids data races problem on newely created files: open with O_CREAT can return the -ESHAREDENIED error for a successfully created file if this files was locked with a sharelock by another task. Also, it turns flock/LOCK_MAND capability off for mounts with 'sharelock' option. This lets not mix one share reservation approach with another.

Patches #1, #2 and #3 add O_DENY* flags to fcntl and implement VFS part. A patch #4 is related to CIFS client changes, #5 -- NFSD, #6 and #7 describe NFSv4 client parts.

The preliminary patch for Samba that replaces the existing use of flock/LOCK_MAND mechanism with O_DENY* flags:
http://git.etersoft.ru/people/piastry/packages/?p=samba.git;a=commitdiff;h=173070262b7f29134ad6b2e49a760017c11aec4a

The future part of open man page patch:

-----
O_DENYREAD, O_DENYWRIYE, O_DENYDELETE - used to inforce a mandatory share reservation scheme of the file access. If these flag is passed, the open fails with -ESHAREDENIED in following cases:
1) if O_DENYREAD flag is specified and there is another open with READ access to the file;
2) if O_DENYWRITE flag is specified and there is another open with WRITE access to the file;
3) if READ access is requested and there is another open with O_DENYREAD flags;
4) if WRITE access is requested and there is another open with O_DENYWRITE flags.

If O_DENYDELETE flag is specified and the open succeded, any further unlink operation will fail with -ESHAREDENIED untill this open is closed. Now this flag is processed by VFS and CIFS filesystem. NFS returns -EINVAL for opens with this flag.

This mechanism is done through flocks. If O_DENY* flags are specified, flock syscall is processed after the open. The share modes are translated into flock according the following rules:
1) !O_DENYREAD -> LOCK_READ | LOCK_MAND
2) !O_DENYWRITE -> LOCK_WRITE | LOCK_MAND
3) !O_DENYDELETE -> LOCK_DELETE | LOCK_MAND
-----

Pavel Shilovsky (7):
VFS: Introduce new O_DENY* open flags
VFS: Add O_DENYDELETE support for VFS
locks: Disable LOCK_MAND support for MS_SHARELOCK mounts
CIFS: Add O_DENY* open flags support
NFSD: Pass share reservations flags to VFS
NFSv4: Add deny state handling for nfs4_state struct
NFSv4: Add O_DENY* open flags support

arch/alpha/include/uapi/asm/errno.h | 2 +
arch/alpha/include/uapi/asm/fcntl.h | 3 +
arch/mips/include/uapi/asm/errno.h | 2 +
arch/parisc/include/uapi/asm/errno.h | 2 +
arch/parisc/include/uapi/asm/fcntl.h | 3 +
arch/sparc/include/uapi/asm/errno.h | 2 +
arch/sparc/include/uapi/asm/fcntl.h | 3 +
fs/cifs/cifsacl.c | 2 +
fs/cifs/cifsfs.c | 2 +-
fs/cifs/cifsglob.h | 17 +-
fs/cifs/cifssmb.c | 2 +-
fs/cifs/dir.c | 6 +
fs/cifs/file.c | 20 ++-
fs/cifs/inode.c | 12 +-
fs/cifs/link.c | 2 +
fs/cifs/netmisc.c | 2 +-
fs/cifs/smb1ops.c | 3 +
fs/cifs/smb2inode.c | 1 +
fs/cifs/smb2maperror.c | 2 +-
fs/cifs/smb2ops.c | 6 +
fs/cifs/smb2pdu.c | 2 +-
fs/fcntl.c | 5 +-
fs/locks.c | 161 ++++++++++++++++--
fs/namei.c | 56 +++++-
fs/nfs/dir.c | 39 ++++-
fs/nfs/inode.c | 8 +-
fs/nfs/internal.h | 3 +-
fs/nfs/nfs4_fs.h | 83 ++++++++-
fs/nfs/nfs4file.c | 8 +-
fs/nfs/nfs4proc.c | 310 +++++++++++++++++++++++-----------
fs/nfs/nfs4state.c | 65 ++++---
fs/nfs/nfs4super.c | 9 +-
fs/nfs/nfs4xdr.c | 21 ++-
fs/nfs/super.c | 7 +-
fs/nfsd/nfs4state.c | 46 ++++-
fs/nfsd/nfsproc.c | 1 +
fs/proc_namespace.c | 1 +
include/linux/fs.h | 15 ++
include/linux/nfs_fs.h | 5 +-
include/linux/nfs_xdr.h | 1 +
include/uapi/asm-generic/errno.h | 2 +
include/uapi/asm-generic/fcntl.h | 12 ++
include/uapi/linux/fs.h | 1 +
43 files changed, 789 insertions(+), 166 deletions(-)

--
1.7.10.4


2014-01-17 10:07:36

by Pavel Shilovsky

[permalink] [raw]
Subject: [PATCH v7 2/7] VFS: Add O_DENYDELETE support for VFS

Introduce new LOCK_DELETE flock flag that is suggested to be used
internally only to map O_DENYDELETE open flag:

!O_DENYDELETE -> LOCK_DELETE | LOCK_MAND.

Signed-off-by: Pavel Shilovsky <[email protected]>
---
fs/locks.c | 56 ++++++++++++++++++++++++++++++++------
fs/namei.c | 3 ++
include/linux/fs.h | 6 ++++
include/uapi/asm-generic/fcntl.h | 1 +
4 files changed, 57 insertions(+), 9 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index ffde4d4..7ecc511 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -295,7 +295,7 @@ EXPORT_SYMBOL(locks_copy_lock);

static inline int flock_translate_cmd(int cmd) {
if (cmd & LOCK_MAND)
- return cmd & (LOCK_MAND | LOCK_RW);
+ return cmd & (LOCK_MAND | LOCK_RW | LOCK_DELETE);
switch (cmd) {
case LOCK_SH:
return F_RDLCK;
@@ -717,6 +717,8 @@ deny_flags_to_cmd(unsigned int flags)
cmd |= LOCK_READ;
if (!(flags & O_DENYWRITE))
cmd |= LOCK_WRITE;
+ if (!(flags & O_DENYDELETE))
+ cmd |= LOCK_DELETE;

return cmd;
}
@@ -941,6 +943,34 @@ out:
return error;
}

+int
+sharelock_may_delete(struct dentry *dentry)
+{
+ struct file_lock **before;
+ struct inode *inode = dentry->d_inode;
+ int rc = 0;
+
+ if (!IS_SHARELOCK(inode))
+ return rc;
+
+ spin_lock(&inode->i_lock);
+ for_each_lock(inode, before) {
+ struct file_lock *fl = *before;
+ if (IS_POSIX(fl))
+ break;
+ if (IS_LEASE(fl))
+ continue;
+ if (!(fl->fl_type & LOCK_MAND))
+ continue;
+ if (fl->fl_type & LOCK_DELETE)
+ continue;
+ rc = 1;
+ break;
+ }
+ spin_unlock(&inode->i_lock);
+ return rc;
+}
+
/*
* Determine if a file is allowed to be opened with specified access and share
* modes. Lock the file and return 0 if checks passed, otherwise return
@@ -955,10 +985,6 @@ sharelock_lock_file(struct file *filp)
if (!IS_SHARELOCK(filp->f_path.dentry->d_inode))
return error;

- /* Disable O_DENYDELETE support for now */
- if (filp->f_flags & O_DENYDELETE)
- return -EINVAL;
-
error = flock_make_lock(filp, &lock, deny_flags_to_cmd(filp->f_flags));
if (error)
return error;
@@ -1872,6 +1898,12 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
if (!f.file)
goto out;

+ /* LOCK_DELETE is defined to be translated from O_DENYDELETE only */
+ if (cmd & LOCK_DELETE) {
+ error = -EINVAL;
+ goto out;
+ }
+
can_sleep = !(cmd & LOCK_NB);
cmd &= ~LOCK_NB;
unlock = (cmd == LOCK_UN);
@@ -2419,10 +2451,16 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
seq_printf(f, "UNKNOWN UNKNOWN ");
}
if (fl->fl_type & LOCK_MAND) {
- seq_printf(f, "%s ",
- (fl->fl_type & LOCK_READ)
- ? (fl->fl_type & LOCK_WRITE) ? "RW " : "READ "
- : (fl->fl_type & LOCK_WRITE) ? "WRITE" : "NONE ");
+ if (fl->fl_type & LOCK_DELETE)
+ seq_printf(f, "%s ",
+ (fl->fl_type & LOCK_READ) ?
+ (fl->fl_type & LOCK_WRITE) ? "RWDEL" : "RDDEL" :
+ (fl->fl_type & LOCK_WRITE) ? "WRDEL" : "DEL ");
+ else
+ seq_printf(f, "%s ",
+ (fl->fl_type & LOCK_READ) ?
+ (fl->fl_type & LOCK_WRITE) ? "RW " : "READ " :
+ (fl->fl_type & LOCK_WRITE) ? "WRITE" : "NONE ");
} else {
seq_printf(f, "%s ",
(lease_breaking(fl))
diff --git a/fs/namei.c b/fs/namei.c
index 2b741a1..1643ab8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2383,6 +2383,7 @@ static inline int check_sticky(struct inode *dir, struct inode *inode)
* 9. We can't remove a root or mountpoint.
* 10. We don't allow removal of NFS sillyrenamed files; it's handled by
* nfs_async_unlink().
+ * 11. We can't do it if victim is locked by O_DENYDELETE sharelock.
*/
static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
{
@@ -2416,6 +2417,8 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
return -ENOENT;
if (victim->d_flags & DCACHE_NFSFS_RENAMED)
return -EBUSY;
+ if (sharelock_may_delete(victim))
+ return -ESHAREDENIED;
return 0;
}

diff --git a/include/linux/fs.h b/include/linux/fs.h
index aa061ca..6dc9275 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1030,6 +1030,7 @@ extern int lease_modify(struct file_lock **, int);
extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
extern int sharelock_lock_file(struct file *);
+extern int sharelock_may_delete(struct dentry *);
#else /* !CONFIG_FILE_LOCKING */
static inline int fcntl_getlk(struct file *file, struct flock __user *user)
{
@@ -1176,6 +1177,11 @@ static inline int sharelock_lock_file(struct file *filp)
return 0;
}

+static inline int sharelock_may_delete(struct dentry *dentry)
+{
+ return 0;
+}
+
#endif /* !CONFIG_FILE_LOCKING */


diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index 9881cfe..41ba131 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -175,6 +175,7 @@ struct f_owner_ex {
blocking */
#define LOCK_UN 8 /* remove lock */

+#define LOCK_DELETE 16 /* which allows to delete a file */
#define LOCK_MAND 32 /* This is a mandatory flock ... */
#define LOCK_READ 64 /* which allows concurrent read operations */
#define LOCK_WRITE 128 /* which allows concurrent write operations */
--
1.7.10.4

2014-01-17 10:07:48

by Pavel Shilovsky

[permalink] [raw]
Subject: [PATCH v7 5/7] NFSD: Pass share reservations flags to VFS

that maps them into O_DENY* flags and make them visible for
applications on mounts with sharelock option.

Signed-off-by: Pavel Shilovsky <[email protected]>
---
fs/locks.c | 1 +
fs/nfsd/nfs4state.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
fs/nfsd/nfsproc.c | 1 +
3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/fs/locks.c b/fs/locks.c
index 6e78c0a..805afd4 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1005,6 +1005,7 @@ sharelock_lock_file(struct file *filp)
locks_free_lock(lock);
return error;
}
+EXPORT_SYMBOL(sharelock_lock_file);

static int __posix_lock_file(struct inode *inode, struct file_lock *request, struct file_lock *conflock)
{
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 105d6fa..e613bff 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -571,6 +571,19 @@ test_deny(u32 access, struct nfs4_ol_stateid *stp)
return test_bit(access, &stp->st_deny_bmap);
}

+static int nfs4_deny_to_odeny(u32 access)
+{
+ switch (access & NFS4_SHARE_DENY_BOTH) {
+ case NFS4_SHARE_DENY_READ:
+ return O_DENYREAD;
+ case NFS4_SHARE_DENY_WRITE:
+ return O_DENYWRITE;
+ case NFS4_SHARE_DENY_BOTH:
+ return O_DENYREAD | O_DENYWRITE;
+ }
+ return 0;
+}
+
static int nfs4_access_to_omode(u32 access)
{
switch (access & NFS4_SHARE_ACCESS_BOTH) {
@@ -2960,6 +2973,21 @@ nfsd4_truncate(struct svc_rqst *rqstp, struct svc_fh *fh,
}

static __be32
+nfs4_vfs_set_deny(struct nfs4_file *fp, unsigned long share_access,
+ unsigned long deny_access)
+{
+ int oflag, rc;
+ __be32 status = nfs_ok;
+
+ oflag = nfs4_access_to_omode(share_access);
+ fp->fi_fds[oflag]->f_flags |= nfs4_deny_to_odeny(deny_access);
+ rc = sharelock_lock_file(fp->fi_fds[oflag]);
+ if (rc)
+ status = nfserrno(rc);
+ return status;
+}
+
+static __be32
nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp, struct nfsd4_open *open)
{
u32 op_share_access = open->op_share_access;
@@ -2980,6 +3008,14 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *c
}
return status;
}
+ status = nfs4_vfs_set_deny(fp, op_share_access, open->op_share_deny);
+ if (status) {
+ if (new_access) {
+ int oflag = nfs4_access_to_omode(op_share_access);
+ nfs4_file_put_access(fp, oflag);
+ }
+ return status;
+ }
/* remember the open */
set_access(op_share_access, stp);
set_deny(open->op_share_deny, stp);
@@ -3234,7 +3270,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf

/*
* OPEN the file, or upgrade an existing OPEN.
- * If truncate fails, the OPEN fails.
+ * If truncate or set_deny fails, the OPEN fails.
*/
if (stp) {
/* Stateid was found, this is an OPEN upgrade */
@@ -3248,6 +3284,10 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
status = nfsd4_truncate(rqstp, current_fh, open);
if (status)
goto out;
+ status = nfs4_vfs_set_deny(fp, open->op_share_access,
+ open->op_share_deny);
+ if (status)
+ goto out;
stp = open->op_stp;
open->op_stp = NULL;
init_open_stateid(stp, fp, open);
@@ -3955,6 +3995,10 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp,
}
nfs4_stateid_downgrade(stp, od->od_share_access);

+ status = nfs4_vfs_set_deny(stp->st_file, od->od_share_access,
+ od->od_share_deny);
+ if (status)
+ goto out;
reset_union_bmap_deny(od->od_share_deny, stp);

update_stateid(&stp->st_stid.sc_stateid);
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 54c6b3d..ff95544 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -743,6 +743,7 @@ nfserrno (int errno)
{ nfserr_notsupp, -EOPNOTSUPP },
{ nfserr_toosmall, -ETOOSMALL },
{ nfserr_serverfault, -ESERVERFAULT },
+ { nfserr_share_denied, -ESHAREDENIED },
};
int i;

--
1.7.10.4

2014-01-17 10:07:46

by Pavel Shilovsky

[permalink] [raw]
Subject: [PATCH v7 4/7] CIFS: Add O_DENY* open flags support

Construct share_access value from O_DENY* flags and send it to
the server. Use NTCreateAndX command rather than Trans2
all the time we have any of O_DENY* flags regardless of unix
extensions support. Also change smb error mapping of
NT_STATUS_SHARING_VIOLATION to -ESHAREDENIED.

Signed-off-by: Pavel Shilovsky <[email protected]>
---
fs/cifs/cifsacl.c | 2 ++
fs/cifs/cifsfs.c | 2 +-
fs/cifs/cifsglob.h | 17 ++++++++++++++++-
fs/cifs/cifssmb.c | 2 +-
fs/cifs/dir.c | 6 ++++++
fs/cifs/file.c | 20 +++++++++++++++++---
fs/cifs/inode.c | 12 ++++++++++--
fs/cifs/link.c | 2 ++
fs/cifs/netmisc.c | 2 +-
fs/cifs/smb1ops.c | 3 +++
fs/cifs/smb2inode.c | 1 +
fs/cifs/smb2maperror.c | 2 +-
fs/cifs/smb2ops.c | 6 ++++++
fs/cifs/smb2pdu.c | 2 +-
fs/locks.c | 11 ++++++++++-
include/linux/fs.h | 1 +
16 files changed, 79 insertions(+), 12 deletions(-)

diff --git a/fs/cifs/cifsacl.c b/fs/cifs/cifsacl.c
index 8f9b4f7..6665602 100644
--- a/fs/cifs/cifsacl.c
+++ b/fs/cifs/cifsacl.c
@@ -912,6 +912,7 @@ static struct cifs_ntsd *get_cifs_acl_by_path(struct cifs_sb_info *cifs_sb,
oparms.tcon = tcon;
oparms.cifs_sb = cifs_sb;
oparms.desired_access = READ_CONTROL;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.create_options = create_options;
oparms.disposition = FILE_OPEN;
oparms.path = path;
@@ -981,6 +982,7 @@ int set_cifs_acl(struct cifs_ntsd *pnntsd, __u32 acllen,
oparms.tcon = tcon;
oparms.cifs_sb = cifs_sb;
oparms.desired_access = access_flags;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.create_options = create_options;
oparms.disposition = FILE_OPEN;
oparms.path = path;
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 849f613..1e07a97 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -819,7 +819,7 @@ struct file_system_type cifs_fs_type = {
.name = "cifs",
.mount = cifs_do_mount,
.kill_sb = cifs_kill_sb,
- /* .fs_flags */
+ .fs_flags = FS_DOES_SHARELOCK,
};
MODULE_ALIAS_FS("cifs");
const struct inode_operations cifs_dir_inode_ops = {
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 61228b7..acc799c 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -489,7 +489,7 @@ struct smb_vol {
CIFS_MOUNT_CIFS_BACKUPUID | CIFS_MOUNT_CIFS_BACKUPGID)

#define CIFS_MS_MASK (MS_RDONLY | MS_MANDLOCK | MS_NOEXEC | MS_NOSUID | \
- MS_NODEV | MS_SYNCHRONOUS)
+ MS_NODEV | MS_SYNCHRONOUS | MS_SHARELOCK)

struct cifs_mnt_data {
struct cifs_sb_info *cifs_sb;
@@ -961,6 +961,7 @@ struct cifs_open_parms {
struct cifs_sb_info *cifs_sb;
int disposition;
int desired_access;
+ int share_access;
int create_options;
const char *path;
struct cifs_fid *fid;
@@ -1005,6 +1006,20 @@ struct cifsFileInfo {
struct work_struct oplock_break; /* work for oplock breaks */
};

+static inline int
+cifs_get_share_flags(unsigned int flags)
+{
+ int share_access = 0;
+
+ if (!(flags & O_DENYREAD))
+ share_access |= FILE_SHARE_READ;
+ if (!(flags & O_DENYWRITE))
+ share_access |= FILE_SHARE_WRITE;
+ if (!(flags & O_DENYDELETE))
+ share_access |= FILE_SHARE_DELETE;
+ return share_access;
+}
+
struct cifs_io_parms {
__u16 netfid;
#ifdef CONFIG_CIFS_SMB2
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 4d881c3..e6b2b87 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -1347,7 +1347,7 @@ openRetry:
if (create_options & CREATE_OPTION_READONLY)
req->FileAttributes |= cpu_to_le32(ATTR_READONLY);

- req->ShareAccess = cpu_to_le32(FILE_SHARE_ALL);
+ req->ShareAccess = cpu_to_le32(oparms->share_access);
req->CreateDisposition = cpu_to_le32(disposition);
req->CreateOptions = cpu_to_le32(create_options & CREATE_OPTIONS_MASK);

diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index d3a6796..b1287601 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -198,6 +198,7 @@ cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned int xid,
int rc = -ENOENT;
int create_options = CREATE_NOT_DIR;
int desired_access;
+ int share_access = FILE_SHARE_ALL;
struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
struct cifs_tcon *tcon = tlink_tcon(tlink);
char *full_path = NULL;
@@ -217,7 +218,11 @@ cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned int xid,
goto out;
}

+ if (IS_SHARELOCK(inode))
+ share_access = cifs_get_share_flags(oflags);
+
if (tcon->unix_ext && cap_unix(tcon->ses) && !tcon->broken_posix_open &&
+ (share_access == FILE_SHARE_ALL) &&
(CIFS_UNIX_POSIX_PATH_OPS_CAP &
le64_to_cpu(tcon->fsUnixInfo.Capability))) {
rc = cifs_posix_open(full_path, &newinode, inode->i_sb, mode,
@@ -324,6 +329,7 @@ cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned int xid,
oparms.tcon = tcon;
oparms.cifs_sb = cifs_sb;
oparms.desired_access = desired_access;
+ oparms.share_access = share_access;
oparms.create_options = create_options;
oparms.disposition = disposition;
oparms.path = full_path;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 853d6d1..ad92572 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -179,6 +179,7 @@ cifs_nt_open(char *full_path, struct inode *inode, struct cifs_sb_info *cifs_sb,
{
int rc;
int desired_access;
+ int share_access = FILE_SHARE_ALL;
int disposition;
int create_options = CREATE_NOT_DIR;
FILE_ALL_INFO *buf;
@@ -215,6 +216,8 @@ cifs_nt_open(char *full_path, struct inode *inode, struct cifs_sb_info *cifs_sb,
*********************************************************************/

disposition = cifs_get_disposition(f_flags);
+ if (IS_SHARELOCK(inode))
+ share_access = cifs_get_share_flags(f_flags);

/* BB pass O_SYNC flag through on file attributes .. BB */

@@ -228,6 +231,7 @@ cifs_nt_open(char *full_path, struct inode *inode, struct cifs_sb_info *cifs_sb,
oparms.tcon = tcon;
oparms.cifs_sb = cifs_sb;
oparms.desired_access = desired_access;
+ oparms.share_access = share_access;
oparms.create_options = create_options;
oparms.disposition = disposition;
oparms.path = full_path;
@@ -432,11 +436,11 @@ void cifsFileInfo_put(struct cifsFileInfo *cifs_file)
}

int cifs_open(struct inode *inode, struct file *file)
-
{
int rc = -EACCES;
unsigned int xid;
__u32 oplock;
+ int share_access = FILE_SHARE_ALL;
struct cifs_sb_info *cifs_sb;
struct TCP_Server_Info *server;
struct cifs_tcon *tcon;
@@ -472,8 +476,12 @@ int cifs_open(struct inode *inode, struct file *file)
else
oplock = 0;

- if (!tcon->broken_posix_open && tcon->unix_ext &&
- cap_unix(tcon->ses) && (CIFS_UNIX_POSIX_PATH_OPS_CAP &
+ if (IS_SHARELOCK(inode))
+ share_access = cifs_get_share_flags(file->f_flags);
+
+ if (!tcon->broken_posix_open && tcon->unix_ext && cap_unix(tcon->ses) &&
+ (share_access == FILE_SHARE_ALL) &&
+ (CIFS_UNIX_POSIX_PATH_OPS_CAP &
le64_to_cpu(tcon->fsUnixInfo.Capability))) {
/* can not refresh inode info since size could be stale */
rc = cifs_posix_open(full_path, &inode, inode->i_sb,
@@ -595,6 +603,7 @@ cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush)
struct inode *inode;
char *full_path = NULL;
int desired_access;
+ int share_access = FILE_SHARE_ALL;
int disposition = FILE_OPEN;
int create_options = CREATE_NOT_DIR;
struct cifs_open_parms oparms;
@@ -635,7 +644,11 @@ cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush)
else
oplock = 0;

+ if (IS_SHARELOCK(inode))
+ share_access = cifs_get_share_flags(cfile->f_flags);
+
if (tcon->unix_ext && cap_unix(tcon->ses) &&
+ (share_access == FILE_SHARE_ALL) &&
(CIFS_UNIX_POSIX_PATH_OPS_CAP &
le64_to_cpu(tcon->fsUnixInfo.Capability))) {
/*
@@ -670,6 +683,7 @@ cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush)
oparms.tcon = tcon;
oparms.cifs_sb = cifs_sb;
oparms.desired_access = desired_access;
+ oparms.share_access = share_access;
oparms.create_options = create_options;
oparms.disposition = disposition;
oparms.path = full_path;
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 9cb9679..8a27905 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -441,6 +441,7 @@ cifs_sfu_type(struct cifs_fattr *fattr, const char *path,
oparms.tcon = tcon;
oparms.cifs_sb = cifs_sb;
oparms.desired_access = GENERIC_READ;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.create_options = CREATE_NOT_DIR;
oparms.disposition = FILE_OPEN;
oparms.path = path;
@@ -1068,6 +1069,7 @@ cifs_rename_pending_delete(const char *full_path, struct dentry *dentry,
oparms.tcon = tcon;
oparms.cifs_sb = cifs_sb;
oparms.desired_access = DELETE | FILE_WRITE_ATTRIBUTES;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.create_options = CREATE_NOT_DIR;
oparms.disposition = FILE_OPEN;
oparms.path = full_path;
@@ -1139,6 +1141,8 @@ cifs_rename_pending_delete(const char *full_path, struct dentry *dentry,
out_close:
CIFSSMBClose(xid, tcon, fid.netfid);
out:
+ if (rc == -ESHAREDENIED)
+ rc = -EBUSY;
kfree(info_buf);
cifs_put_tlink(tlink);
return rc;
@@ -1237,7 +1241,8 @@ psx_del_no_retry:
cifs_drop_nlink(inode);
} else if (rc == -ENOENT) {
d_drop(dentry);
- } else if (rc == -EBUSY) {
+ } else if (rc == -ESHAREDENIED) {
+ rc = -EBUSY;
if (server->ops->rename_pending_delete) {
rc = server->ops->rename_pending_delete(full_path,
dentry, xid);
@@ -1587,7 +1592,7 @@ cifs_do_rename(const unsigned int xid, struct dentry *from_dentry,
* source. Note that cross directory moves do not work with
* rename by filehandle to various Windows servers.
*/
- if (rc == 0 || rc != -EBUSY)
+ if (rc != -ESHAREDENIED)
goto do_rename_exit;

/* open-file renames don't work across directories */
@@ -1598,6 +1603,7 @@ cifs_do_rename(const unsigned int xid, struct dentry *from_dentry,
oparms.cifs_sb = cifs_sb;
/* open the file to be renamed -- we need DELETE perms */
oparms.desired_access = DELETE;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.create_options = CREATE_NOT_DIR;
oparms.disposition = FILE_OPEN;
oparms.path = from_path;
@@ -1613,6 +1619,8 @@ cifs_do_rename(const unsigned int xid, struct dentry *from_dentry,
CIFSSMBClose(xid, tcon, fid.netfid);
}
do_rename_exit:
+ if (rc == -ESHAREDENIED)
+ rc = -EBUSY;
cifs_put_tlink(tlink);
return rc;
}
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
index 46eff5e..4dbe6cd 100644
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -329,6 +329,7 @@ cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
oparms.tcon = tcon;
oparms.cifs_sb = cifs_sb;
oparms.desired_access = GENERIC_READ;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.create_options = CREATE_NOT_DIR;
oparms.disposition = FILE_OPEN;
oparms.path = path;
@@ -373,6 +374,7 @@ cifs_create_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
oparms.tcon = tcon;
oparms.cifs_sb = cifs_sb;
oparms.desired_access = GENERIC_WRITE;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.create_options = create_options;
oparms.disposition = FILE_OPEN;
oparms.path = path;
diff --git a/fs/cifs/netmisc.c b/fs/cifs/netmisc.c
index 0498845..0d159e6 100644
--- a/fs/cifs/netmisc.c
+++ b/fs/cifs/netmisc.c
@@ -62,7 +62,7 @@ static const struct smb_to_posix_error mapping_table_ERRDOS[] = {
{ERRdiffdevice, -EXDEV},
{ERRnofiles, -ENOENT},
{ERRwriteprot, -EROFS},
- {ERRbadshare, -EBUSY},
+ {ERRbadshare, -ESHAREDENIED},
{ERRlock, -EACCES},
{ERRunsup, -EINVAL},
{ERRnosuchshare, -ENXIO},
diff --git a/fs/cifs/smb1ops.c b/fs/cifs/smb1ops.c
index 9ac5bfc..569ac21 100644
--- a/fs/cifs/smb1ops.c
+++ b/fs/cifs/smb1ops.c
@@ -566,6 +566,7 @@ cifs_query_path_info(const unsigned int xid, struct cifs_tcon *tcon,
oparms.tcon = tcon;
oparms.cifs_sb = cifs_sb;
oparms.desired_access = FILE_READ_ATTRIBUTES;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.create_options = 0;
oparms.disposition = FILE_OPEN;
oparms.path = full_path;
@@ -802,6 +803,7 @@ smb_set_file_info(struct inode *inode, const char *full_path,
oparms.tcon = tcon;
oparms.cifs_sb = cifs_sb;
oparms.desired_access = SYNCHRONIZE | FILE_WRITE_ATTRIBUTES;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.create_options = CREATE_NOT_DIR;
oparms.disposition = FILE_OPEN;
oparms.path = full_path;
@@ -969,6 +971,7 @@ cifs_query_symlink(const unsigned int xid, struct cifs_tcon *tcon,
oparms.tcon = tcon;
oparms.cifs_sb = cifs_sb;
oparms.desired_access = FILE_READ_ATTRIBUTES;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.create_options = OPEN_REPARSE_POINT;
oparms.disposition = FILE_OPEN;
oparms.path = full_path;
diff --git a/fs/cifs/smb2inode.c b/fs/cifs/smb2inode.c
index 84c012a..bb4c325 100644
--- a/fs/cifs/smb2inode.c
+++ b/fs/cifs/smb2inode.c
@@ -55,6 +55,7 @@ smb2_open_op_close(const unsigned int xid, struct cifs_tcon *tcon,

oparms.tcon = tcon;
oparms.desired_access = desired_access;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.disposition = create_disposition;
oparms.create_options = create_options;
oparms.fid = &fid;
diff --git a/fs/cifs/smb2maperror.c b/fs/cifs/smb2maperror.c
index 94bd4fb..d11ad98 100644
--- a/fs/cifs/smb2maperror.c
+++ b/fs/cifs/smb2maperror.c
@@ -356,7 +356,7 @@ static const struct status_to_posix_error smb2_error_map_table[] = {
{STATUS_PORT_CONNECTION_REFUSED, -ECONNREFUSED,
"STATUS_PORT_CONNECTION_REFUSED"},
{STATUS_INVALID_PORT_HANDLE, -EIO, "STATUS_INVALID_PORT_HANDLE"},
- {STATUS_SHARING_VIOLATION, -EBUSY, "STATUS_SHARING_VIOLATION"},
+ {STATUS_SHARING_VIOLATION, -ESHAREDENIED, "STATUS_SHARING_VIOLATION"},
{STATUS_QUOTA_EXCEEDED, -EDQUOT, "STATUS_QUOTA_EXCEEDED"},
{STATUS_INVALID_PAGE_PROTECTION, -EIO,
"STATUS_INVALID_PAGE_PROTECTION"},
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 757da3e..8129045 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -246,6 +246,7 @@ smb3_qfs_tcon(const unsigned int xid, struct cifs_tcon *tcon)

oparms.tcon = tcon;
oparms.desired_access = FILE_READ_ATTRIBUTES;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.disposition = FILE_OPEN;
oparms.create_options = 0;
oparms.fid = &fid;
@@ -280,6 +281,7 @@ smb2_qfs_tcon(const unsigned int xid, struct cifs_tcon *tcon)

oparms.tcon = tcon;
oparms.desired_access = FILE_READ_ATTRIBUTES;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.disposition = FILE_OPEN;
oparms.create_options = 0;
oparms.fid = &fid;
@@ -313,6 +315,7 @@ smb2_is_path_accessible(const unsigned int xid, struct cifs_tcon *tcon,

oparms.tcon = tcon;
oparms.desired_access = FILE_READ_ATTRIBUTES;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.disposition = FILE_OPEN;
oparms.create_options = 0;
oparms.fid = &fid;
@@ -721,6 +724,7 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,

oparms.tcon = tcon;
oparms.desired_access = FILE_READ_ATTRIBUTES | FILE_READ_DATA;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.disposition = FILE_OPEN;
oparms.create_options = 0;
oparms.fid = fid;
@@ -808,6 +812,7 @@ smb2_queryfs(const unsigned int xid, struct cifs_tcon *tcon,

oparms.tcon = tcon;
oparms.desired_access = FILE_READ_ATTRIBUTES;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.disposition = FILE_OPEN;
oparms.create_options = 0;
oparms.fid = &fid;
@@ -881,6 +886,7 @@ smb2_query_symlink(const unsigned int xid, struct cifs_tcon *tcon,

oparms.tcon = tcon;
oparms.desired_access = FILE_READ_ATTRIBUTES;
+ oparms.share_access = FILE_SHARE_ALL;
oparms.disposition = FILE_OPEN;
oparms.create_options = 0;
oparms.fid = &fid;
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 2013234..cca197e 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -1106,7 +1106,7 @@ SMB2_open(const unsigned int xid, struct cifs_open_parms *oparms, __le16 *path,
req->DesiredAccess = cpu_to_le32(oparms->desired_access);
/* File attributes ignored on open (used in create though) */
req->FileAttributes = cpu_to_le32(file_attributes);
- req->ShareAccess = FILE_SHARE_ALL_LE;
+ req->ShareAccess = cpu_to_le32(oparms->share_access);
req->CreateDisposition = cpu_to_le32(oparms->disposition);
req->CreateOptions = cpu_to_le32(oparms->create_options & CREATE_OPTIONS_MASK);
uni_path_len = (2 * UniStrnlen((wchar_t *)path, PATH_MAX)) + 2;
diff --git a/fs/locks.c b/fs/locks.c
index c7e780a..6e78c0a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -953,6 +953,10 @@ sharelock_may_delete(struct dentry *dentry)
if (!IS_SHARELOCK(inode))
return rc;

+ /* Don't check a lock on file systems that do it internally */
+ if (inode->i_sb->s_type->fs_flags & FS_DOES_SHARELOCK)
+ return rc;
+
spin_lock(&inode->i_lock);
for_each_lock(inode, before) {
struct file_lock *fl = *before;
@@ -981,8 +985,13 @@ sharelock_lock_file(struct file *filp)
{
struct file_lock *lock;
int error = 0;
+ struct inode *inode = filp->f_path.dentry->d_inode;
+
+ if (!IS_SHARELOCK(inode))
+ return error;

- if (!IS_SHARELOCK(filp->f_path.dentry->d_inode))
+ /* Don't set a lock on file systems that do it internally */
+ if (inode->i_sb->s_type->fs_flags & FS_DOES_SHARELOCK)
return error;

error = flock_make_lock(filp, &lock, deny_flags_to_cmd(filp->f_flags));
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6dc9275..1185c64 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1830,6 +1830,7 @@ struct file_system_type {
#define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */
#define FS_USERNS_DEV_MOUNT 16 /* A userns mount does not imply MNT_NODEV */
#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
+#define FS_DOES_SHARELOCK 65536 /* FS does sharelocks internally */
struct dentry *(*mount) (struct file_system_type *, int,
const char *, void *);
void (*kill_sb) (struct super_block *);
--
1.7.10.4

2014-01-17 10:09:05

by Pavel Shilovsky

[permalink] [raw]
Subject: [PATCH v7 6/7] NFSv4: Add deny state handling for nfs4_state struct

and prepare code intrastructure to handle O_DENY* flags.

Signed-off-by: Pavel Shilovsky <[email protected]>
---
fs/nfs/dir.c | 2 +-
fs/nfs/inode.c | 7 +-
fs/nfs/nfs4_fs.h | 41 +++++++++-
fs/nfs/nfs4file.c | 2 +-
fs/nfs/nfs4proc.c | 194 ++++++++++++++++++++++++++---------------------
fs/nfs/nfs4state.c | 34 ++++-----
fs/nfs/nfs4xdr.c | 7 +-
include/linux/nfs_fs.h | 5 +-
include/linux/nfs_xdr.h | 1 +
9 files changed, 177 insertions(+), 116 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 812154a..fe0c7bb 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1362,7 +1362,7 @@ static fmode_t flags_to_mode(int flags)

static struct nfs_open_context *create_nfs_open_context(struct dentry *dentry, int open_flags)
{
- return alloc_nfs_open_context(dentry, flags_to_mode(open_flags));
+ return alloc_nfs_open_context(dentry, flags_to_mode(open_flags), 0);
}

static int do_open(struct inode *inode, struct file *filp)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 00ad1c2..82f8593 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -714,7 +714,9 @@ void nfs_close_context(struct nfs_open_context *ctx, int is_sync)
}
EXPORT_SYMBOL_GPL(nfs_close_context);

-struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry, fmode_t f_mode)
+struct nfs_open_context *
+alloc_nfs_open_context(struct dentry *dentry, fmode_t f_mode,
+ unsigned int deny_mode)
{
struct nfs_open_context *ctx;
struct rpc_cred *cred = rpc_lookup_cred();
@@ -731,6 +733,7 @@ struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry, fmode_t f
ctx->cred = cred;
ctx->state = NULL;
ctx->mode = f_mode;
+ ctx->deny_mode = deny_mode;
ctx->flags = 0;
ctx->error = 0;
nfs_init_lock_context(&ctx->lock_context);
@@ -843,7 +846,7 @@ int nfs_open(struct inode *inode, struct file *filp)
{
struct nfs_open_context *ctx;

- ctx = alloc_nfs_open_context(filp->f_path.dentry, filp->f_mode);
+ ctx = alloc_nfs_open_context(filp->f_path.dentry, filp->f_mode, 0);
if (IS_ERR(ctx))
return PTR_ERR(ctx);
nfs_file_set_open_context(filp, ctx);
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 5609edc..c455acb 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -185,7 +185,9 @@ struct nfs4_state {
unsigned int n_rdonly; /* Number of read-only references */
unsigned int n_wronly; /* Number of write-only references */
unsigned int n_rdwr; /* Number of read/write references */
+
fmode_t state; /* State on the server (R,W, or RW) */
+ unsigned int deny_state; /* Deny state on the server */
atomic_t count;
};

@@ -421,9 +423,10 @@ extern void nfs4_put_state_owner(struct nfs4_state_owner *);
extern void nfs4_purge_state_owners(struct nfs_server *);
extern struct nfs4_state * nfs4_get_open_state(struct inode *, struct nfs4_state_owner *);
extern void nfs4_put_open_state(struct nfs4_state *);
-extern void nfs4_close_state(struct nfs4_state *, fmode_t);
-extern void nfs4_close_sync(struct nfs4_state *, fmode_t);
-extern void nfs4_state_set_mode_locked(struct nfs4_state *, fmode_t);
+extern void nfs4_close_state(struct nfs4_state *, fmode_t, unsigned int);
+extern void nfs4_close_sync(struct nfs4_state *, fmode_t, unsigned int);
+extern void nfs4_state_set_mode_locked(struct nfs4_state *, fmode_t,
+ unsigned int);
extern void nfs_inode_find_state_and_recover(struct inode *inode,
const nfs4_stateid *stateid);
extern void nfs4_schedule_lease_recovery(struct nfs_client *);
@@ -504,6 +507,38 @@ static inline bool nfs4_valid_open_stateid(const struct nfs4_state *state)
return test_bit(NFS_STATE_RECOVERY_FAILED, &state->flags) == 0;
}

+static inline unsigned int *
+get_state_n(struct nfs4_state *state, fmode_t mode, unsigned int deny_mode)
+{
+ switch (mode & (FMODE_READ|FMODE_WRITE)) {
+ case FMODE_READ:
+ return &state->n_rdonly;
+ case FMODE_WRITE:
+ return &state->n_wronly;
+ case FMODE_READ|FMODE_WRITE:
+ return &state->n_rdwr;
+ }
+ return NULL;
+}
+
+static inline void
+inc_state_n(struct nfs4_state *state, fmode_t mode, unsigned int deny_mode)
+{
+ unsigned int *state_n = get_state_n(state, mode, deny_mode);
+
+ if (state_n)
+ (*state_n)++;
+}
+
+static inline void
+dec_state_n(struct nfs4_state *state, fmode_t mode, unsigned int deny_mode)
+{
+ unsigned int *state_n = get_state_n(state, mode, deny_mode);
+
+ if (state_n)
+ (*state_n)--;
+}
+
#else

#define nfs4_close_state(a, b) do { } while (0)
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 8de3407..5f444f0 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -42,7 +42,7 @@ nfs4_file_open(struct inode *inode, struct file *filp)
parent = dget_parent(dentry);
dir = parent->d_inode;

- ctx = alloc_nfs_open_context(filp->f_path.dentry, filp->f_mode);
+ ctx = alloc_nfs_open_context(filp->f_path.dentry, filp->f_mode, 0);
err = PTR_ERR(ctx);
if (IS_ERR(ctx))
goto out;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 15052b8..1b6f1fe 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1087,25 +1087,36 @@ static int nfs4_wait_for_completion_rpc_task(struct rpc_task *task)
return ret;
}

+static inline unsigned int
+fmode_to_state_bit(fmode_t mode)
+{
+ switch (mode & (FMODE_READ|FMODE_WRITE)) {
+ case FMODE_READ:
+ return NFS_O_RDONLY_STATE;
+ case FMODE_WRITE:
+ return NFS_O_WRONLY_STATE;
+ default:
+ return NFS_O_RDWR_STATE;
+ }
+}
+
static int can_open_cached(struct nfs4_state *state, fmode_t mode, int open_mode)
{
int ret = 0;
+ unsigned int *state_n;

if (open_mode & (O_EXCL|O_TRUNC))
goto out;
- switch (mode & (FMODE_READ|FMODE_WRITE)) {
- case FMODE_READ:
- ret |= test_bit(NFS_O_RDONLY_STATE, &state->flags) != 0
- && state->n_rdonly != 0;
- break;
- case FMODE_WRITE:
- ret |= test_bit(NFS_O_WRONLY_STATE, &state->flags) != 0
- && state->n_wronly != 0;
- break;
- case FMODE_READ|FMODE_WRITE:
- ret |= test_bit(NFS_O_RDWR_STATE, &state->flags) != 0
- && state->n_rdwr != 0;
- }
+
+ state_n = get_state_n(state, mode, open_mode);
+ if (state_n == NULL)
+ goto out;
+
+ if ((mode & (FMODE_READ|FMODE_WRITE)) == 0)
+ goto out;
+
+ ret |= test_bit(fmode_to_state_bit(mode), &state->flags) != 0 &&
+ *state_n != 0;
out:
return ret;
}
@@ -1124,47 +1135,40 @@ static int can_open_delegated(struct nfs_delegation *delegation, fmode_t fmode)
return 1;
}

-static void update_open_stateflags(struct nfs4_state *state, fmode_t fmode)
+static void
+update_open_stateflags(struct nfs4_state *state, fmode_t fmode,
+ unsigned int deny_mode)
{
- switch (fmode) {
- case FMODE_WRITE:
- state->n_wronly++;
- break;
- case FMODE_READ:
- state->n_rdonly++;
- break;
- case FMODE_READ|FMODE_WRITE:
- state->n_rdwr++;
- }
- nfs4_state_set_mode_locked(state, state->state | fmode);
+ inc_state_n(state, fmode, deny_mode);
+ nfs4_state_set_mode_locked(state, state->state | fmode,
+ state->deny_state | deny_mode);
}

-static void nfs_set_open_stateid_locked(struct nfs4_state *state, nfs4_stateid *stateid, fmode_t fmode)
+static void
+nfs_set_open_stateid_locked(struct nfs4_state *state, nfs4_stateid *stateid,
+ fmode_t fmode, unsigned int deny_mode)
{
if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
nfs4_stateid_copy(&state->stateid, stateid);
nfs4_stateid_copy(&state->open_stateid, stateid);
set_bit(NFS_OPEN_STATE, &state->flags);
- switch (fmode) {
- case FMODE_READ:
- set_bit(NFS_O_RDONLY_STATE, &state->flags);
- break;
- case FMODE_WRITE:
- set_bit(NFS_O_WRONLY_STATE, &state->flags);
- break;
- case FMODE_READ|FMODE_WRITE:
- set_bit(NFS_O_RDWR_STATE, &state->flags);
- }
+ if ((fmode & (FMODE_READ|FMODE_WRITE)) != 0)
+ set_bit(fmode_to_state_bit(fmode), &state->flags);
}

-static void nfs_set_open_stateid(struct nfs4_state *state, nfs4_stateid *stateid, fmode_t fmode)
+static void
+nfs_set_open_stateid(struct nfs4_state *state, nfs4_stateid *stateid,
+ fmode_t fmode, unsigned int deny_mode)
{
write_seqlock(&state->seqlock);
- nfs_set_open_stateid_locked(state, stateid, fmode);
+ nfs_set_open_stateid_locked(state, stateid, fmode, deny_mode);
write_sequnlock(&state->seqlock);
}

-static void __update_open_stateid(struct nfs4_state *state, nfs4_stateid *open_stateid, const nfs4_stateid *deleg_stateid, fmode_t fmode)
+static void
+__update_open_stateid(struct nfs4_state *state, nfs4_stateid *open_stateid,
+ const nfs4_stateid *deleg_stateid, fmode_t fmode,
+ unsigned int deny_mode)
{
/*
* Protect the call to nfs4_state_set_mode_locked and
@@ -1176,14 +1180,18 @@ static void __update_open_stateid(struct nfs4_state *state, nfs4_stateid *open_s
set_bit(NFS_DELEGATED_STATE, &state->flags);
}
if (open_stateid != NULL)
- nfs_set_open_stateid_locked(state, open_stateid, fmode);
+ nfs_set_open_stateid_locked(state, open_stateid, fmode,
+ deny_mode);
write_sequnlock(&state->seqlock);
spin_lock(&state->owner->so_lock);
- update_open_stateflags(state, fmode);
+ update_open_stateflags(state, fmode, deny_mode);
spin_unlock(&state->owner->so_lock);
}

-static int update_open_stateid(struct nfs4_state *state, nfs4_stateid *open_stateid, nfs4_stateid *delegation, fmode_t fmode)
+static int
+update_open_stateid(struct nfs4_state *state, nfs4_stateid *open_stateid,
+ nfs4_stateid *delegation, fmode_t fmode,
+ unsigned int deny_mode)
{
struct nfs_inode *nfsi = NFS_I(state->inode);
struct nfs_delegation *deleg_cur;
@@ -1208,7 +1216,8 @@ static int update_open_stateid(struct nfs4_state *state, nfs4_stateid *open_stat
goto no_delegation_unlock;

nfs_mark_delegation_referenced(deleg_cur);
- __update_open_stateid(state, open_stateid, &deleg_cur->stateid, fmode);
+ __update_open_stateid(state, open_stateid, &deleg_cur->stateid, fmode,
+ deny_mode);
ret = 1;
no_delegation_unlock:
spin_unlock(&deleg_cur->lock);
@@ -1216,7 +1225,8 @@ no_delegation:
rcu_read_unlock();

if (!ret && open_stateid != NULL) {
- __update_open_stateid(state, open_stateid, NULL, fmode);
+ __update_open_stateid(state, open_stateid, NULL, fmode,
+ deny_mode);
ret = 1;
}

@@ -1245,6 +1255,7 @@ static struct nfs4_state *nfs4_try_open_cached(struct nfs4_opendata *opendata)
struct nfs_delegation *delegation;
int open_mode = opendata->o_arg.open_flags;
fmode_t fmode = opendata->o_arg.fmode;
+ unsigned int deny_mode = 0;
nfs4_stateid stateid;
int ret = -EAGAIN;

@@ -1252,7 +1263,7 @@ static struct nfs4_state *nfs4_try_open_cached(struct nfs4_opendata *opendata)
if (can_open_cached(state, fmode, open_mode)) {
spin_lock(&state->owner->so_lock);
if (can_open_cached(state, fmode, open_mode)) {
- update_open_stateflags(state, fmode);
+ update_open_stateflags(state, fmode, deny_mode);
spin_unlock(&state->owner->so_lock);
goto out_return_state;
}
@@ -1276,7 +1287,8 @@ static struct nfs4_state *nfs4_try_open_cached(struct nfs4_opendata *opendata)
ret = -EAGAIN;

/* Try to update the stateid using the delegation */
- if (update_open_stateid(state, NULL, &stateid, fmode))
+ if (update_open_stateid(state, NULL, &stateid, fmode,
+ deny_mode))
goto out_return_state;
}
out:
@@ -1341,7 +1353,7 @@ _nfs4_opendata_reclaim_to_nfs4_state(struct nfs4_opendata *data)
nfs4_opendata_check_deleg(data, state);
update:
update_open_stateid(state, &data->o_res.stateid, NULL,
- data->o_arg.fmode);
+ data->o_arg.fmode, 0);
atomic_inc(&state->count);

return state;
@@ -1376,7 +1388,7 @@ _nfs4_opendata_to_nfs4_state(struct nfs4_opendata *data)
if (data->o_res.delegation_type != 0)
nfs4_opendata_check_deleg(data, state);
update_open_stateid(state, &data->o_res.stateid, NULL,
- data->o_arg.fmode);
+ data->o_arg.fmode, 0);
iput(inode);
out:
nfs_release_seqid(data->o_arg.seqid);
@@ -1426,59 +1438,62 @@ static struct nfs4_opendata *nfs4_open_recoverdata_alloc(struct nfs_open_context
return opendata;
}

-static int nfs4_open_recover_helper(struct nfs4_opendata *opendata, fmode_t fmode, struct nfs4_state **res)
+static int
+nfs4_open_recover_helper(struct nfs4_opendata *opendata, fmode_t fmode,
+ unsigned int deny_mode, struct nfs4_state **res)
{
struct nfs4_state *newstate;
int ret;

- opendata->o_arg.open_flags = 0;
+ opendata->o_arg.open_flags = deny_mode;
opendata->o_arg.fmode = fmode;
memset(&opendata->o_res, 0, sizeof(opendata->o_res));
memset(&opendata->c_res, 0, sizeof(opendata->c_res));
nfs4_init_opendata_res(opendata);
ret = _nfs4_recover_proc_open(opendata);
if (ret != 0)
- return ret;
+ return ret;
newstate = nfs4_opendata_to_nfs4_state(opendata);
if (IS_ERR(newstate))
return PTR_ERR(newstate);
- nfs4_close_state(newstate, fmode);
+ nfs4_close_state(newstate, fmode, deny_mode);
*res = newstate;
return 0;
}

-static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *state)
+static int
+nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *state)
{
- struct nfs4_state *newstate;
+ struct nfs4_state *newstate = NULL;
int ret;
+ unsigned int fm, dm;
+ fmode_t fmodes[] = {FMODE_READ, FMODE_WRITE, FMODE_READ|FMODE_WRITE};
+ unsigned int dmodes[] = {0};

/* memory barrier prior to reading state->n_* */
clear_bit(NFS_DELEGATED_STATE, &state->flags);
clear_bit(NFS_OPEN_STATE, &state->flags);
smp_rmb();
- if (state->n_rdwr != 0) {
- clear_bit(NFS_O_RDWR_STATE, &state->flags);
- ret = nfs4_open_recover_helper(opendata, FMODE_READ|FMODE_WRITE, &newstate);
- if (ret != 0)
- return ret;
- if (newstate != state)
- return -ESTALE;
- }
- if (state->n_wronly != 0) {
- clear_bit(NFS_O_WRONLY_STATE, &state->flags);
- ret = nfs4_open_recover_helper(opendata, FMODE_WRITE, &newstate);
- if (ret != 0)
- return ret;
- if (newstate != state)
- return -ESTALE;
- }
- if (state->n_rdonly != 0) {
- clear_bit(NFS_O_RDONLY_STATE, &state->flags);
- ret = nfs4_open_recover_helper(opendata, FMODE_READ, &newstate);
- if (ret != 0)
- return ret;
- if (newstate != state)
- return -ESTALE;
+ /* walk through all possible fmode|denymode values */
+ for (fm = 0; fm < 3; fm++) {
+ unsigned int fmode_bit = fmode_to_state_bit(fmodes[fm]);
+
+ for (dm = 0; dm < 1; dm++) {
+ unsigned int *state_n;
+
+ state_n = get_state_n(state, fmodes[fm], dmodes[dm]);
+ if (state_n == NULL || *state_n == 0)
+ continue;
+
+ clear_bit(fmode_bit, &state->flags);
+
+ ret = nfs4_open_recover_helper(opendata, fmodes[fm],
+ dmodes[dm], &newstate);
+ if (ret != 0)
+ return ret;
+ if (newstate != state)
+ return -ESTALE;
+ }
}
/*
* We may have performed cached opens for all three recoveries.
@@ -1654,7 +1669,7 @@ static void nfs4_open_confirm_release(void *calldata)
goto out_free;
state = nfs4_opendata_to_nfs4_state(data);
if (!IS_ERR(state))
- nfs4_close_state(state, data->o_arg.fmode);
+ nfs4_close_state(state, data->o_arg.fmode, 0);
out_free:
nfs4_opendata_put(data);
}
@@ -1814,7 +1829,7 @@ static void nfs4_open_release(void *calldata)
goto out_free;
state = nfs4_opendata_to_nfs4_state(data);
if (!IS_ERR(state))
- nfs4_close_state(state, data->o_arg.fmode);
+ nfs4_close_state(state, data->o_arg.fmode, 0);
out_free:
nfs4_opendata_put(data);
}
@@ -1926,7 +1941,7 @@ static int nfs4_opendata_access(struct rpc_cred *cred,
return 0;

/* even though OPEN succeeded, access is denied. Close the file */
- nfs4_close_state(state, fmode);
+ nfs4_close_state(state, fmode, 0);
return -EACCES;
}

@@ -2478,8 +2493,9 @@ static void nfs4_free_closedata(void *data)
kfree(calldata);
}

-static void nfs4_close_clear_stateid_flags(struct nfs4_state *state,
- fmode_t fmode)
+static void
+nfs4_close_clear_stateid_flags(struct nfs4_state *state, fmode_t fmode,
+ unsigned int deny_mode)
{
spin_lock(&state->owner->so_lock);
clear_bit(NFS_O_RDWR_STATE, &state->flags);
@@ -2516,7 +2532,8 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
if (calldata->roc)
pnfs_roc_set_barrier(state->inode,
calldata->roc_barrier);
- nfs_set_open_stateid(state, &calldata->res.stateid, 0);
+ nfs_set_open_stateid(state, &calldata->res.stateid, 0,
+ 0);
renew_lease(server, calldata->timestamp);
break;
case -NFS4ERR_ADMIN_REVOKED:
@@ -2532,7 +2549,8 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
goto out_release;
}
}
- nfs4_close_clear_stateid_flags(state, calldata->arg.fmode);
+ nfs4_close_clear_stateid_flags(state, calldata->arg.fmode,
+ calldata->arg.deny_mode);
out_release:
nfs_release_seqid(calldata->arg.seqid);
nfs_refresh_inode(calldata->inode, calldata->res.fattr);
@@ -2552,6 +2570,7 @@ static void nfs4_close_prepare(struct rpc_task *task, void *data)

task->tk_msg.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_OPEN_DOWNGRADE];
calldata->arg.fmode = FMODE_READ|FMODE_WRITE;
+ calldata->arg.deny_mode = 0;
spin_lock(&state->owner->so_lock);
/* Calculate the change in open mode */
if (state->n_rdwr == 0) {
@@ -2651,6 +2670,7 @@ int nfs4_do_close(struct nfs4_state *state, gfp_t gfp_mask, int wait)
if (calldata->arg.seqid == NULL)
goto out_free_calldata;
calldata->arg.fmode = 0;
+ calldata->arg.deny_mode = 0;
calldata->arg.bitmask = server->cache_consistency_bitmask;
calldata->res.fattr = &calldata->fattr;
calldata->res.seqid = calldata->arg.seqid;
@@ -2701,9 +2721,9 @@ static void nfs4_close_context(struct nfs_open_context *ctx, int is_sync)
if (ctx->state == NULL)
return;
if (is_sync)
- nfs4_close_sync(ctx->state, ctx->mode);
+ nfs4_close_sync(ctx->state, ctx->mode, ctx->deny_mode);
else
- nfs4_close_state(ctx->state, ctx->mode);
+ nfs4_close_state(ctx->state, ctx->mode, ctx->deny_mode);
}

#define FATTR4_WORD1_NFS40_MASK (2*FATTR4_WORD1_MOUNTED_ON_FILEID - 1UL)
@@ -3382,7 +3402,7 @@ nfs4_proc_create(struct inode *dir, struct dentry *dentry, struct iattr *sattr,
int opened = 0;
int status = 0;

- ctx = alloc_nfs_open_context(dentry, FMODE_READ);
+ ctx = alloc_nfs_open_context(dentry, FMODE_READ, 0);
if (IS_ERR(ctx))
return PTR_ERR(ctx);

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 059c01b..168f868 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -633,8 +633,10 @@ nfs4_alloc_open_state(void)
}

void
-nfs4_state_set_mode_locked(struct nfs4_state *state, fmode_t fmode)
+nfs4_state_set_mode_locked(struct nfs4_state *state, fmode_t fmode,
+ unsigned int deny_mode)
{
+ state->deny_state = deny_mode;
if (state->state == fmode)
return;
/* NB! List reordering - see the reclaim code for why. */
@@ -727,8 +729,9 @@ void nfs4_put_open_state(struct nfs4_state *state)
/*
* Close the current file.
*/
-static void __nfs4_close(struct nfs4_state *state,
- fmode_t fmode, gfp_t gfp_mask, int wait)
+static void
+__nfs4_close(struct nfs4_state *state, fmode_t fmode, unsigned int deny_mode,
+ gfp_t gfp_mask, int wait)
{
struct nfs4_state_owner *owner = state->owner;
int call_close = 0;
@@ -737,16 +740,7 @@ static void __nfs4_close(struct nfs4_state *state,
atomic_inc(&owner->so_count);
/* Protect against nfs4_find_state() */
spin_lock(&owner->so_lock);
- switch (fmode & (FMODE_READ | FMODE_WRITE)) {
- case FMODE_READ:
- state->n_rdonly--;
- break;
- case FMODE_WRITE:
- state->n_wronly--;
- break;
- case FMODE_READ|FMODE_WRITE:
- state->n_rdwr--;
- }
+ dec_state_n(state, fmode, deny_mode);
newstate = FMODE_READ|FMODE_WRITE;
if (state->n_rdwr == 0) {
if (state->n_rdonly == 0) {
@@ -762,7 +756,7 @@ static void __nfs4_close(struct nfs4_state *state,
if (newstate == 0)
clear_bit(NFS_DELEGATED_STATE, &state->flags);
}
- nfs4_state_set_mode_locked(state, newstate);
+ nfs4_state_set_mode_locked(state, newstate, 0);
spin_unlock(&owner->so_lock);

if (!call_close) {
@@ -772,14 +766,18 @@ static void __nfs4_close(struct nfs4_state *state,
nfs4_do_close(state, gfp_mask, wait);
}

-void nfs4_close_state(struct nfs4_state *state, fmode_t fmode)
+void
+nfs4_close_state(struct nfs4_state *state, fmode_t fmode,
+ unsigned int deny_mode)
{
- __nfs4_close(state, fmode, GFP_NOFS, 0);
+ __nfs4_close(state, fmode, deny_mode, GFP_NOFS, 0);
}

-void nfs4_close_sync(struct nfs4_state *state, fmode_t fmode)
+void
+nfs4_close_sync(struct nfs4_state *state, fmode_t fmode,
+ unsigned int deny_mode)
{
- __nfs4_close(state, fmode, GFP_KERNEL, 1);
+ __nfs4_close(state, fmode, deny_mode, GFP_KERNEL, 1);
}

/*
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 5be2868..ed507f4 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1358,7 +1358,8 @@ static void encode_lookup(struct xdr_stream *xdr, const struct qstr *name, struc
encode_string(xdr, name->len, name->name);
}

-static void encode_share_access(struct xdr_stream *xdr, fmode_t fmode)
+static void encode_share_access(struct xdr_stream *xdr, fmode_t fmode,
+ int open_flags)
{
__be32 *p;

@@ -1387,7 +1388,7 @@ static inline void encode_openhdr(struct xdr_stream *xdr, const struct nfs_opena
* owner 4 = 32
*/
encode_nfs4_seqid(xdr, arg->seqid);
- encode_share_access(xdr, arg->fmode);
+ encode_share_access(xdr, arg->fmode, arg->open_flags);
p = reserve_space(xdr, 36);
p = xdr_encode_hyper(p, arg->clientid);
*p++ = cpu_to_be32(24);
@@ -1542,7 +1543,7 @@ static void encode_open_downgrade(struct xdr_stream *xdr, const struct nfs_close
encode_op_hdr(xdr, OP_OPEN_DOWNGRADE, decode_open_downgrade_maxsz, hdr);
encode_nfs4_stateid(xdr, arg->stateid);
encode_nfs4_seqid(xdr, arg->seqid);
- encode_share_access(xdr, arg->fmode);
+ encode_share_access(xdr, arg->fmode, arg->deny_mode);
}

static void
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 4899737..a6e1579 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -80,6 +80,7 @@ struct nfs_open_context {
struct rpc_cred *cred;
struct nfs4_state *state;
fmode_t mode;
+ unsigned int deny_mode;

unsigned long flags;
#define NFS_CONTEXT_ERROR_WRITE (0)
@@ -363,7 +364,9 @@ extern void nfs_setsecurity(struct inode *inode, struct nfs_fattr *fattr,
extern struct nfs_open_context *get_nfs_open_context(struct nfs_open_context *ctx);
extern void put_nfs_open_context(struct nfs_open_context *ctx);
extern struct nfs_open_context *nfs_find_open_context(struct inode *inode, struct rpc_cred *cred, fmode_t mode);
-extern struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry, fmode_t f_mode);
+extern struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry,
+ fmode_t f_mode,
+ unsigned int deny_mode);
extern void nfs_inode_attach_open_context(struct nfs_open_context *ctx);
extern void nfs_file_set_open_context(struct file *filp, struct nfs_open_context *ctx);
extern struct nfs_lock_context *nfs_get_lock_context(struct nfs_open_context *ctx);
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 3ccfcec..000c47f 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -398,6 +398,7 @@ struct nfs_closeargs {
nfs4_stateid * stateid;
struct nfs_seqid * seqid;
fmode_t fmode;
+ unsigned int deny_mode;
const u32 * bitmask;
};

--
1.7.10.4

2014-01-17 10:09:03

by Pavel Shilovsky

[permalink] [raw]
Subject: [PATCH v7 7/7] NFSv4: Add O_DENY* open flags support

Pass O_DENY* flags to NFSv4 open request. Make it return
-ESHAREDENIED on share conflicts with other opens and disable
O_DENYDELETE support since NFSv4 doesn't support it.

Add extra file descriptor counters for every fmode|denymode
value into nfs4_state. Make the client not repeat the previous
open requests to the server during delegation recall because
of possible conflicts with deny modes.

Signed-off-by: Pavel Shilovsky <[email protected]>
---
fs/nfs/dir.c | 39 ++++++++++++++-
fs/nfs/inode.c | 3 +-
fs/nfs/internal.h | 3 +-
fs/nfs/nfs4_fs.h | 48 ++++++++++++++++--
fs/nfs/nfs4file.c | 8 ++-
fs/nfs/nfs4proc.c | 138 +++++++++++++++++++++++++++++++++++++++++++---------
fs/nfs/nfs4state.c | 33 +++++++++++--
fs/nfs/nfs4super.c | 9 ++--
fs/nfs/nfs4xdr.c | 14 +++++-
fs/nfs/super.c | 7 ++-
10 files changed, 261 insertions(+), 41 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index fe0c7bb..627f9ea 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1362,7 +1362,8 @@ static fmode_t flags_to_mode(int flags)

static struct nfs_open_context *create_nfs_open_context(struct dentry *dentry, int open_flags)
{
- return alloc_nfs_open_context(dentry, flags_to_mode(open_flags), 0);
+ return alloc_nfs_open_context(dentry, flags_to_mode(open_flags),
+ open_flags & (O_DENYREAD|O_DENYWRITE));
}

static int do_open(struct inode *inode, struct file *filp)
@@ -1411,6 +1412,10 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
if (err)
return err;

+ /* No support for O_DENYDELETE */
+ if (open_flags & O_DENYDELETE)
+ return -EINVAL;
+
/* NFS only supports OPEN on regular files */
if ((open_flags & O_DIRECTORY)) {
if (!d_unhashed(dentry)) {
@@ -2256,7 +2261,37 @@ static int nfs_open_permission_mask(int openflags)

int nfs_may_open(struct inode *inode, struct rpc_cred *cred, int openflags)
{
- return nfs_do_access(inode, cred, nfs_open_permission_mask(openflags));
+ int ret;
+ fmode_t mode = OPEN_FMODE(openflags);
+ struct nfs_inode *nfsi = NFS_I(inode);
+ struct nfs4_state *state;
+
+ ret = nfs_do_access(inode, cred, nfs_open_permission_mask(openflags));
+ if (ret)
+ goto out;
+
+ if (!IS_SHARELOCK(inode))
+ goto out;
+
+ spin_lock(&inode->i_lock);
+ /* Check for share reservation conflicts */
+ list_for_each_entry(state, &nfsi->open_states, inode_states) {
+ if ((state->state & FMODE_READ) && (openflags & O_DENYREAD))
+ goto out_set_err;
+ if ((state->state & FMODE_WRITE) && (openflags & O_DENYWRITE))
+ goto out_set_err;
+ if ((mode & FMODE_READ) && (state->deny_state & O_DENYREAD))
+ goto out_set_err;
+ if ((mode & FMODE_WRITE) && (state->deny_state & O_DENYWRITE))
+ goto out_set_err;
+ }
+ spin_unlock(&inode->i_lock);
+out:
+ return ret;
+out_set_err:
+ spin_unlock(&inode->i_lock);
+ ret = -ESHAREDENIED;
+ goto out;
}
EXPORT_SYMBOL_GPL(nfs_may_open);

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 82f8593..a228dda 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -846,7 +846,8 @@ int nfs_open(struct inode *inode, struct file *filp)
{
struct nfs_open_context *ctx;

- ctx = alloc_nfs_open_context(filp->f_path.dentry, filp->f_mode, 0);
+ ctx = alloc_nfs_open_context(filp->f_path.dentry, filp->f_mode,
+ filp->f_flags & (O_DENYREAD|O_DENYWRITE));
if (IS_ERR(ctx))
return PTR_ERR(ctx);
nfs_file_set_open_context(filp, ctx);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 8b5cc04..98f95fd 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -7,7 +7,8 @@
#include <linux/security.h>
#include <linux/crc32.h>

-#define NFS_MS_MASK (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_SYNCHRONOUS)
+#define NFS_MS_MASK (MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC | \
+ MS_SYNCHRONOUS | MS_SHARELOCK)

struct nfs_string;

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index c455acb..ca86f66 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -164,6 +164,10 @@ enum {
NFS_STATE_RECLAIM_NOGRACE, /* OPEN stateid needs to recover state */
NFS_STATE_POSIX_LOCKS, /* Posix locks are supported */
NFS_STATE_RECOVERY_FAILED, /* OPEN stateid state recovery failed */
+ NFS_O_DENYNONE_STATE, /* OPEN stateid has deny none state */
+ NFS_O_DENYREAD_STATE, /* OPEN stateid has deny read state */
+ NFS_O_DENYWRITE_STATE, /* OPEN stateid has deny write state */
+ NFS_O_DENYRDWR_STATE, /* OPEN stateid has deny rw state */
};

struct nfs4_state {
@@ -181,10 +185,19 @@ struct nfs4_state {
nfs4_stateid stateid; /* Current stateid: may be delegation */
nfs4_stateid open_stateid; /* OPEN stateid */

- /* The following 3 fields are protected by owner->so_lock */
+ /* The following 12 fields are protected by owner->so_lock */
unsigned int n_rdonly; /* Number of read-only references */
unsigned int n_wronly; /* Number of write-only references */
unsigned int n_rdwr; /* Number of read/write references */
+ unsigned int n_ro_dr; /* Number of read/denyread refs */
+ unsigned int n_wo_dr; /* Number of write/denyread refs */
+ unsigned int n_rw_dr; /* Number of rw/denyread references */
+ unsigned int n_ro_dw; /* Number of read/denywrite refs */
+ unsigned int n_wo_dw; /* Number of write/denywrite refs */
+ unsigned int n_rw_dw; /* Number of rw/denywrite references */
+ unsigned int n_ro_drw; /* Number of read/denyrw references */
+ unsigned int n_wo_drw; /* Number of write/denyrw references */
+ unsigned int n_rw_drw; /* Number of rw/denyrw references */

fmode_t state; /* State on the server (R,W, or RW) */
unsigned int deny_state; /* Deny state on the server */
@@ -512,11 +525,38 @@ get_state_n(struct nfs4_state *state, fmode_t mode, unsigned int deny_mode)
{
switch (mode & (FMODE_READ|FMODE_WRITE)) {
case FMODE_READ:
- return &state->n_rdonly;
+ switch (deny_mode & (O_DENYREAD|O_DENYWRITE)) {
+ case 0:
+ return &state->n_rdonly;
+ case O_DENYREAD:
+ return &state->n_ro_dr;
+ case O_DENYWRITE:
+ return &state->n_ro_dw;
+ case O_DENYREAD|O_DENYWRITE:
+ return &state->n_ro_drw;
+ }
case FMODE_WRITE:
- return &state->n_wronly;
+ switch (deny_mode & (O_DENYREAD|O_DENYWRITE)) {
+ case 0:
+ return &state->n_wronly;
+ case O_DENYREAD:
+ return &state->n_wo_dr;
+ case O_DENYWRITE:
+ return &state->n_wo_dw;
+ case O_DENYREAD|O_DENYWRITE:
+ return &state->n_wo_drw;
+ }
case FMODE_READ|FMODE_WRITE:
- return &state->n_rdwr;
+ switch (deny_mode & (O_DENYREAD|O_DENYWRITE)) {
+ case 0:
+ return &state->n_rdwr;
+ case O_DENYREAD:
+ return &state->n_rw_dr;
+ case O_DENYWRITE:
+ return &state->n_rw_dw;
+ case O_DENYREAD|O_DENYWRITE:
+ return &state->n_rw_drw;
+ }
}
return NULL;
}
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 5f444f0..bfea7fa 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -33,6 +33,10 @@ nfs4_file_open(struct inode *inode, struct file *filp)

dprintk("NFS: open file(%pd2)\n", dentry);

+ /* No support for O_DENYDELETE */
+ if (openflags & O_DENYDELETE)
+ return -EINVAL;
+
if ((openflags & O_ACCMODE) == 3)
openflags--;

@@ -42,7 +46,8 @@ nfs4_file_open(struct inode *inode, struct file *filp)
parent = dget_parent(dentry);
dir = parent->d_inode;

- ctx = alloc_nfs_open_context(filp->f_path.dentry, filp->f_mode, 0);
+ ctx = alloc_nfs_open_context(filp->f_path.dentry, filp->f_mode,
+ filp->f_flags & (O_DENYREAD|O_DENYWRITE));
err = PTR_ERR(ctx);
if (IS_ERR(ctx))
goto out;
@@ -63,6 +68,7 @@ nfs4_file_open(struct inode *inode, struct file *filp)
case -EDQUOT:
case -ENOSPC:
case -EROFS:
+ case -ESHAREDENIED:
goto out_put_ctx;
default:
goto out_drop;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 1b6f1fe..8fafe59 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -155,7 +155,7 @@ static int nfs4_map_errors(int err)
case -NFS4ERR_BADNAME:
return -EINVAL;
case -NFS4ERR_SHARE_DENIED:
- return -EACCES;
+ return -ESHAREDENIED;
case -NFS4ERR_MINOR_VERS_MISMATCH:
return -EPROTONOSUPPORT;
case -NFS4ERR_ACCESS:
@@ -996,6 +996,8 @@ static struct nfs4_opendata *nfs4_opendata_alloc(struct dentry *dentry,
p->owner = sp;
atomic_inc(&sp->so_count);
p->o_arg.open_flags = flags;
+ if (!IS_SHARELOCK(dir))
+ p->o_arg.open_flags &= ~(O_DENYREAD | O_DENYWRITE);
p->o_arg.fmode = fmode & (FMODE_READ|FMODE_WRITE);
/* don't put an ACCESS op in OPEN compound if O_EXCL, because ACCESS
* will return permission denied for all bits until close */
@@ -1100,6 +1102,20 @@ fmode_to_state_bit(fmode_t mode)
}
}

+static inline unsigned int
+denymode_to_state_bit(unsigned int deny_mode)
+{
+ switch (deny_mode & (O_DENYREAD|O_DENYWRITE)) {
+ case O_DENYREAD:
+ return NFS_O_DENYREAD_STATE;
+ case O_DENYWRITE:
+ return NFS_O_DENYWRITE_STATE;
+ case O_DENYREAD|O_DENYWRITE:
+ return NFS_O_DENYRDWR_STATE;
+ }
+ return NFS_O_DENYNONE_STATE;
+}
+
static int can_open_cached(struct nfs4_state *state, fmode_t mode, int open_mode)
{
int ret = 0;
@@ -1108,6 +1124,15 @@ static int can_open_cached(struct nfs4_state *state, fmode_t mode, int open_mode
if (open_mode & (O_EXCL|O_TRUNC))
goto out;

+ if ((state->state & FMODE_READ) && (open_mode & O_DENYREAD))
+ goto out;
+ if ((state->state & FMODE_WRITE) && (open_mode & O_DENYWRITE))
+ goto out;
+ if ((mode & FMODE_READ) && (state->deny_state & O_DENYREAD))
+ goto out;
+ if ((mode & FMODE_WRITE) && (state->deny_state & O_DENYWRITE))
+ goto out;
+
state_n = get_state_n(state, mode, open_mode);
if (state_n == NULL)
goto out;
@@ -1116,17 +1141,22 @@ static int can_open_cached(struct nfs4_state *state, fmode_t mode, int open_mode
goto out;

ret |= test_bit(fmode_to_state_bit(mode), &state->flags) != 0 &&
- *state_n != 0;
+ test_bit(denymode_to_state_bit(open_mode), &state->flags) != 0
+ && *state_n != 0;
out:
return ret;
}

-static int can_open_delegated(struct nfs_delegation *delegation, fmode_t fmode)
+static int
+can_open_delegated(struct nfs_delegation *delegation, fmode_t fmode,
+ unsigned int deny_mode)
{
if (delegation == NULL)
return 0;
if ((delegation->type & fmode) != fmode)
return 0;
+ if ((deny_mode & O_DENYREAD) && (delegation->type == FMODE_READ))
+ return 0;
if (test_bit(NFS_DELEGATION_NEED_RECLAIM, &delegation->flags))
return 0;
if (test_bit(NFS_DELEGATION_RETURNING, &delegation->flags))
@@ -1154,6 +1184,7 @@ nfs_set_open_stateid_locked(struct nfs4_state *state, nfs4_stateid *stateid,
set_bit(NFS_OPEN_STATE, &state->flags);
if ((fmode & (FMODE_READ|FMODE_WRITE)) != 0)
set_bit(fmode_to_state_bit(fmode), &state->flags);
+ set_bit(denymode_to_state_bit(deny_mode), &state->flags);
}

static void
@@ -1255,7 +1286,7 @@ static struct nfs4_state *nfs4_try_open_cached(struct nfs4_opendata *opendata)
struct nfs_delegation *delegation;
int open_mode = opendata->o_arg.open_flags;
fmode_t fmode = opendata->o_arg.fmode;
- unsigned int deny_mode = 0;
+ unsigned int deny_mode = open_mode & (O_DENYREAD|O_DENYWRITE);
nfs4_stateid stateid;
int ret = -EAGAIN;

@@ -1271,7 +1302,7 @@ static struct nfs4_state *nfs4_try_open_cached(struct nfs4_opendata *opendata)
}
rcu_read_lock();
delegation = rcu_dereference(nfsi->delegation);
- if (!can_open_delegated(delegation, fmode)) {
+ if (!can_open_delegated(delegation, fmode, deny_mode)) {
rcu_read_unlock();
break;
}
@@ -1353,7 +1384,8 @@ _nfs4_opendata_reclaim_to_nfs4_state(struct nfs4_opendata *data)
nfs4_opendata_check_deleg(data, state);
update:
update_open_stateid(state, &data->o_res.stateid, NULL,
- data->o_arg.fmode, 0);
+ data->o_arg.fmode,
+ data->o_arg.open_flags & (O_DENYREAD|O_DENYWRITE));
atomic_inc(&state->count);

return state;
@@ -1388,7 +1420,8 @@ _nfs4_opendata_to_nfs4_state(struct nfs4_opendata *data)
if (data->o_res.delegation_type != 0)
nfs4_opendata_check_deleg(data, state);
update_open_stateid(state, &data->o_res.stateid, NULL,
- data->o_arg.fmode, 0);
+ data->o_arg.fmode,
+ data->o_arg.open_flags & (O_DENYREAD|O_DENYWRITE));
iput(inode);
out:
nfs_release_seqid(data->o_arg.seqid);
@@ -1461,14 +1494,20 @@ nfs4_open_recover_helper(struct nfs4_opendata *opendata, fmode_t fmode,
return 0;
}

+/*
+ * Recover an open state on the server. @reset indicates if we need to
+ * flush all opens or just those that were cached localy.
+ */
static int
-nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *state)
+nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *state,
+ bool reset)
{
struct nfs4_state *newstate = NULL;
int ret;
unsigned int fm, dm;
fmode_t fmodes[] = {FMODE_READ, FMODE_WRITE, FMODE_READ|FMODE_WRITE};
- unsigned int dmodes[] = {0};
+ unsigned int dmodes[] = {0, O_DENYREAD, O_DENYWRITE,
+ O_DENYREAD|O_DENYWRITE};

/* memory barrier prior to reading state->n_* */
clear_bit(NFS_DELEGATED_STATE, &state->flags);
@@ -1478,14 +1517,21 @@ nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *state)
for (fm = 0; fm < 3; fm++) {
unsigned int fmode_bit = fmode_to_state_bit(fmodes[fm]);

- for (dm = 0; dm < 1; dm++) {
+ for (dm = 0; dm < 4; dm++) {
unsigned int *state_n;
+ unsigned int deny_bit;

state_n = get_state_n(state, fmodes[fm], dmodes[dm]);
if (state_n == NULL || *state_n == 0)
continue;

+ deny_bit = denymode_to_state_bit(dmodes[dm]);
+ if (!reset && test_bit(fmode_bit, &state->flags) &&
+ test_bit(deny_bit, &state->flags))
+ continue;
+
clear_bit(fmode_bit, &state->flags);
+ clear_bit(deny_bit, &state->flags);

ret = nfs4_open_recover_helper(opendata, fmodes[fm],
dmodes[dm], &newstate);
@@ -1530,7 +1576,7 @@ static int _nfs4_do_open_reclaim(struct nfs_open_context *ctx, struct nfs4_state
delegation_type = delegation->type;
rcu_read_unlock();
opendata->o_arg.u.delegation_type = delegation_type;
- status = nfs4_open_recover(opendata, state);
+ status = nfs4_open_recover(opendata, state, true);
nfs4_opendata_put(opendata);
return status;
}
@@ -1628,7 +1674,7 @@ int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state
if (IS_ERR(opendata))
return PTR_ERR(opendata);
nfs4_stateid_copy(&opendata->o_arg.u.delegation, stateid);
- err = nfs4_open_recover(opendata, state);
+ err = nfs4_open_recover(opendata, state, false);
nfs4_opendata_put(opendata);
return nfs4_handle_delegation_recall_error(server, state, stateid, err);
}
@@ -1669,7 +1715,8 @@ static void nfs4_open_confirm_release(void *calldata)
goto out_free;
state = nfs4_opendata_to_nfs4_state(data);
if (!IS_ERR(state))
- nfs4_close_state(state, data->o_arg.fmode, 0);
+ nfs4_close_state(state, data->o_arg.fmode,
+ data->o_arg.open_flags & (O_DENYREAD|O_DENYWRITE));
out_free:
nfs4_opendata_put(data);
}
@@ -1742,7 +1789,8 @@ static void nfs4_open_prepare(struct rpc_task *task, void *calldata)
delegation = rcu_dereference(NFS_I(data->state->inode)->delegation);
if (data->o_arg.claim != NFS4_OPEN_CLAIM_DELEGATE_CUR &&
data->o_arg.claim != NFS4_OPEN_CLAIM_DELEG_CUR_FH &&
- can_open_delegated(delegation, data->o_arg.fmode))
+ can_open_delegated(delegation, data->o_arg.fmode,
+ data->o_arg.open_flags & (O_DENYREAD|O_DENYWRITE)))
goto unlock_no_action;
rcu_read_unlock();
}
@@ -1829,7 +1877,8 @@ static void nfs4_open_release(void *calldata)
goto out_free;
state = nfs4_opendata_to_nfs4_state(data);
if (!IS_ERR(state))
- nfs4_close_state(state, data->o_arg.fmode, 0);
+ nfs4_close_state(state, data->o_arg.fmode,
+ data->o_arg.open_flags & (O_DENYREAD|O_DENYWRITE));
out_free:
nfs4_opendata_put(data);
}
@@ -1941,7 +1990,7 @@ static int nfs4_opendata_access(struct rpc_cred *cred,
return 0;

/* even though OPEN succeeded, access is denied. Close the file */
- nfs4_close_state(state, fmode, 0);
+ nfs4_close_state(state, fmode, openflags & (O_DENYREAD|O_DENYWRITE));
return -EACCES;
}

@@ -2006,7 +2055,7 @@ static int _nfs4_open_expired(struct nfs_open_context *ctx, struct nfs4_state *s
NFS4_OPEN_CLAIM_FH);
if (IS_ERR(opendata))
return PTR_ERR(opendata);
- ret = nfs4_open_recover(opendata, state);
+ ret = nfs4_open_recover(opendata, state, true);
if (ret == -ESTALE)
d_drop(ctx->dentry);
nfs4_opendata_put(opendata);
@@ -2125,6 +2174,10 @@ static int nfs41_check_open_stateid(struct nfs4_state *state)
clear_bit(NFS_O_WRONLY_STATE, &state->flags);
clear_bit(NFS_O_RDWR_STATE, &state->flags);
clear_bit(NFS_OPEN_STATE, &state->flags);
+ clear_bit(NFS_O_DENYNONE_STATE, &state->flags);
+ clear_bit(NFS_O_DENYREAD_STATE, &state->flags);
+ clear_bit(NFS_O_DENYWRITE_STATE, &state->flags);
+ clear_bit(NFS_O_DENYRDWR_STATE, &state->flags);
}
return status;
}
@@ -2498,18 +2551,40 @@ nfs4_close_clear_stateid_flags(struct nfs4_state *state, fmode_t fmode,
unsigned int deny_mode)
{
spin_lock(&state->owner->so_lock);
- clear_bit(NFS_O_RDWR_STATE, &state->flags);
switch (fmode & (FMODE_READ|FMODE_WRITE)) {
+ case FMODE_WRITE|FMODE_READ:
+ break;
case FMODE_WRITE:
+ clear_bit(NFS_O_RDWR_STATE, &state->flags);
clear_bit(NFS_O_RDONLY_STATE, &state->flags);
break;
case FMODE_READ:
+ clear_bit(NFS_O_RDWR_STATE, &state->flags);
clear_bit(NFS_O_WRONLY_STATE, &state->flags);
break;
case 0:
+ clear_bit(NFS_O_RDWR_STATE, &state->flags);
clear_bit(NFS_O_RDONLY_STATE, &state->flags);
clear_bit(NFS_O_WRONLY_STATE, &state->flags);
clear_bit(NFS_OPEN_STATE, &state->flags);
+ clear_bit(NFS_O_DENYNONE_STATE, &state->flags);
+ }
+
+ switch (deny_mode & (O_DENYREAD|O_DENYWRITE)) {
+ case O_DENYREAD|O_DENYWRITE:
+ break;
+ case O_DENYWRITE:
+ clear_bit(NFS_O_DENYRDWR_STATE, &state->flags);
+ clear_bit(NFS_O_DENYREAD_STATE, &state->flags);
+ break;
+ case O_DENYREAD:
+ clear_bit(NFS_O_DENYRDWR_STATE, &state->flags);
+ clear_bit(NFS_O_DENYWRITE_STATE, &state->flags);
+ break;
+ case 0:
+ clear_bit(NFS_O_DENYRDWR_STATE, &state->flags);
+ clear_bit(NFS_O_DENYREAD_STATE, &state->flags);
+ clear_bit(NFS_O_DENYWRITE_STATE, &state->flags);
}
spin_unlock(&state->owner->so_lock);
}
@@ -2570,21 +2645,40 @@ static void nfs4_close_prepare(struct rpc_task *task, void *data)

task->tk_msg.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_OPEN_DOWNGRADE];
calldata->arg.fmode = FMODE_READ|FMODE_WRITE;
- calldata->arg.deny_mode = 0;
+ calldata->arg.deny_mode = O_DENYREAD|O_DENYWRITE;
spin_lock(&state->owner->so_lock);
/* Calculate the change in open mode */
- if (state->n_rdwr == 0) {
- if (state->n_rdonly == 0) {
+ if (state->n_rdwr + state->n_rw_dr +
+ state->n_rw_dw + state->n_rw_drw == 0) {
+ if (state->n_rdonly + state->n_ro_dr + state->n_ro_dw +
+ state->n_ro_drw == 0) {
call_close |= test_bit(NFS_O_RDONLY_STATE, &state->flags);
call_close |= test_bit(NFS_O_RDWR_STATE, &state->flags);
calldata->arg.fmode &= ~FMODE_READ;
}
- if (state->n_wronly == 0) {
+ if (state->n_wronly + state->n_wo_dr + state->n_wo_dw +
+ state->n_wo_drw == 0) {
call_close |= test_bit(NFS_O_WRONLY_STATE, &state->flags);
call_close |= test_bit(NFS_O_RDWR_STATE, &state->flags);
calldata->arg.fmode &= ~FMODE_WRITE;
}
}
+ if (state->n_ro_drw + state->n_wo_drw + state->n_rw_drw == 0) {
+ if (state->n_ro_dr + state->n_wo_dr + state->n_rw_dr == 0) {
+ call_close |= test_bit(NFS_O_DENYREAD_STATE,
+ &state->flags);
+ call_close |= test_bit(NFS_O_DENYRDWR_STATE,
+ &state->flags);
+ calldata->arg.deny_mode &= ~O_DENYREAD;
+ }
+ if (state->n_ro_dw + state->n_wo_dw + state->n_rw_dw == 0) {
+ call_close |= test_bit(NFS_O_DENYWRITE_STATE,
+ &state->flags);
+ call_close |= test_bit(NFS_O_DENYRDWR_STATE,
+ &state->flags);
+ calldata->arg.deny_mode &= ~O_DENYWRITE;
+ }
+ }
if (!nfs4_valid_open_stateid(state))
call_close = 0;
spin_unlock(&state->owner->so_lock);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 168f868..bdd7808 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -736,19 +736,23 @@ __nfs4_close(struct nfs4_state *state, fmode_t fmode, unsigned int deny_mode,
struct nfs4_state_owner *owner = state->owner;
int call_close = 0;
fmode_t newstate;
+ unsigned int newdeny_mode;

atomic_inc(&owner->so_count);
/* Protect against nfs4_find_state() */
spin_lock(&owner->so_lock);
dec_state_n(state, fmode, deny_mode);
newstate = FMODE_READ|FMODE_WRITE;
- if (state->n_rdwr == 0) {
- if (state->n_rdonly == 0) {
+ if (state->n_rdwr + state->n_rw_dr +
+ state->n_rw_dw + state->n_rw_drw == 0) {
+ if (state->n_rdonly + state->n_ro_dr + state->n_ro_dw +
+ state->n_ro_drw == 0) {
newstate &= ~FMODE_READ;
call_close |= test_bit(NFS_O_RDONLY_STATE, &state->flags);
call_close |= test_bit(NFS_O_RDWR_STATE, &state->flags);
}
- if (state->n_wronly == 0) {
+ if (state->n_wronly + state->n_wo_dr + state->n_wo_dw +
+ state->n_wo_drw == 0) {
newstate &= ~FMODE_WRITE;
call_close |= test_bit(NFS_O_WRONLY_STATE, &state->flags);
call_close |= test_bit(NFS_O_RDWR_STATE, &state->flags);
@@ -756,7 +760,24 @@ __nfs4_close(struct nfs4_state *state, fmode_t fmode, unsigned int deny_mode,
if (newstate == 0)
clear_bit(NFS_DELEGATED_STATE, &state->flags);
}
- nfs4_state_set_mode_locked(state, newstate, 0);
+ newdeny_mode = O_DENYREAD|O_DENYWRITE;
+ if (state->n_ro_drw + state->n_wo_drw + state->n_rw_drw == 0) {
+ if (state->n_ro_dr + state->n_wo_dr + state->n_rw_dr == 0) {
+ newdeny_mode &= ~O_DENYREAD;
+ call_close |= test_bit(NFS_O_DENYREAD_STATE,
+ &state->flags);
+ call_close |= test_bit(NFS_O_DENYRDWR_STATE,
+ &state->flags);
+ }
+ if (state->n_ro_dw + state->n_wo_dw + state->n_rw_dw == 0) {
+ newdeny_mode &= ~O_DENYWRITE;
+ call_close |= test_bit(NFS_O_DENYWRITE_STATE,
+ &state->flags);
+ call_close |= test_bit(NFS_O_DENYRDWR_STATE,
+ &state->flags);
+ }
+ }
+ nfs4_state_set_mode_locked(state, newstate, newdeny_mode);
spin_unlock(&owner->so_lock);

if (!call_close) {
@@ -1541,6 +1562,10 @@ static void nfs4_clear_open_state(struct nfs4_state *state)
clear_bit(NFS_O_RDONLY_STATE, &state->flags);
clear_bit(NFS_O_WRONLY_STATE, &state->flags);
clear_bit(NFS_O_RDWR_STATE, &state->flags);
+ clear_bit(NFS_O_DENYNONE_STATE, &state->flags);
+ clear_bit(NFS_O_DENYREAD_STATE, &state->flags);
+ clear_bit(NFS_O_DENYWRITE_STATE, &state->flags);
+ clear_bit(NFS_O_DENYRDWR_STATE, &state->flags);
spin_lock(&state->state_lock);
list_for_each_entry(lock, &state->lock_states, ls_locks) {
lock->ls_seqid.flags = 0;
diff --git a/fs/nfs/nfs4super.c b/fs/nfs/nfs4super.c
index 65ab0a0..1f91571 100644
--- a/fs/nfs/nfs4super.c
+++ b/fs/nfs/nfs4super.c
@@ -29,7 +29,8 @@ static struct file_system_type nfs4_remote_fs_type = {
.name = "nfs4",
.mount = nfs4_remote_mount,
.kill_sb = nfs_kill_super,
- .fs_flags = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
+ .fs_flags = FS_RENAME_DOES_D_MOVE | FS_BINARY_MOUNTDATA |
+ FS_DOES_SHARELOCK,
};

static struct file_system_type nfs4_remote_referral_fs_type = {
@@ -37,7 +38,8 @@ static struct file_system_type nfs4_remote_referral_fs_type = {
.name = "nfs4",
.mount = nfs4_remote_referral_mount,
.kill_sb = nfs_kill_super,
- .fs_flags = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
+ .fs_flags = FS_RENAME_DOES_D_MOVE | FS_BINARY_MOUNTDATA |
+ FS_DOES_SHARELOCK,
};

struct file_system_type nfs4_referral_fs_type = {
@@ -45,7 +47,8 @@ struct file_system_type nfs4_referral_fs_type = {
.name = "nfs4",
.mount = nfs4_referral_mount,
.kill_sb = nfs_kill_super,
- .fs_flags = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
+ .fs_flags = FS_RENAME_DOES_D_MOVE | FS_BINARY_MOUNTDATA |
+ FS_DOES_SHARELOCK,
};

static const struct super_operations nfs4_sops = {
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index ed507f4..870d297 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1377,7 +1377,19 @@ static void encode_share_access(struct xdr_stream *xdr, fmode_t fmode,
default:
*p++ = cpu_to_be32(0);
}
- *p = cpu_to_be32(0); /* for linux, share_deny = 0 always */
+ switch (open_flags & (O_DENYREAD|O_DENYWRITE)) {
+ case O_DENYREAD:
+ *p = cpu_to_be32(NFS4_SHARE_DENY_READ);
+ break;
+ case O_DENYWRITE:
+ *p = cpu_to_be32(NFS4_SHARE_DENY_WRITE);
+ break;
+ case O_DENYREAD|O_DENYWRITE:
+ *p = cpu_to_be32(NFS4_SHARE_DENY_BOTH);
+ break;
+ default:
+ *p = cpu_to_be32(0);
+ }
}

static inline void encode_openhdr(struct xdr_stream *xdr, const struct nfs_openargs *arg)
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 910ed90..73bc4d8 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -333,7 +333,8 @@ struct file_system_type nfs4_fs_type = {
.name = "nfs4",
.mount = nfs_fs_mount,
.kill_sb = nfs_kill_super,
- .fs_flags = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
+ .fs_flags = FS_RENAME_DOES_D_MOVE | FS_BINARY_MOUNTDATA |
+ FS_DOES_SHARELOCK,
};
MODULE_ALIAS_FS("nfs4");
MODULE_ALIAS("nfs4");
@@ -2699,6 +2700,8 @@ nfs_xdev_mount(struct file_system_type *fs_type, int flags,
struct nfs_server *server;
struct dentry *mntroot = ERR_PTR(-ENOMEM);
struct nfs_subversion *nfs_mod = NFS_SB(data->sb)->nfs_client->cl_nfs_mod;
+ /* save sharelock option */
+ int sharelock = data->sb->s_flags & MS_SHARELOCK;

dprintk("--> nfs_xdev_mount()\n");

@@ -2710,7 +2713,7 @@ nfs_xdev_mount(struct file_system_type *fs_type, int flags,
if (IS_ERR(server))
mntroot = ERR_CAST(server);
else
- mntroot = nfs_fs_mount_common(server, flags,
+ mntroot = nfs_fs_mount_common(server, flags | sharelock,
dev_name, &mount_info, nfs_mod);

dprintk("<-- nfs_xdev_mount() = %ld\n",
--
1.7.10.4

2014-01-17 10:09:51

by Pavel Shilovsky

[permalink] [raw]
Subject: [PATCH v7 3/7] locks: Disable LOCK_MAND support for MS_SHARELOCK mounts

Signed-off-by: Pavel Shilovsky <[email protected]>
---
fs/locks.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index 7ecc511..c7e780a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1904,6 +1904,12 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
goto out;
}

+ /* Disable LOCK_MAND support for FS mounted with MS_SHARELOCK */
+ if ((cmd & LOCK_MAND) && IS_SHARELOCK(f.file->f_path.dentry->d_inode)) {
+ error = -EPERM;
+ goto out;
+ }
+
can_sleep = !(cmd & LOCK_NB);
cmd &= ~LOCK_NB;
unlock = (cmd == LOCK_UN);
--
1.7.10.4

2014-01-17 10:10:46

by Pavel Shilovsky

[permalink] [raw]
Subject: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

This patch adds 3 flags:
1) O_DENYREAD that doesn't permit read access,
2) O_DENYWRITE that doesn't permit write access,
3) O_DENYDELETE that doesn't permit delete or rename.

Network filesystems CIFS, SMB2.0, SMB3.0 and NFSv4 have such flags -
this change can benefit cifs and nfs modules as well as Samba and
NFS file servers that export the same directory for Windows clients,
or Wine applications that access the same files simultaneously.

These flags are only take affect for opens on mounts with new sharelock
option. They are translated to flock's flags:

!O_DENYREAD -> LOCK_READ | LOCK_MAND
!O_DENYWRITE -> LOCK_WRITE | LOCK_MAND

and set through flock_lock_file on a file. If the file can't be locked
due conflicts with another open with O_DENY* flags, a new -ESHAREDENIED
error code is returned.

Create codepath is slightly changed to prevent data races on newly
created files: when open with O_CREAT can return -ESHAREDENIED error
for successfully created files due to a sharelock set by another task.

Temporary disable O_DENYDELETE support - will enable it in further
patches.

Signed-off-by: Pavel Shilovsky <[email protected]>
---
arch/alpha/include/uapi/asm/errno.h | 2 +
arch/alpha/include/uapi/asm/fcntl.h | 3 ++
arch/mips/include/uapi/asm/errno.h | 2 +
arch/parisc/include/uapi/asm/errno.h | 2 +
arch/parisc/include/uapi/asm/fcntl.h | 3 ++
arch/sparc/include/uapi/asm/errno.h | 2 +
arch/sparc/include/uapi/asm/fcntl.h | 3 ++
fs/fcntl.c | 5 +-
fs/locks.c | 97 +++++++++++++++++++++++++++++++---
fs/namei.c | 53 ++++++++++++++++++-
fs/proc_namespace.c | 1 +
include/linux/fs.h | 8 +++
include/uapi/asm-generic/errno.h | 2 +
include/uapi/asm-generic/fcntl.h | 11 ++++
include/uapi/linux/fs.h | 1 +
15 files changed, 185 insertions(+), 10 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/errno.h b/arch/alpha/include/uapi/asm/errno.h
index 17f92aa..953a6d6 100644
--- a/arch/alpha/include/uapi/asm/errno.h
+++ b/arch/alpha/include/uapi/asm/errno.h
@@ -124,4 +124,6 @@

#define EHWPOISON 139 /* Memory page has hardware error */

+#define ESHAREDENIED 140 /* File is locked with a sharelock */
+
#endif
diff --git a/arch/alpha/include/uapi/asm/fcntl.h b/arch/alpha/include/uapi/asm/fcntl.h
index 09f49a6..265344b 100644
--- a/arch/alpha/include/uapi/asm/fcntl.h
+++ b/arch/alpha/include/uapi/asm/fcntl.h
@@ -33,6 +33,9 @@

#define O_PATH 040000000
#define __O_TMPFILE 0100000000
+#define O_DENYREAD 0200000000 /* Do not permit read access */
+#define O_DENYWRITE 0400000000 /* Do not permit write access */
+#define O_DENYDELETE 01000000000 /* Do not permit delete or rename */

#define F_GETLK 7
#define F_SETLK 8
diff --git a/arch/mips/include/uapi/asm/errno.h b/arch/mips/include/uapi/asm/errno.h
index 02d645d..f1a4068 100644
--- a/arch/mips/include/uapi/asm/errno.h
+++ b/arch/mips/include/uapi/asm/errno.h
@@ -123,6 +123,8 @@

#define EHWPOISON 168 /* Memory page has hardware error */

+#define ESHAREDENIED 169 /* File is locked with a sharelock */
+
#define EDQUOT 1133 /* Quota exceeded */


diff --git a/arch/parisc/include/uapi/asm/errno.h b/arch/parisc/include/uapi/asm/errno.h
index f3a8aa5..654c232 100644
--- a/arch/parisc/include/uapi/asm/errno.h
+++ b/arch/parisc/include/uapi/asm/errno.h
@@ -124,4 +124,6 @@

#define EHWPOISON 257 /* Memory page has hardware error */

+#define ESHAREDENIED 258 /* File is locked with a sharelock */
+
#endif
diff --git a/arch/parisc/include/uapi/asm/fcntl.h b/arch/parisc/include/uapi/asm/fcntl.h
index 34a46cb..5865964 100644
--- a/arch/parisc/include/uapi/asm/fcntl.h
+++ b/arch/parisc/include/uapi/asm/fcntl.h
@@ -21,6 +21,9 @@

#define O_PATH 020000000
#define __O_TMPFILE 040000000
+#define O_DENYREAD 0200000000 /* Do not permit read access */
+#define O_DENYWRITE 0400000000 /* Do not permit write access */
+#define O_DENYDELETE 01000000000 /* Do not permit delete or rename */

#define F_GETLK64 8
#define F_SETLK64 9
diff --git a/arch/sparc/include/uapi/asm/errno.h b/arch/sparc/include/uapi/asm/errno.h
index 20423e17..fe339b5 100644
--- a/arch/sparc/include/uapi/asm/errno.h
+++ b/arch/sparc/include/uapi/asm/errno.h
@@ -114,4 +114,6 @@

#define EHWPOISON 135 /* Memory page has hardware error */

+#define ESHAREDENIED 136 /* File is locked with a sharelock */
+
#endif
diff --git a/arch/sparc/include/uapi/asm/fcntl.h b/arch/sparc/include/uapi/asm/fcntl.h
index 7e8ace5..ab68170 100644
--- a/arch/sparc/include/uapi/asm/fcntl.h
+++ b/arch/sparc/include/uapi/asm/fcntl.h
@@ -36,6 +36,9 @@

#define O_PATH 0x1000000
#define __O_TMPFILE 0x2000000
+#define O_DENYREAD 0x4000000 /* Do not permit read access */
+#define O_DENYWRITE 0x8000000 /* Do not permit write access */
+#define O_DENYDELETE 0x10000000 /* Do not permit delete or rename */

#define F_GETOWN 5 /* for sockets. */
#define F_SETOWN 6 /* for sockets. */
diff --git a/fs/fcntl.c b/fs/fcntl.c
index ef68665..3f85887 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -729,14 +729,15 @@ static int __init fcntl_init(void)
* Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY
* is defined as O_NONBLOCK on some platforms and not on others.
*/
- BUILD_BUG_ON(20 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
+ BUILD_BUG_ON(23 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
O_RDONLY | O_WRONLY | O_RDWR |
O_CREAT | O_EXCL | O_NOCTTY |
O_TRUNC | O_APPEND | /* O_NONBLOCK | */
__O_SYNC | O_DSYNC | FASYNC |
O_DIRECT | O_LARGEFILE | O_DIRECTORY |
O_NOFOLLOW | O_NOATIME | O_CLOEXEC |
- __FMODE_EXEC | O_PATH | __O_TMPFILE
+ __FMODE_EXEC | O_PATH | __O_TMPFILE |
+ O_DENYREAD | O_DENYWRITE | O_DENYDELETE
));

fasync_cache = kmem_cache_create("fasync_cache",
diff --git a/fs/locks.c b/fs/locks.c
index 92a0f0a..ffde4d4 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -708,20 +708,73 @@ static int posix_locks_conflict(struct file_lock *caller_fl, struct file_lock *s
return (locks_conflict(caller_fl, sys_fl));
}

-/* Determine if lock sys_fl blocks lock caller_fl. FLOCK specific
- * checking before calling the locks_conflict().
+static unsigned int
+deny_flags_to_cmd(unsigned int flags)
+{
+ unsigned int cmd = LOCK_MAND;
+
+ if (!(flags & O_DENYREAD))
+ cmd |= LOCK_READ;
+ if (!(flags & O_DENYWRITE))
+ cmd |= LOCK_WRITE;
+
+ return cmd;
+}
+
+/*
+ * locks_mand_conflict - Determine if there's a share reservation conflict
+ * @caller_fl: lock we're attempting to acquire
+ * @sys_fl: lock already present on system that we're checking against
+ *
+ * Check to see if there's a share_reservation conflict. LOCK_READ/LOCK_WRITE
+ * tell us whether the reservation allows other readers and writers.
+ */
+static int
+locks_mand_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
+{
+ unsigned char caller_type = caller_fl->fl_type;
+ unsigned char sys_type = sys_fl->fl_type;
+ fmode_t caller_fmode = caller_fl->fl_file->f_mode;
+ fmode_t sys_fmode = sys_fl->fl_file->f_mode;
+
+ /* they can only conflict if FS is mounted with MS_SHARELOCK */
+ if (!IS_SHARELOCK(caller_fl->fl_file->f_path.dentry->d_inode))
+ return 0;
+
+ /* they can only conflict if they're both LOCK_MAND */
+ if (!(caller_type & LOCK_MAND) || !(sys_type & LOCK_MAND))
+ return 0;
+
+ if (!(caller_type & LOCK_READ) && (sys_fmode & FMODE_READ))
+ return 1;
+ if (!(caller_type & LOCK_WRITE) && (sys_fmode & FMODE_WRITE))
+ return 1;
+ if (!(sys_type & LOCK_READ) && (caller_fmode & FMODE_READ))
+ return 1;
+ if (!(sys_type & LOCK_WRITE) && (caller_fmode & FMODE_WRITE))
+ return 1;
+
+ return 0;
+}
+
+/*
+ * Determine if lock sys_fl blocks lock caller_fl. FLOCK specific checking
+ * before calling the locks_conflict().
*/
static int flock_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
{
- /* FLOCK locks referring to the same filp do not conflict with
+ if (!IS_FLOCK(sys_fl))
+ return 0;
+ if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND))
+ return locks_mand_conflict(caller_fl, sys_fl);
+ /*
+ * FLOCK locks referring to the same filp do not conflict with
* each other.
*/
- if (!IS_FLOCK(sys_fl) || (caller_fl->fl_file == sys_fl->fl_file))
- return (0);
- if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND))
+ if (caller_fl->fl_file == sys_fl->fl_file)
return 0;

- return (locks_conflict(caller_fl, sys_fl));
+ return locks_conflict(caller_fl, sys_fl);
}

void
@@ -888,6 +941,36 @@ out:
return error;
}

+/*
+ * Determine if a file is allowed to be opened with specified access and share
+ * modes. Lock the file and return 0 if checks passed, otherwise return
+ * -ESHAREDENIED.
+ */
+int
+sharelock_lock_file(struct file *filp)
+{
+ struct file_lock *lock;
+ int error = 0;
+
+ if (!IS_SHARELOCK(filp->f_path.dentry->d_inode))
+ return error;
+
+ /* Disable O_DENYDELETE support for now */
+ if (filp->f_flags & O_DENYDELETE)
+ return -EINVAL;
+
+ error = flock_make_lock(filp, &lock, deny_flags_to_cmd(filp->f_flags));
+ if (error)
+ return error;
+
+ error = flock_lock_file(filp, lock);
+ if (error == -EAGAIN)
+ error = -ESHAREDENIED;
+
+ locks_free_lock(lock);
+ return error;
+}
+
static int __posix_lock_file(struct inode *inode, struct file_lock *request, struct file_lock *conflock)
{
struct file_lock *fl;
diff --git a/fs/namei.c b/fs/namei.c
index 3531dee..2b741a1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2725,9 +2725,14 @@ static int atomic_open(struct nameidata *nd, struct dentry *dentry,
acc_mode = MAY_OPEN;
}
error = may_open(&file->f_path, acc_mode, open_flag);
- if (error)
+ if (error) {
fput(file);
+ goto out;
+ }

+ error = sharelock_lock_file(file);
+ if (error)
+ fput(file);
out:
dput(dentry);
return error;
@@ -2919,6 +2924,40 @@ retry_lookup:
}
mutex_lock(&dir->d_inode->i_mutex);
error = lookup_open(nd, path, file, op, got_write, opened);
+
+ /*
+ * For sharelock mounts if a file was created but not opened, we need
+ * to keep parent i_mutex until we finish the open to prevent races when
+ * somebody opens newly created by us file and locks it with a sharelock
+ * before we open it.
+ */
+ if (IS_SHARELOCK(dir->d_inode) && error > 0 && *opened & FILE_CREATED) {
+ /* Don't check for write permission, don't truncate */
+ open_flag &= ~O_TRUNC;
+ will_truncate = false;
+ acc_mode = MAY_OPEN;
+ path_to_nameidata(path, nd);
+
+ error = may_open(&nd->path, acc_mode, open_flag);
+ if (error) {
+ mutex_unlock(&dir->d_inode->i_mutex);
+ goto out;
+ }
+ file->f_path.mnt = nd->path.mnt;
+ error = finish_open(file, nd->path.dentry, NULL, opened);
+ if (error) {
+ mutex_unlock(&dir->d_inode->i_mutex);
+ if (error == -EOPENSTALE)
+ goto stale_open;
+ goto out;
+ }
+ error = sharelock_lock_file(file);
+ mutex_unlock(&dir->d_inode->i_mutex);
+ if (error)
+ goto exit_fput;
+ goto opened;
+ }
+
mutex_unlock(&dir->d_inode->i_mutex);

if (error <= 0) {
@@ -3034,6 +3073,18 @@ finish_open_created:
goto stale_open;
goto out;
}
+
+ if (IS_SHARELOCK(dir->d_inode)) {
+ /*
+ * Lock parent i_mutex to prevent races with sharelocks on
+ * newly created files.
+ */
+ mutex_lock(&dir->d_inode->i_mutex);
+ error = sharelock_lock_file(file);
+ mutex_unlock(&dir->d_inode->i_mutex);
+ if (error)
+ goto exit_fput;
+ }
opened:
error = open_check_o_direct(file);
if (error)
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 439406e..dd374d4 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -44,6 +44,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb)
{ MS_SYNCHRONOUS, ",sync" },
{ MS_DIRSYNC, ",dirsync" },
{ MS_MANDLOCK, ",mand" },
+ { MS_SHARELOCK, ",sharelock" },
{ 0, NULL }
};
const struct proc_fs_info *fs_infop;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 121f11f..aa061ca 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1029,6 +1029,7 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
extern int lease_modify(struct file_lock **, int);
extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
+extern int sharelock_lock_file(struct file *);
#else /* !CONFIG_FILE_LOCKING */
static inline int fcntl_getlk(struct file *file, struct flock __user *user)
{
@@ -1169,6 +1170,12 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
{
return 1;
}
+
+static inline int sharelock_lock_file(struct file *filp)
+{
+ return 0;
+}
+
#endif /* !CONFIG_FILE_LOCKING */


@@ -1675,6 +1682,7 @@ struct super_operations {
#define IS_PRIVATE(inode) ((inode)->i_flags & S_PRIVATE)
#define IS_IMA(inode) ((inode)->i_flags & S_IMA)
#define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT)
+#define IS_SHARELOCK(inode) __IS_FLG(inode, MS_SHARELOCK)
#define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC)

/*
diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
index 1e1ea6e..aff869c 100644
--- a/include/uapi/asm-generic/errno.h
+++ b/include/uapi/asm-generic/errno.h
@@ -110,4 +110,6 @@

#define EHWPOISON 133 /* Memory page has hardware error */

+#define ESHAREDENIED 134 /* File is locked with a sharelock */
+
#endif
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index 95e46c8..9881cfe 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -92,6 +92,17 @@
#define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
#define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)

+#ifndef O_DENYREAD
+#define O_DENYREAD 040000000 /* Do not permit read access */
+#endif
+/* FMODE_NONOTIFY 0100000000 */
+#ifndef O_DENYWRITE
+#define O_DENYWRITE 0200000000 /* Do not permit write access */
+#endif
+#ifndef O_DENYDELETE
+#define O_DENYDELETE 0400000000 /* Do not permit delete or rename */
+#endif
+
#ifndef O_NDELAY
#define O_NDELAY O_NONBLOCK
#endif
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 6c28b61..11f0ecf 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -86,6 +86,7 @@ struct inodes_stat_t {
#define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */
#define MS_I_VERSION (1<<23) /* Update inode I_version field */
#define MS_STRICTATIME (1<<24) /* Always perform atime updates */
+#define MS_SHARELOCK (1<<25) /* Allow share locks on an FS */

/* These sb flags are internal to the kernel */
#define MS_NOSEC (1<<28)
--
1.7.10.4

2014-01-17 18:19:10

by Alan Cox

[permalink] [raw]
Subject: Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

> +#define ESHAREDENIED 258 /* File is locked with a sharelock */

Have you prepared C library patches to match this ?

(and why not just use EPERM, it has the meaning you want already)


> + * Check to see if there's a share_reservation conflict. LOCK_READ/LOCK_WRITE
> + * tell us whether the reservation allows other readers and writers.
> + */
> +static int
> +locks_mand_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
> +{

Shouldn't this also check for CAP_SYS_DAC or some similar permission so
that root can override such a mess (eg to fix full disks in an
emergency) ?


> +
> + /*
> + * For sharelock mounts if a file was created but not opened, we need
> + * to keep parent i_mutex until we finish the open to prevent races when
> + * somebody opens newly created by us file and locks it with a sharelock
> + * before we open it.
> + */
> + if (IS_SHARELOCK(dir->d_inode) && error > 0 && *opened & FILE_CREATED) {
> + /* Don't check for write permission, don't truncate */
> + open_flag &= ~O_TRUNC;
> + will_truncate = false;
> + acc_mode = MAY_OPEN;
> + path_to_nameidata(path, nd);
> +
> + error = may_open(&nd->path, acc_mode, open_flag);
> + if (error) {
> + mutex_unlock(&dir->d_inode->i_mutex);
> + goto out;
> + }
> + file->f_path.mnt = nd->path.mnt;
> + error = finish_open(file, nd->path.dentry, NULL, opened);
> + if (error) {
> + mutex_unlock(&dir->d_inode->i_mutex);
> + if (error == -EOPENSTALE)
> + goto stale_open;
> + goto out;
> + }
> + error = sharelock_lock_file(file);
> + mutex_unlock(&dir->d_inode->i_mutex);
> + if (error)
> + goto exit_fput;
> + goto opened;
> + }
> +
> mutex_unlock(&dir->d_inode->i_mutex);

What stops the file system changing mount flags via a remount between
these two ?

>
> if (error <= 0) {
> @@ -3034,6 +3073,18 @@ finish_open_created:
> goto stale_open;
> goto out;
> }
> +
> + if (IS_SHARELOCK(dir->d_inode)) {
> + /*
> + * Lock parent i_mutex to prevent races with sharelocks on
> + * newly created files.
> + */
> + mutex_lock(&dir->d_inode->i_mutex);
> + error = sharelock_lock_file(file);
> + mutex_unlock(&dir->d_inode->i_mutex);
> + if (error)
> + goto exit_fput;
> + }
> opened:

2014-01-17 18:43:41

by Frank Filz

[permalink] [raw]
Subject: RE: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

This looks wonderful and will be useful to the Ganesha user space NFS server
also.

I do have a couple questions.

1. How will this interact with the idea of private locks from the patch set
Jeff Layton has been pushing?

2. If a process opens multiple file descriptors with deny modes, will they
conflict with each other (which is the behavior we will want for Ganesha).

3. Is there any functionality to upgrade or downgrade the access and deny
modes (thinking in terms of NFS v4 support of OPEN upgrade and
OPEN_DOWNGRADE operations).

Thanks

Frank

> -----Original Message-----
> From: [email protected] [mailto:linux-nfs-
> [email protected]] On Behalf Of Pavel Shilovsky
> Sent: Friday, January 17, 2014 2:07 AM
> To: [email protected]
> Cc: [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Subject: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS
>
> This patchset adds support of O_DENY* flags for Linux fs layer. These
flags
> can be used by any application that needs share reservations to organize a
> file access. VFS already has some sort of this capability - now it's done
> through flock/LOCK_MAND mechanis, but that approach is non-atomic. This
> patchset build new capabilities on top of the existing one but doesn't
bring
> any changes into the flock call semantic.
>
> These flags can be used by NFS (built-in-kernel) and CIFS (Samba) servers
> and Wine applications through VFS (for local filesystems) or CIFS/NFS
> modules. This will help when e.g. Samba and NFS server share the same
> directory for Windows and Linux users or Wine applications use Samba/NFS
> share to access the same data from different clients.
>
> According to the previous discussions the most problematic question is how
> to prevent situations like DoS attacks where e.g /lib/liba.so file can be
open
> with DENYREAD, or smth like this. That's why extra mount option
'sharelock'
> is added for switching on/off O_DENY* flags processing. It allows to avoid
use
> of these flags on critical places (like /, /lib) and turn them on if we
really want
> it proccessed that way.
>
> So, we have 3 new flags:
> O_DENYREAD - to prevent other opens with read access, O_DENYWRITE - to
> prevent other opens with write access, O_DENYDELETE - to prevent delete
> operations
>
> The patchset avoids data races problem on newely created files: open with
> O_CREAT can return the -ESHAREDENIED error for a successfully created file
> if this files was locked with a sharelock by another task. Also, it turns
> flock/LOCK_MAND capability off for mounts with 'sharelock' option. This
lets
> not mix one share reservation approach with another.
>
> Patches #1, #2 and #3 add O_DENY* flags to fcntl and implement VFS part. A
> patch #4 is related to CIFS client changes, #5 -- NFSD, #6 and #7 describe
> NFSv4 client parts.
>
> The preliminary patch for Samba that replaces the existing use of
> flock/LOCK_MAND mechanism with O_DENY* flags:
> http://git.etersoft.ru/people/piastry/packages/?p=samba.git;a=commitdiff;
> h=173070262b7f29134ad6b2e49a760017c11aec4a
>
> The future part of open man page patch:
>
> -----
> O_DENYREAD, O_DENYWRIYE, O_DENYDELETE - used to inforce a mandatory
> share reservation scheme of the file access. If these flag is passed, the
open
> fails with -ESHAREDENIED in following cases:
> 1) if O_DENYREAD flag is specified and there is another open with READ
> access to the file;
> 2) if O_DENYWRITE flag is specified and there is another open with WRITE
> access to the file;
> 3) if READ access is requested and there is another open with O_DENYREAD
> flags;
> 4) if WRITE access is requested and there is another open with
> O_DENYWRITE flags.
>
> If O_DENYDELETE flag is specified and the open succeded, any further
unlink
> operation will fail with -ESHAREDENIED untill this open is closed. Now
this flag
> is processed by VFS and CIFS filesystem. NFS returns -EINVAL for opens
with
> this flag.
>
> This mechanism is done through flocks. If O_DENY* flags are specified,
flock
> syscall is processed after the open. The share modes are translated into
flock
> according the following rules:
> 1) !O_DENYREAD -> LOCK_READ | LOCK_MAND
> 2) !O_DENYWRITE -> LOCK_WRITE | LOCK_MAND
> 3) !O_DENYDELETE -> LOCK_DELETE | LOCK_MAND
> -----
>
> Pavel Shilovsky (7):
> VFS: Introduce new O_DENY* open flags
> VFS: Add O_DENYDELETE support for VFS
> locks: Disable LOCK_MAND support for MS_SHARELOCK mounts
> CIFS: Add O_DENY* open flags support
> NFSD: Pass share reservations flags to VFS
> NFSv4: Add deny state handling for nfs4_state struct
> NFSv4: Add O_DENY* open flags support
>
> arch/alpha/include/uapi/asm/errno.h | 2 +
> arch/alpha/include/uapi/asm/fcntl.h | 3 +
> arch/mips/include/uapi/asm/errno.h | 2 +
> arch/parisc/include/uapi/asm/errno.h | 2 +
> arch/parisc/include/uapi/asm/fcntl.h | 3 +
> arch/sparc/include/uapi/asm/errno.h | 2 +
> arch/sparc/include/uapi/asm/fcntl.h | 3 +
> fs/cifs/cifsacl.c | 2 +
> fs/cifs/cifsfs.c | 2 +-
> fs/cifs/cifsglob.h | 17 +-
> fs/cifs/cifssmb.c | 2 +-
> fs/cifs/dir.c | 6 +
> fs/cifs/file.c | 20 ++-
> fs/cifs/inode.c | 12 +-
> fs/cifs/link.c | 2 +
> fs/cifs/netmisc.c | 2 +-
> fs/cifs/smb1ops.c | 3 +
> fs/cifs/smb2inode.c | 1 +
> fs/cifs/smb2maperror.c | 2 +-
> fs/cifs/smb2ops.c | 6 +
> fs/cifs/smb2pdu.c | 2 +-
> fs/fcntl.c | 5 +-
> fs/locks.c | 161 ++++++++++++++++--
> fs/namei.c | 56 +++++-
> fs/nfs/dir.c | 39 ++++-
> fs/nfs/inode.c | 8 +-
> fs/nfs/internal.h | 3 +-
> fs/nfs/nfs4_fs.h | 83 ++++++++-
> fs/nfs/nfs4file.c | 8 +-
> fs/nfs/nfs4proc.c | 310
+++++++++++++++++++++++-----------
> fs/nfs/nfs4state.c | 65 ++++---
> fs/nfs/nfs4super.c | 9 +-
> fs/nfs/nfs4xdr.c | 21 ++-
> fs/nfs/super.c | 7 +-
> fs/nfsd/nfs4state.c | 46 ++++-
> fs/nfsd/nfsproc.c | 1 +
> fs/proc_namespace.c | 1 +
> include/linux/fs.h | 15 ++
> include/linux/nfs_fs.h | 5 +-
> include/linux/nfs_xdr.h | 1 +
> include/uapi/asm-generic/errno.h | 2 +
> include/uapi/asm-generic/fcntl.h | 12 ++
> include/uapi/linux/fs.h | 1 +
> 43 files changed, 789 insertions(+), 166 deletions(-)
>
> --
> 1.7.10.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the
> body of a message to [email protected] More majordomo info at
> http://vger.kernel.org/majordomo-info.html

2014-01-20 08:46:11

by Volker Lendecke

[permalink] [raw]
Subject: Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

Hi!

On Fri, Jan 17, 2014 at 02:07:05PM +0400, Pavel Shilovsky wrote:
> If O_DENYDELETE flag is specified and the open succeded,
> any further unlink operation will fail with -ESHAREDENIED
> untill this open is closed. Now this flag is processed by
> VFS and CIFS filesystem. NFS returns -EINVAL for opens
> with this flag.

This looks really, really good, thanks!

One question: If Samba wants to open a file for delete
access, there's no corresponding flag in the open
permissions. There can be the case where Samba wants to open
*just* for future unlink, no read or write access required.
Is there a way to achieve this atomically correct?

Thanks,

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 G?ttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG G?ttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[email protected]

2014-01-20 09:56:30

by Pavel Shilovsky

[permalink] [raw]
Subject: Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

2014/1/17 Frank Filz <[email protected]>:
> This looks wonderful and will be useful to the Ganesha user space NFS server
> also.
>
> I do have a couple questions.
>
> 1. How will this interact with the idea of private locks from the patch set
> Jeff Layton has been pushing?

They don't touch each other.

>
> 2. If a process opens multiple file descriptors with deny modes, will they
> conflict with each other (which is the behavior we will want for Ganesha).

Yes, a deny mode is associated with file descriptor - so, it will
conflict with any other access/deny modes of file descriptors from any
process.

>
> 3. Is there any functionality to upgrade or downgrade the access and deny
> modes (thinking in terms of NFS v4 support of OPEN upgrade and
> OPEN_DOWNGRADE operations).

The proposed patchset doesn't allow to change deny modes after an open
is done. But we can add a functionality to let flock syscall change
deny modes as on option.

--
Best regards,
Pavel Shilovsky.

2014-01-20 10:20:48

by Pavel Shilovsky

[permalink] [raw]
Subject: Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

2014/1/20 Volker Lendecke <[email protected]>:
> Hi!
>
> On Fri, Jan 17, 2014 at 02:07:05PM +0400, Pavel Shilovsky wrote:
>> If O_DENYDELETE flag is specified and the open succeded,
>> any further unlink operation will fail with -ESHAREDENIED
>> untill this open is closed. Now this flag is processed by
>> VFS and CIFS filesystem. NFS returns -EINVAL for opens
>> with this flag.
>
> This looks really, really good, thanks!
>
> One question: If Samba wants to open a file for delete
> access, there's no corresponding flag in the open
> permissions. There can be the case where Samba wants to open
> *just* for future unlink, no read or write access required.
> Is there a way to achieve this atomically correct?

You can try to use O_PATH flag. It doesn't give you a delete access
but should be ok because further deleting will be done without file
descriptor -- through unlink syscall.

--
Best regards,
Pavel Shilovsky.

2014-01-20 10:31:34

by Volker Lendecke

[permalink] [raw]
Subject: Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

On Mon, Jan 20, 2014 at 02:20:43PM +0400, Pavel Shilovsky wrote:
> > One question: If Samba wants to open a file for delete
> > access, there's no corresponding flag in the open
> > permissions. There can be the case where Samba wants to open
> > *just* for future unlink, no read or write access required.
> > Is there a way to achieve this atomically correct?
>
> You can try to use O_PATH flag. It doesn't give you a delete access
> but should be ok because further deleting will be done without file
> descriptor -- through unlink syscall.

Ok, I did not know about O_PATH. Thanks for that!

So I do an open with O_PATH. How do I then make sure that
nobody else has a O_DENYDELETE set without doing the unlink
itself?

Thanks,

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 G?ttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG G?ttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[email protected]

2014-01-20 10:45:05

by Pavel Shilovsky

[permalink] [raw]
Subject: Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

2014/1/17 One Thousand Gnomes <[email protected]>:
>> +#define ESHAREDENIED 258 /* File is locked with a sharelock */
>
> Have you prepared C library patches to match this ?

I don't have it for now.

>
> (and why not just use EPERM, it has the meaning you want already)

We need to determine if we have a share reservation error to map it
accurately in file servers and wine.

>
>
>> + * Check to see if there's a share_reservation conflict. LOCK_READ/LOCK_WRITE
>> + * tell us whether the reservation allows other readers and writers.
>> + */
>> +static int
>> +locks_mand_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
>> +{
>
> Shouldn't this also check for CAP_SYS_DAC or some similar permission so
> that root can override such a mess (eg to fix full disks in an
> emergency) ?

May be it's better to let root an ability to remount the system
without sharelock mount option and then fix an emergency?

>
>
>> +
>> + /*
>> + * For sharelock mounts if a file was created but not opened, we need
>> + * to keep parent i_mutex until we finish the open to prevent races when
>> + * somebody opens newly created by us file and locks it with a sharelock
>> + * before we open it.
>> + */
>> + if (IS_SHARELOCK(dir->d_inode) && error > 0 && *opened & FILE_CREATED) {
>> + /* Don't check for write permission, don't truncate */
>> + open_flag &= ~O_TRUNC;
>> + will_truncate = false;
>> + acc_mode = MAY_OPEN;
>> + path_to_nameidata(path, nd);
>> +
>> + error = may_open(&nd->path, acc_mode, open_flag);
>> + if (error) {
>> + mutex_unlock(&dir->d_inode->i_mutex);
>> + goto out;
>> + }
>> + file->f_path.mnt = nd->path.mnt;
>> + error = finish_open(file, nd->path.dentry, NULL, opened);
>> + if (error) {
>> + mutex_unlock(&dir->d_inode->i_mutex);
>> + if (error == -EOPENSTALE)
>> + goto stale_open;
>> + goto out;
>> + }
>> + error = sharelock_lock_file(file);
>> + mutex_unlock(&dir->d_inode->i_mutex);
>> + if (error)
>> + goto exit_fput;
>> + goto opened;
>> + }
>> +
>> mutex_unlock(&dir->d_inode->i_mutex);
>
> What stops the file system changing mount flags via a remount between
> these two ?

Nothing, but it doesn't bring problems here: if the first IS_SHARELOCK
condition is true, we don't get into the second (see 'goto opened').

>
>>
>> if (error <= 0) {
>> @@ -3034,6 +3073,18 @@ finish_open_created:
>> goto stale_open;
>> goto out;
>> }
>> +
>> + if (IS_SHARELOCK(dir->d_inode)) {
>> + /*
>> + * Lock parent i_mutex to prevent races with sharelocks on
>> + * newly created files.
>> + */
>> + mutex_lock(&dir->d_inode->i_mutex);
>> + error = sharelock_lock_file(file);
>> + mutex_unlock(&dir->d_inode->i_mutex);
>> + if (error)
>> + goto exit_fput;
>> + }
>> opened:
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html



--
Best regards,
Pavel Shilovsky.

2014-01-20 13:34:36

by Alan Cox

[permalink] [raw]
Subject: Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

> > (and why not just use EPERM, it has the meaning you want already)
> We need to determine if we have a share reservation error to map it
> accurately in file servers and wine.

Ok

> > Shouldn't this also check for CAP_SYS_DAC or some similar permission so
> > that root can override such a mess (eg to fix full disks in an
> > emergency) ?
>
> May be it's better to let root an ability to remount the system
> without sharelock mount option and then fix an emergency?

Doesn't that involve breaking the service for all users of the system
relying upon those locks, while root being allowed to ignore the locks
does not ?

Alan

2014-01-21 13:19:10

by Pavel Shilovsky

[permalink] [raw]
Subject: Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

2014/1/20 One Thousand Gnomes <[email protected]>:
>> > Shouldn't this also check for CAP_SYS_DAC or some similar permission so
>> > that root can override such a mess (eg to fix full disks in an
>> > emergency) ?
>>
>> May be it's better to let root an ability to remount the system
>> without sharelock mount option and then fix an emergency?
>
> Doesn't that involve breaking the service for all users of the system
> relying upon those locks, while root being allowed to ignore the locks
> does not ?

If we allow root to remount without "sharelock" (or bypass conflict
checks), it will definitely break the service for all users. Probably
it's better to stop the service (that cause all sharelocks to be
unlocked), fix the emergency and start the service again.

In the current state, the patchset doesn't allow any sort of ignoring
those locks for mounts with "sharelock" option (either remount without
"sharelock" or set special capabilities). It was done to make sure
nothing breaks applications relying upon sharelock behavior. Also,
that's why "sharelock" mount option was added: this behavior is
dangerous to be on critical system paths like "/" or "/lib" and not
suitable for global use.

--
Best regards,
Pavel Shilovsky.

2014-01-21 13:31:43

by Pavel Shilovsky

[permalink] [raw]
Subject: Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

2014/1/20 Volker Lendecke <[email protected]>:
>> You can try to use O_PATH flag. It doesn't give you a delete access
>> but should be ok because further deleting will be done without file
>> descriptor -- through unlink syscall.
>
> Ok, I did not know about O_PATH. Thanks for that!
>
> So I do an open with O_PATH. How do I then make sure that
> nobody else has a O_DENYDELETE set without doing the unlink
> itself?

It's not possible with the current API to do it through open syscall.
Another possibility is to look at /proc/locks. But I think we really
need O_DELETE flag that will force a file to be removed on close - we
will be able to do O_DENYDELETE checks atomically.

--
Best regards,
Pavel Shilovsky.

2014-01-21 14:17:23

by Volker Lendecke

[permalink] [raw]
Subject: Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

On Tue, Jan 21, 2014 at 05:31:39PM +0400, Pavel Shilovsky wrote:
> It's not possible with the current API to do it through open syscall.
> Another possibility is to look at /proc/locks. But I think we really
> need O_DELETE flag that will force a file to be removed on close - we
> will be able to do O_DENYDELETE checks atomically.

I don't depend on this at open time. I can do it later with
some syscall, that's fine. I just want to find a way to do
it correctly. The problem is -- you can't "try" an unlink.
So we have to probe whether we can unlink. And opening for
O_DENYDELETE by no means says we *will* unlink. We just have
to keep the option to do it exclusively.

With best regards,

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 G?ttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG G?ttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[email protected]

2014-01-27 19:49:54

by Frank Filz

[permalink] [raw]
Subject: RE: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

> 2014/1/17 Frank Filz <[email protected]>:
> > This looks wonderful and will be useful to the Ganesha user space NFS
> > server also.
> >
> > I do have a couple questions.
> >
> > 1. How will this interact with the idea of private locks from the
> > patch set Jeff Layton has been pushing?
>
> They don't touch each other.
>
> >
> > 2. If a process opens multiple file descriptors with deny modes, will
> > they conflict with each other (which is the behavior we will want for
> Ganesha).
>
> Yes, a deny mode is associated with file descriptor - so, it will conflict
with any
> other access/deny modes of file descriptors from any process.
>
> >
> > 3. Is there any functionality to upgrade or downgrade the access and
> > deny modes (thinking in terms of NFS v4 support of OPEN upgrade and
> > OPEN_DOWNGRADE operations).
>
> The proposed patchset doesn't allow to change deny modes after an open is
> done. But we can add a functionality to let flock syscall change deny
modes
> as on option.

Yes, that would be good. NFS v4 allows upgrade/downgrade of acces/deny modes
(hmm, changing access mode would be trickier, but we need to be able to
upgrade/downgrade those as well as deny modes...). It could still be done
with an fcntl. I don't know if any NFS clients make use of
upgrade/downgrade (I do know there are pynfs tests for it).

Interestingly, upgrade/downgrade would be a solution to Ganesha's problems
with POSIX locks being dumped on any file descriptor close since we could
change the access mode as needed rather than needing to close and open the
file.

Frank

2014-01-31 15:51:57

by Pavel Shilovsky

[permalink] [raw]
Subject: Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

2014-01-17 Pavel Shilovsky <[email protected]>:
> This patchset adds support of O_DENY* flags for Linux fs layer. These flags can be used by any application that needs share reservations to organize a file access. VFS already has some sort of this capability - now it's done through flock/LOCK_MAND mechanis, but that approach is non-atomic. This patchset build new capabilities on top of the existing one but doesn't bring any changes into the flock call semantic.
>
> These flags can be used by NFS (built-in-kernel) and CIFS (Samba) servers and Wine applications through VFS (for local filesystems) or CIFS/NFS modules. This will help when e.g. Samba and NFS server share the same directory for Windows and Linux users or Wine applications use Samba/NFS share to access the same data from different clients.
>
> According to the previous discussions the most problematic question is how to prevent situations like DoS attacks where e.g /lib/liba.so file can be open with DENYREAD, or smth like this. That's why extra mount option 'sharelock' is added for switching on/off O_DENY* flags processing. It allows to avoid use of these flags on critical places (like /, /lib) and turn them on if we really want it proccessed that way.
>
> So, we have 3 new flags:
> O_DENYREAD - to prevent other opens with read access,
> O_DENYWRITE - to prevent other opens with write access,
> O_DENYDELETE - to prevent delete operations
>
> The patchset avoids data races problem on newely created files: open with O_CREAT can return the -ESHAREDENIED error for a successfully created file if this files was locked with a sharelock by another task. Also, it turns flock/LOCK_MAND capability off for mounts with 'sharelock' option. This lets not mix one share reservation approach with another.
>
> Patches #1, #2 and #3 add O_DENY* flags to fcntl and implement VFS part. A patch #4 is related to CIFS client changes, #5 -- NFSD, #6 and #7 describe NFSv4 client parts.
>
> The preliminary patch for Samba that replaces the existing use of flock/LOCK_MAND mechanism with O_DENY* flags:
> http://git.etersoft.ru/people/piastry/packages/?p=samba.git;a=commitdiff;h=173070262b7f29134ad6b2e49a760017c11aec4a
>
> The future part of open man page patch:
>
> -----
> O_DENYREAD, O_DENYWRIYE, O_DENYDELETE - used to inforce a mandatory share reservation scheme of the file access. If these flag is passed, the open fails with -ESHAREDENIED in following cases:
> 1) if O_DENYREAD flag is specified and there is another open with READ access to the file;
> 2) if O_DENYWRITE flag is specified and there is another open with WRITE access to the file;
> 3) if READ access is requested and there is another open with O_DENYREAD flags;
> 4) if WRITE access is requested and there is another open with O_DENYWRITE flags.
>
> If O_DENYDELETE flag is specified and the open succeded, any further unlink operation will fail with -ESHAREDENIED untill this open is closed. Now this flag is processed by VFS and CIFS filesystem. NFS returns -EINVAL for opens with this flag.
>
> This mechanism is done through flocks. If O_DENY* flags are specified, flock syscall is processed after the open. The share modes are translated into flock according the following rules:
> 1) !O_DENYREAD -> LOCK_READ | LOCK_MAND
> 2) !O_DENYWRITE -> LOCK_WRITE | LOCK_MAND
> 3) !O_DENYDELETE -> LOCK_DELETE | LOCK_MAND
> -----
>
> Pavel Shilovsky (7):
> VFS: Introduce new O_DENY* open flags
> VFS: Add O_DENYDELETE support for VFS
> locks: Disable LOCK_MAND support for MS_SHARELOCK mounts
> CIFS: Add O_DENY* open flags support
> NFSD: Pass share reservations flags to VFS
> NFSv4: Add deny state handling for nfs4_state struct
> NFSv4: Add O_DENY* open flags support
>
> arch/alpha/include/uapi/asm/errno.h | 2 +
> arch/alpha/include/uapi/asm/fcntl.h | 3 +
> arch/mips/include/uapi/asm/errno.h | 2 +
> arch/parisc/include/uapi/asm/errno.h | 2 +
> arch/parisc/include/uapi/asm/fcntl.h | 3 +
> arch/sparc/include/uapi/asm/errno.h | 2 +
> arch/sparc/include/uapi/asm/fcntl.h | 3 +
> fs/cifs/cifsacl.c | 2 +
> fs/cifs/cifsfs.c | 2 +-
> fs/cifs/cifsglob.h | 17 +-
> fs/cifs/cifssmb.c | 2 +-
> fs/cifs/dir.c | 6 +
> fs/cifs/file.c | 20 ++-
> fs/cifs/inode.c | 12 +-
> fs/cifs/link.c | 2 +
> fs/cifs/netmisc.c | 2 +-
> fs/cifs/smb1ops.c | 3 +
> fs/cifs/smb2inode.c | 1 +
> fs/cifs/smb2maperror.c | 2 +-
> fs/cifs/smb2ops.c | 6 +
> fs/cifs/smb2pdu.c | 2 +-
> fs/fcntl.c | 5 +-
> fs/locks.c | 161 ++++++++++++++++--
> fs/namei.c | 56 +++++-
> fs/nfs/dir.c | 39 ++++-
> fs/nfs/inode.c | 8 +-
> fs/nfs/internal.h | 3 +-
> fs/nfs/nfs4_fs.h | 83 ++++++++-
> fs/nfs/nfs4file.c | 8 +-
> fs/nfs/nfs4proc.c | 310 +++++++++++++++++++++++-----------
> fs/nfs/nfs4state.c | 65 ++++---
> fs/nfs/nfs4super.c | 9 +-
> fs/nfs/nfs4xdr.c | 21 ++-
> fs/nfs/super.c | 7 +-
> fs/nfsd/nfs4state.c | 46 ++++-
> fs/nfsd/nfsproc.c | 1 +
> fs/proc_namespace.c | 1 +
> include/linux/fs.h | 15 ++
> include/linux/nfs_fs.h | 5 +-
> include/linux/nfs_xdr.h | 1 +
> include/uapi/asm-generic/errno.h | 2 +
> include/uapi/asm-generic/fcntl.h | 12 ++
> include/uapi/linux/fs.h | 1 +
> 43 files changed, 789 insertions(+), 166 deletions(-)
>
> --
> 1.7.10.4
>

Any other thoughts/reviews on this patchset?

--
Best regards,
Pavel Shilovsky.

2014-02-01 13:20:33

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

On Fri, 17 Jan 2014 14:07:06 +0400
Pavel Shilovsky <[email protected]> wrote:

> This patch adds 3 flags:
> 1) O_DENYREAD that doesn't permit read access,
> 2) O_DENYWRITE that doesn't permit write access,
> 3) O_DENYDELETE that doesn't permit delete or rename.
>
> Network filesystems CIFS, SMB2.0, SMB3.0 and NFSv4 have such flags -
> this change can benefit cifs and nfs modules as well as Samba and
> NFS file servers that export the same directory for Windows clients,
> or Wine applications that access the same files simultaneously.
>
> These flags are only take affect for opens on mounts with new sharelock
> option. They are translated to flock's flags:
>
> !O_DENYREAD -> LOCK_READ | LOCK_MAND
> !O_DENYWRITE -> LOCK_WRITE | LOCK_MAND
>
> and set through flock_lock_file on a file. If the file can't be locked
> due conflicts with another open with O_DENY* flags, a new -ESHAREDENIED
> error code is returned.
>
> Create codepath is slightly changed to prevent data races on newly
> created files: when open with O_CREAT can return -ESHAREDENIED error
> for successfully created files due to a sharelock set by another task.
>
> Temporary disable O_DENYDELETE support - will enable it in further
> patches.
>
> Signed-off-by: Pavel Shilovsky <[email protected]>
> ---
> arch/alpha/include/uapi/asm/errno.h | 2 +
> arch/alpha/include/uapi/asm/fcntl.h | 3 ++
> arch/mips/include/uapi/asm/errno.h | 2 +
> arch/parisc/include/uapi/asm/errno.h | 2 +
> arch/parisc/include/uapi/asm/fcntl.h | 3 ++
> arch/sparc/include/uapi/asm/errno.h | 2 +
> arch/sparc/include/uapi/asm/fcntl.h | 3 ++
> fs/fcntl.c | 5 +-
> fs/locks.c | 97 +++++++++++++++++++++++++++++++---
> fs/namei.c | 53 ++++++++++++++++++-
> fs/proc_namespace.c | 1 +
> include/linux/fs.h | 8 +++
> include/uapi/asm-generic/errno.h | 2 +
> include/uapi/asm-generic/fcntl.h | 11 ++++
> include/uapi/linux/fs.h | 1 +
> 15 files changed, 185 insertions(+), 10 deletions(-)
>

You might consider breaking this patch into two. One patch that makes
LOCK_MAND locks actually work and that adds MS_SHARELOCK, and one patch
that hooks that up to open(). Given the locking involved with the
i_mutex it would be best to present this as a series of small,
incremental changes.

> diff --git a/arch/alpha/include/uapi/asm/errno.h b/arch/alpha/include/uapi/asm/errno.h
> index 17f92aa..953a6d6 100644
> --- a/arch/alpha/include/uapi/asm/errno.h
> +++ b/arch/alpha/include/uapi/asm/errno.h
> @@ -124,4 +124,6 @@
>
> #define EHWPOISON 139 /* Memory page has hardware error */
>
> +#define ESHAREDENIED 140 /* File is locked with a sharelock */
> +
> #endif
> diff --git a/arch/alpha/include/uapi/asm/fcntl.h b/arch/alpha/include/uapi/asm/fcntl.h
> index 09f49a6..265344b 100644
> --- a/arch/alpha/include/uapi/asm/fcntl.h
> +++ b/arch/alpha/include/uapi/asm/fcntl.h
> @@ -33,6 +33,9 @@
>
> #define O_PATH 040000000
> #define __O_TMPFILE 0100000000
> +#define O_DENYREAD 0200000000 /* Do not permit read access */
> +#define O_DENYWRITE 0400000000 /* Do not permit write access */
> +#define O_DENYDELETE 01000000000 /* Do not permit delete or rename */
>
> #define F_GETLK 7
> #define F_SETLK 8
> diff --git a/arch/mips/include/uapi/asm/errno.h b/arch/mips/include/uapi/asm/errno.h
> index 02d645d..f1a4068 100644
> --- a/arch/mips/include/uapi/asm/errno.h
> +++ b/arch/mips/include/uapi/asm/errno.h
> @@ -123,6 +123,8 @@
>
> #define EHWPOISON 168 /* Memory page has hardware error */
>
> +#define ESHAREDENIED 169 /* File is locked with a sharelock */
> +
> #define EDQUOT 1133 /* Quota exceeded */
>
>
> diff --git a/arch/parisc/include/uapi/asm/errno.h b/arch/parisc/include/uapi/asm/errno.h
> index f3a8aa5..654c232 100644
> --- a/arch/parisc/include/uapi/asm/errno.h
> +++ b/arch/parisc/include/uapi/asm/errno.h
> @@ -124,4 +124,6 @@
>
> #define EHWPOISON 257 /* Memory page has hardware error */
>
> +#define ESHAREDENIED 258 /* File is locked with a sharelock */
> +
> #endif
> diff --git a/arch/parisc/include/uapi/asm/fcntl.h b/arch/parisc/include/uapi/asm/fcntl.h
> index 34a46cb..5865964 100644
> --- a/arch/parisc/include/uapi/asm/fcntl.h
> +++ b/arch/parisc/include/uapi/asm/fcntl.h
> @@ -21,6 +21,9 @@
>
> #define O_PATH 020000000
> #define __O_TMPFILE 040000000
> +#define O_DENYREAD 0200000000 /* Do not permit read access */
> +#define O_DENYWRITE 0400000000 /* Do not permit write access */
> +#define O_DENYDELETE 01000000000 /* Do not permit delete or rename */
>
> #define F_GETLK64 8
> #define F_SETLK64 9
> diff --git a/arch/sparc/include/uapi/asm/errno.h b/arch/sparc/include/uapi/asm/errno.h
> index 20423e17..fe339b5 100644
> --- a/arch/sparc/include/uapi/asm/errno.h
> +++ b/arch/sparc/include/uapi/asm/errno.h
> @@ -114,4 +114,6 @@
>
> #define EHWPOISON 135 /* Memory page has hardware error */
>
> +#define ESHAREDENIED 136 /* File is locked with a sharelock */
> +
> #endif
> diff --git a/arch/sparc/include/uapi/asm/fcntl.h b/arch/sparc/include/uapi/asm/fcntl.h
> index 7e8ace5..ab68170 100644
> --- a/arch/sparc/include/uapi/asm/fcntl.h
> +++ b/arch/sparc/include/uapi/asm/fcntl.h
> @@ -36,6 +36,9 @@
>
> #define O_PATH 0x1000000
> #define __O_TMPFILE 0x2000000
> +#define O_DENYREAD 0x4000000 /* Do not permit read access */
> +#define O_DENYWRITE 0x8000000 /* Do not permit write access */
> +#define O_DENYDELETE 0x10000000 /* Do not permit delete or rename */
>

It'd probably be best to add O_DENYDELETE in a separate patch, rather
than disabling it temporarily.

> #define F_GETOWN 5 /* for sockets. */
> #define F_SETOWN 6 /* for sockets. */
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index ef68665..3f85887 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -729,14 +729,15 @@ static int __init fcntl_init(void)
> * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY
> * is defined as O_NONBLOCK on some platforms and not on others.
> */
> - BUILD_BUG_ON(20 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
> + BUILD_BUG_ON(23 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
> O_RDONLY | O_WRONLY | O_RDWR |
> O_CREAT | O_EXCL | O_NOCTTY |
> O_TRUNC | O_APPEND | /* O_NONBLOCK | */
> __O_SYNC | O_DSYNC | FASYNC |
> O_DIRECT | O_LARGEFILE | O_DIRECTORY |
> O_NOFOLLOW | O_NOATIME | O_CLOEXEC |
> - __FMODE_EXEC | O_PATH | __O_TMPFILE
> + __FMODE_EXEC | O_PATH | __O_TMPFILE |
> + O_DENYREAD | O_DENYWRITE | O_DENYDELETE
> ));
>
> fasync_cache = kmem_cache_create("fasync_cache",
> diff --git a/fs/locks.c b/fs/locks.c
> index 92a0f0a..ffde4d4 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -708,20 +708,73 @@ static int posix_locks_conflict(struct file_lock *caller_fl, struct file_lock *s
> return (locks_conflict(caller_fl, sys_fl));
> }
>
> -/* Determine if lock sys_fl blocks lock caller_fl. FLOCK specific
> - * checking before calling the locks_conflict().
> +static unsigned int
> +deny_flags_to_cmd(unsigned int flags)
> +{
> + unsigned int cmd = LOCK_MAND;
> +
> + if (!(flags & O_DENYREAD))
> + cmd |= LOCK_READ;
> + if (!(flags & O_DENYWRITE))
> + cmd |= LOCK_WRITE;
> +
> + return cmd;
> +}
> +
> +/*
> + * locks_mand_conflict - Determine if there's a share reservation conflict
> + * @caller_fl: lock we're attempting to acquire
> + * @sys_fl: lock already present on system that we're checking against
> + *
> + * Check to see if there's a share_reservation conflict. LOCK_READ/LOCK_WRITE
> + * tell us whether the reservation allows other readers and writers.
> + */
> +static int
> +locks_mand_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
> +{
> + unsigned char caller_type = caller_fl->fl_type;
> + unsigned char sys_type = sys_fl->fl_type;
> + fmode_t caller_fmode = caller_fl->fl_file->f_mode;
> + fmode_t sys_fmode = sys_fl->fl_file->f_mode;
> +
> + /* they can only conflict if FS is mounted with MS_SHARELOCK */
> + if (!IS_SHARELOCK(caller_fl->fl_file->f_path.dentry->d_inode))
> + return 0;
> +
> + /* they can only conflict if they're both LOCK_MAND */
> + if (!(caller_type & LOCK_MAND) || !(sys_type & LOCK_MAND))
> + return 0;
> +
> + if (!(caller_type & LOCK_READ) && (sys_fmode & FMODE_READ))
> + return 1;
> + if (!(caller_type & LOCK_WRITE) && (sys_fmode & FMODE_WRITE))
> + return 1;
> + if (!(sys_type & LOCK_READ) && (caller_fmode & FMODE_READ))
> + return 1;
> + if (!(sys_type & LOCK_WRITE) && (caller_fmode & FMODE_WRITE))
> + return 1;
> +
> + return 0;
> +}
> +
> +/*
> + * Determine if lock sys_fl blocks lock caller_fl. FLOCK specific checking
> + * before calling the locks_conflict().
> */
> static int flock_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
> {
> - /* FLOCK locks referring to the same filp do not conflict with
> + if (!IS_FLOCK(sys_fl))
> + return 0;
> + if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND))
> + return locks_mand_conflict(caller_fl, sys_fl);

nit: Seems like the above could be optimized a little. You know that
locks_mand_conflict is only relevant if both are LOCK_MAND, and one of
the first things that locks_mand_conflict does is to check that both
have that set.

> + /*
> + * FLOCK locks referring to the same filp do not conflict with
> * each other.
> */
> - if (!IS_FLOCK(sys_fl) || (caller_fl->fl_file == sys_fl->fl_file))
> - return (0);
> - if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND))
> + if (caller_fl->fl_file == sys_fl->fl_file)
> return 0;
>
> - return (locks_conflict(caller_fl, sys_fl));
> + return locks_conflict(caller_fl, sys_fl);
> }
>
> void
> @@ -888,6 +941,36 @@ out:
> return error;
> }
>
> +/*
> + * Determine if a file is allowed to be opened with specified access and share
> + * modes. Lock the file and return 0 if checks passed, otherwise return
> + * -ESHAREDENIED.
> + */
> +int
> +sharelock_lock_file(struct file *filp)
> +{
> + struct file_lock *lock;
> + int error = 0;
> +
> + if (!IS_SHARELOCK(filp->f_path.dentry->d_inode))
> + return error;
> +
> + /* Disable O_DENYDELETE support for now */
> + if (filp->f_flags & O_DENYDELETE)
> + return -EINVAL;
> +
> + error = flock_make_lock(filp, &lock, deny_flags_to_cmd(filp->f_flags));
> + if (error)
> + return error;
> +
> + error = flock_lock_file(filp, lock);
> + if (error == -EAGAIN)
> + error = -ESHAREDENIED;
> +
> + locks_free_lock(lock);
> + return error;
> +}
> +
> static int __posix_lock_file(struct inode *inode, struct file_lock *request, struct file_lock *conflock)
> {
> struct file_lock *fl;
> diff --git a/fs/namei.c b/fs/namei.c
> index 3531dee..2b741a1 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2725,9 +2725,14 @@ static int atomic_open(struct nameidata *nd, struct dentry *dentry,
> acc_mode = MAY_OPEN;
> }
> error = may_open(&file->f_path, acc_mode, open_flag);
> - if (error)
> + if (error) {
> fput(file);
> + goto out;
> + }
>
> + error = sharelock_lock_file(file);
> + if (error)
> + fput(file);
> out:
> dput(dentry);
> return error;
> @@ -2919,6 +2924,40 @@ retry_lookup:
> }
> mutex_lock(&dir->d_inode->i_mutex);
> error = lookup_open(nd, path, file, op, got_write, opened);
> +
> + /*
> + * For sharelock mounts if a file was created but not opened, we need
> + * to keep parent i_mutex until we finish the open to prevent races when
> + * somebody opens newly created by us file and locks it with a sharelock
> + * before we open it.
> + */
> + if (IS_SHARELOCK(dir->d_inode) && error > 0 && *opened & FILE_CREATED) {
> + /* Don't check for write permission, don't truncate */
> + open_flag &= ~O_TRUNC;
> + will_truncate = false;
> + acc_mode = MAY_OPEN;
> + path_to_nameidata(path, nd);
> +
> + error = may_open(&nd->path, acc_mode, open_flag);
> + if (error) {
> + mutex_unlock(&dir->d_inode->i_mutex);
> + goto out;
> + }
> + file->f_path.mnt = nd->path.mnt;
> + error = finish_open(file, nd->path.dentry, NULL, opened);
> + if (error) {
> + mutex_unlock(&dir->d_inode->i_mutex);
> + if (error == -EOPENSTALE)
> + goto stale_open;
> + goto out;
> + }
> + error = sharelock_lock_file(file);
> + mutex_unlock(&dir->d_inode->i_mutex);
> + if (error)
> + goto exit_fput;
> + goto opened;
> + }
> +
> mutex_unlock(&dir->d_inode->i_mutex);
>
> if (error <= 0) {
> @@ -3034,6 +3073,18 @@ finish_open_created:
> goto stale_open;
> goto out;
> }
> +
> + if (IS_SHARELOCK(dir->d_inode)) {
> + /*
> + * Lock parent i_mutex to prevent races with sharelocks on
> + * newly created files.
> + */
> + mutex_lock(&dir->d_inode->i_mutex);
> + error = sharelock_lock_file(file);
> + mutex_unlock(&dir->d_inode->i_mutex);
> + if (error)
> + goto exit_fput;
> + }
> opened:
> error = open_check_o_direct(file);
> if (error)
> diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
> index 439406e..dd374d4 100644
> --- a/fs/proc_namespace.c
> +++ b/fs/proc_namespace.c
> @@ -44,6 +44,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb)
> { MS_SYNCHRONOUS, ",sync" },
> { MS_DIRSYNC, ",dirsync" },
> { MS_MANDLOCK, ",mand" },
> + { MS_SHARELOCK, ",sharelock" },
> { 0, NULL }
> };
> const struct proc_fs_info *fs_infop;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 121f11f..aa061ca 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1029,6 +1029,7 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
> extern int lease_modify(struct file_lock **, int);
> extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
> extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
> +extern int sharelock_lock_file(struct file *);
> #else /* !CONFIG_FILE_LOCKING */
> static inline int fcntl_getlk(struct file *file, struct flock __user *user)
> {
> @@ -1169,6 +1170,12 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
> {
> return 1;
> }
> +
> +static inline int sharelock_lock_file(struct file *filp)
> +{
> + return 0;
> +}
> +
> #endif /* !CONFIG_FILE_LOCKING */
>
>
> @@ -1675,6 +1682,7 @@ struct super_operations {
> #define IS_PRIVATE(inode) ((inode)->i_flags & S_PRIVATE)
> #define IS_IMA(inode) ((inode)->i_flags & S_IMA)
> #define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT)
> +#define IS_SHARELOCK(inode) __IS_FLG(inode, MS_SHARELOCK)
> #define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC)
>
> /*
> diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
> index 1e1ea6e..aff869c 100644
> --- a/include/uapi/asm-generic/errno.h
> +++ b/include/uapi/asm-generic/errno.h
> @@ -110,4 +110,6 @@
>
> #define EHWPOISON 133 /* Memory page has hardware error */
>
> +#define ESHAREDENIED 134 /* File is locked with a sharelock */
> +
> #endif
> diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
> index 95e46c8..9881cfe 100644
> --- a/include/uapi/asm-generic/fcntl.h
> +++ b/include/uapi/asm-generic/fcntl.h
> @@ -92,6 +92,17 @@
> #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
> #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)
>
> +#ifndef O_DENYREAD
> +#define O_DENYREAD 040000000 /* Do not permit read access */
> +#endif
> +/* FMODE_NONOTIFY 0100000000 */
> +#ifndef O_DENYWRITE
> +#define O_DENYWRITE 0200000000 /* Do not permit write access */
> +#endif
> +#ifndef O_DENYDELETE
> +#define O_DENYDELETE 0400000000 /* Do not permit delete or rename */
> +#endif
> +

One thing to consider: We found with the addition of O_TMPFILE that the
open() api is not particularly helpful when it comes to informing
appications when a flag isn't supported:

http://lwn.net/Articles/562294/

...having a plan to cope with that here would be best. How can an
application determine at runtime that O_DENY* actually *work*? It may
be best to step back and consider a new syscall for this (open2() ?).

> #ifndef O_NDELAY
> #define O_NDELAY O_NONBLOCK
> #endif
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 6c28b61..11f0ecf 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -86,6 +86,7 @@ struct inodes_stat_t {
> #define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */
> #define MS_I_VERSION (1<<23) /* Update inode I_version field */
> #define MS_STRICTATIME (1<<24) /* Always perform atime updates */
> +#define MS_SHARELOCK (1<<25) /* Allow share locks on an FS */
>
> /* These sb flags are internal to the kernel */
> #define MS_NOSEC (1<<28)


--
Jeff Layton <[email protected]>

2014-02-01 13:57:26

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

On Fri, 17 Jan 2014 18:18:47 +0000
One Thousand Gnomes <[email protected]> wrote:

> > +#define ESHAREDENIED 258 /* File is locked with a sharelock */
>
> Have you prepared C library patches to match this ?
>
> (and why not just use EPERM, it has the meaning you want already)
>

Tough call...

On the one hand, ESHAREDENIED is a distinct error code so an
application has the ability to determine what happened when an open or
unlink fails.

OTOH, a lot of applications won't understand ESHAREDENIED and may barf
on it. Those apps might handle EPERM better.

I'm not sure what the right approach is there...

>
> > + * Check to see if there's a share_reservation conflict. LOCK_READ/LOCK_WRITE
> > + * tell us whether the reservation allows other readers and writers.
> > + */
> > +static int
> > +locks_mand_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
> > +{
>
> Shouldn't this also check for CAP_SYS_DAC or some similar permission so
> that root can override such a mess (eg to fix full disks in an
> emergency) ?
>
>

Agreed. This needs a mechanism that allows you to override it, IMO.

CAP_DAC_OVERRIDE doesn't seem quite like the right thing since this
isn't dealing with permissions, per-se. A new capability bit may even be
warranted.

> > +
> > + /*
> > + * For sharelock mounts if a file was created but not opened, we need
> > + * to keep parent i_mutex until we finish the open to prevent races when
> > + * somebody opens newly created by us file and locks it with a sharelock
> > + * before we open it.
> > + */
> > + if (IS_SHARELOCK(dir->d_inode) && error > 0 && *opened & FILE_CREATED) {
> > + /* Don't check for write permission, don't truncate */
> > + open_flag &= ~O_TRUNC;
> > + will_truncate = false;
> > + acc_mode = MAY_OPEN;
> > + path_to_nameidata(path, nd);
> > +
> > + error = may_open(&nd->path, acc_mode, open_flag);
> > + if (error) {
> > + mutex_unlock(&dir->d_inode->i_mutex);
> > + goto out;
> > + }
> > + file->f_path.mnt = nd->path.mnt;
> > + error = finish_open(file, nd->path.dentry, NULL, opened);
> > + if (error) {
> > + mutex_unlock(&dir->d_inode->i_mutex);
> > + if (error == -EOPENSTALE)
> > + goto stale_open;
> > + goto out;
> > + }
> > + error = sharelock_lock_file(file);
> > + mutex_unlock(&dir->d_inode->i_mutex);
> > + if (error)
> > + goto exit_fput;
> > + goto opened;
> > + }
> > +
> > mutex_unlock(&dir->d_inode->i_mutex);
>
> What stops the file system changing mount flags via a remount between
> these two ?
>
> >
> > if (error <= 0) {
> > @@ -3034,6 +3073,18 @@ finish_open_created:
> > goto stale_open;
> > goto out;
> > }
> > +
> > + if (IS_SHARELOCK(dir->d_inode)) {
> > + /*
> > + * Lock parent i_mutex to prevent races with sharelocks on
> > + * newly created files.
> > + */
> > + mutex_lock(&dir->d_inode->i_mutex);
> > + error = sharelock_lock_file(file);
> > + mutex_unlock(&dir->d_inode->i_mutex);
> > + if (error)
> > + goto exit_fput;
> > + }
> > opened:
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


--
Jeff Layton <[email protected]>

2014-02-04 12:03:21

by Pavel Shilovsky

[permalink] [raw]
Subject: Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

2014-02-01 Jeff Layton <[email protected]>:
> On Fri, 17 Jan 2014 14:07:06 +0400
> Pavel Shilovsky <[email protected]> wrote:
>
>> This patch adds 3 flags:
>> 1) O_DENYREAD that doesn't permit read access,
>> 2) O_DENYWRITE that doesn't permit write access,
>> 3) O_DENYDELETE that doesn't permit delete or rename.
>>
>> Network filesystems CIFS, SMB2.0, SMB3.0 and NFSv4 have such flags -
>> this change can benefit cifs and nfs modules as well as Samba and
>> NFS file servers that export the same directory for Windows clients,
>> or Wine applications that access the same files simultaneously.
>>
>> These flags are only take affect for opens on mounts with new sharelock
>> option. They are translated to flock's flags:
>>
>> !O_DENYREAD -> LOCK_READ | LOCK_MAND
>> !O_DENYWRITE -> LOCK_WRITE | LOCK_MAND
>>
>> and set through flock_lock_file on a file. If the file can't be locked
>> due conflicts with another open with O_DENY* flags, a new -ESHAREDENIED
>> error code is returned.
>>
>> Create codepath is slightly changed to prevent data races on newly
>> created files: when open with O_CREAT can return -ESHAREDENIED error
>> for successfully created files due to a sharelock set by another task.
>>
>> Temporary disable O_DENYDELETE support - will enable it in further
>> patches.
>>
>> Signed-off-by: Pavel Shilovsky <[email protected]>
>> ---
>> arch/alpha/include/uapi/asm/errno.h | 2 +
>> arch/alpha/include/uapi/asm/fcntl.h | 3 ++
>> arch/mips/include/uapi/asm/errno.h | 2 +
>> arch/parisc/include/uapi/asm/errno.h | 2 +
>> arch/parisc/include/uapi/asm/fcntl.h | 3 ++
>> arch/sparc/include/uapi/asm/errno.h | 2 +
>> arch/sparc/include/uapi/asm/fcntl.h | 3 ++
>> fs/fcntl.c | 5 +-
>> fs/locks.c | 97 +++++++++++++++++++++++++++++++---
>> fs/namei.c | 53 ++++++++++++++++++-
>> fs/proc_namespace.c | 1 +
>> include/linux/fs.h | 8 +++
>> include/uapi/asm-generic/errno.h | 2 +
>> include/uapi/asm-generic/fcntl.h | 11 ++++
>> include/uapi/linux/fs.h | 1 +
>> 15 files changed, 185 insertions(+), 10 deletions(-)
>>
>
> You might consider breaking this patch into two. One patch that makes
> LOCK_MAND locks actually work and that adds MS_SHARELOCK, and one patch
> that hooks that up to open(). Given the locking involved with the
> i_mutex it would be best to present this as a series of small,
> incremental changes.

Good point. So, we can break it into 2:
1) make flock actually work with LOCK_MAND on MS_SHARELOCK mounts,
2) replace flock+LOCK_MAND with open+O_DENY* flags.


>> diff --git a/arch/alpha/include/uapi/asm/errno.h b/arch/alpha/include/uapi/asm/errno.h
>> index 17f92aa..953a6d6 100644
>> --- a/arch/alpha/include/uapi/asm/errno.h
>> +++ b/arch/alpha/include/uapi/asm/errno.h
>> @@ -124,4 +124,6 @@
>>
>> #define EHWPOISON 139 /* Memory page has hardware error */
>>
>> +#define ESHAREDENIED 140 /* File is locked with a sharelock */
>> +
>> #endif
>> diff --git a/arch/alpha/include/uapi/asm/fcntl.h b/arch/alpha/include/uapi/asm/fcntl.h
>> index 09f49a6..265344b 100644
>> --- a/arch/alpha/include/uapi/asm/fcntl.h
>> +++ b/arch/alpha/include/uapi/asm/fcntl.h
>> @@ -33,6 +33,9 @@
>>
>> #define O_PATH 040000000
>> #define __O_TMPFILE 0100000000
>> +#define O_DENYREAD 0200000000 /* Do not permit read access */
>> +#define O_DENYWRITE 0400000000 /* Do not permit write access */
>> +#define O_DENYDELETE 01000000000 /* Do not permit delete or rename */
>>
>> #define F_GETLK 7
>> #define F_SETLK 8
>> diff --git a/arch/mips/include/uapi/asm/errno.h b/arch/mips/include/uapi/asm/errno.h
>> index 02d645d..f1a4068 100644
>> --- a/arch/mips/include/uapi/asm/errno.h
>> +++ b/arch/mips/include/uapi/asm/errno.h
>> @@ -123,6 +123,8 @@
>>
>> #define EHWPOISON 168 /* Memory page has hardware error */
>>
>> +#define ESHAREDENIED 169 /* File is locked with a sharelock */
>> +
>> #define EDQUOT 1133 /* Quota exceeded */
>>
>>
>> diff --git a/arch/parisc/include/uapi/asm/errno.h b/arch/parisc/include/uapi/asm/errno.h
>> index f3a8aa5..654c232 100644
>> --- a/arch/parisc/include/uapi/asm/errno.h
>> +++ b/arch/parisc/include/uapi/asm/errno.h
>> @@ -124,4 +124,6 @@
>>
>> #define EHWPOISON 257 /* Memory page has hardware error */
>>
>> +#define ESHAREDENIED 258 /* File is locked with a sharelock */
>> +
>> #endif
>> diff --git a/arch/parisc/include/uapi/asm/fcntl.h b/arch/parisc/include/uapi/asm/fcntl.h
>> index 34a46cb..5865964 100644
>> --- a/arch/parisc/include/uapi/asm/fcntl.h
>> +++ b/arch/parisc/include/uapi/asm/fcntl.h
>> @@ -21,6 +21,9 @@
>>
>> #define O_PATH 020000000
>> #define __O_TMPFILE 040000000
>> +#define O_DENYREAD 0200000000 /* Do not permit read access */
>> +#define O_DENYWRITE 0400000000 /* Do not permit write access */
>> +#define O_DENYDELETE 01000000000 /* Do not permit delete or rename */
>>
>> #define F_GETLK64 8
>> #define F_SETLK64 9
>> diff --git a/arch/sparc/include/uapi/asm/errno.h b/arch/sparc/include/uapi/asm/errno.h
>> index 20423e17..fe339b5 100644
>> --- a/arch/sparc/include/uapi/asm/errno.h
>> +++ b/arch/sparc/include/uapi/asm/errno.h
>> @@ -114,4 +114,6 @@
>>
>> #define EHWPOISON 135 /* Memory page has hardware error */
>>
>> +#define ESHAREDENIED 136 /* File is locked with a sharelock */
>> +
>> #endif
>> diff --git a/arch/sparc/include/uapi/asm/fcntl.h b/arch/sparc/include/uapi/asm/fcntl.h
>> index 7e8ace5..ab68170 100644
>> --- a/arch/sparc/include/uapi/asm/fcntl.h
>> +++ b/arch/sparc/include/uapi/asm/fcntl.h
>> @@ -36,6 +36,9 @@
>>
>> #define O_PATH 0x1000000
>> #define __O_TMPFILE 0x2000000
>> +#define O_DENYREAD 0x4000000 /* Do not permit read access */
>> +#define O_DENYWRITE 0x8000000 /* Do not permit write access */
>> +#define O_DENYDELETE 0x10000000 /* Do not permit delete or rename */
>>
>
> It'd probably be best to add O_DENYDELETE in a separate patch, rather
> than disabling it temporarily.

Agree.

>
>> #define F_GETOWN 5 /* for sockets. */
>> #define F_SETOWN 6 /* for sockets. */
>> diff --git a/fs/fcntl.c b/fs/fcntl.c
>> index ef68665..3f85887 100644
>> --- a/fs/fcntl.c
>> +++ b/fs/fcntl.c
>> @@ -729,14 +729,15 @@ static int __init fcntl_init(void)
>> * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY
>> * is defined as O_NONBLOCK on some platforms and not on others.
>> */
>> - BUILD_BUG_ON(20 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
>> + BUILD_BUG_ON(23 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
>> O_RDONLY | O_WRONLY | O_RDWR |
>> O_CREAT | O_EXCL | O_NOCTTY |
>> O_TRUNC | O_APPEND | /* O_NONBLOCK | */
>> __O_SYNC | O_DSYNC | FASYNC |
>> O_DIRECT | O_LARGEFILE | O_DIRECTORY |
>> O_NOFOLLOW | O_NOATIME | O_CLOEXEC |
>> - __FMODE_EXEC | O_PATH | __O_TMPFILE
>> + __FMODE_EXEC | O_PATH | __O_TMPFILE |
>> + O_DENYREAD | O_DENYWRITE | O_DENYDELETE
>> ));
>>
>> fasync_cache = kmem_cache_create("fasync_cache",
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 92a0f0a..ffde4d4 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -708,20 +708,73 @@ static int posix_locks_conflict(struct file_lock *caller_fl, struct file_lock *s
>> return (locks_conflict(caller_fl, sys_fl));
>> }
>>
>> -/* Determine if lock sys_fl blocks lock caller_fl. FLOCK specific
>> - * checking before calling the locks_conflict().
>> +static unsigned int
>> +deny_flags_to_cmd(unsigned int flags)
>> +{
>> + unsigned int cmd = LOCK_MAND;
>> +
>> + if (!(flags & O_DENYREAD))
>> + cmd |= LOCK_READ;
>> + if (!(flags & O_DENYWRITE))
>> + cmd |= LOCK_WRITE;
>> +
>> + return cmd;
>> +}
>> +
>> +/*
>> + * locks_mand_conflict - Determine if there's a share reservation conflict
>> + * @caller_fl: lock we're attempting to acquire
>> + * @sys_fl: lock already present on system that we're checking against
>> + *
>> + * Check to see if there's a share_reservation conflict. LOCK_READ/LOCK_WRITE
>> + * tell us whether the reservation allows other readers and writers.
>> + */
>> +static int
>> +locks_mand_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
>> +{
>> + unsigned char caller_type = caller_fl->fl_type;
>> + unsigned char sys_type = sys_fl->fl_type;
>> + fmode_t caller_fmode = caller_fl->fl_file->f_mode;
>> + fmode_t sys_fmode = sys_fl->fl_file->f_mode;
>> +
>> + /* they can only conflict if FS is mounted with MS_SHARELOCK */
>> + if (!IS_SHARELOCK(caller_fl->fl_file->f_path.dentry->d_inode))
>> + return 0;
>> +
>> + /* they can only conflict if they're both LOCK_MAND */
>> + if (!(caller_type & LOCK_MAND) || !(sys_type & LOCK_MAND))
>> + return 0;
>> +
>> + if (!(caller_type & LOCK_READ) && (sys_fmode & FMODE_READ))
>> + return 1;
>> + if (!(caller_type & LOCK_WRITE) && (sys_fmode & FMODE_WRITE))
>> + return 1;
>> + if (!(sys_type & LOCK_READ) && (caller_fmode & FMODE_READ))
>> + return 1;
>> + if (!(sys_type & LOCK_WRITE) && (caller_fmode & FMODE_WRITE))
>> + return 1;
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * Determine if lock sys_fl blocks lock caller_fl. FLOCK specific checking
>> + * before calling the locks_conflict().
>> */
>> static int flock_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
>> {
>> - /* FLOCK locks referring to the same filp do not conflict with
>> + if (!IS_FLOCK(sys_fl))
>> + return 0;
>> + if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND))
>> + return locks_mand_conflict(caller_fl, sys_fl);
>
> nit: Seems like the above could be optimized a little. You know that
> locks_mand_conflict is only relevant if both are LOCK_MAND, and one of
> the first things that locks_mand_conflict does is to check that both
> have that set.

ok.

>
>> + /*
>> + * FLOCK locks referring to the same filp do not conflict with
>> * each other.
>> */
>> - if (!IS_FLOCK(sys_fl) || (caller_fl->fl_file == sys_fl->fl_file))
>> - return (0);
>> - if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND))
>> + if (caller_fl->fl_file == sys_fl->fl_file)
>> return 0;
>>
>> - return (locks_conflict(caller_fl, sys_fl));
>> + return locks_conflict(caller_fl, sys_fl);
>> }
>>
>> void
>> @@ -888,6 +941,36 @@ out:
>> return error;
>> }
>>
>> +/*
>> + * Determine if a file is allowed to be opened with specified access and share
>> + * modes. Lock the file and return 0 if checks passed, otherwise return
>> + * -ESHAREDENIED.
>> + */
>> +int
>> +sharelock_lock_file(struct file *filp)
>> +{
>> + struct file_lock *lock;
>> + int error = 0;
>> +
>> + if (!IS_SHARELOCK(filp->f_path.dentry->d_inode))
>> + return error;
>> +
>> + /* Disable O_DENYDELETE support for now */
>> + if (filp->f_flags & O_DENYDELETE)
>> + return -EINVAL;
>> +
>> + error = flock_make_lock(filp, &lock, deny_flags_to_cmd(filp->f_flags));
>> + if (error)
>> + return error;
>> +
>> + error = flock_lock_file(filp, lock);
>> + if (error == -EAGAIN)
>> + error = -ESHAREDENIED;
>> +
>> + locks_free_lock(lock);
>> + return error;
>> +}
>> +
>> static int __posix_lock_file(struct inode *inode, struct file_lock *request, struct file_lock *conflock)
>> {
>> struct file_lock *fl;
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 3531dee..2b741a1 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -2725,9 +2725,14 @@ static int atomic_open(struct nameidata *nd, struct dentry *dentry,
>> acc_mode = MAY_OPEN;
>> }
>> error = may_open(&file->f_path, acc_mode, open_flag);
>> - if (error)
>> + if (error) {
>> fput(file);
>> + goto out;
>> + }
>>
>> + error = sharelock_lock_file(file);
>> + if (error)
>> + fput(file);
>> out:
>> dput(dentry);
>> return error;
>> @@ -2919,6 +2924,40 @@ retry_lookup:
>> }
>> mutex_lock(&dir->d_inode->i_mutex);
>> error = lookup_open(nd, path, file, op, got_write, opened);
>> +
>> + /*
>> + * For sharelock mounts if a file was created but not opened, we need
>> + * to keep parent i_mutex until we finish the open to prevent races when
>> + * somebody opens newly created by us file and locks it with a sharelock
>> + * before we open it.
>> + */
>> + if (IS_SHARELOCK(dir->d_inode) && error > 0 && *opened & FILE_CREATED) {
>> + /* Don't check for write permission, don't truncate */
>> + open_flag &= ~O_TRUNC;
>> + will_truncate = false;
>> + acc_mode = MAY_OPEN;
>> + path_to_nameidata(path, nd);
>> +
>> + error = may_open(&nd->path, acc_mode, open_flag);
>> + if (error) {
>> + mutex_unlock(&dir->d_inode->i_mutex);
>> + goto out;
>> + }
>> + file->f_path.mnt = nd->path.mnt;
>> + error = finish_open(file, nd->path.dentry, NULL, opened);
>> + if (error) {
>> + mutex_unlock(&dir->d_inode->i_mutex);
>> + if (error == -EOPENSTALE)
>> + goto stale_open;
>> + goto out;
>> + }
>> + error = sharelock_lock_file(file);
>> + mutex_unlock(&dir->d_inode->i_mutex);
>> + if (error)
>> + goto exit_fput;
>> + goto opened;
>> + }
>> +
>> mutex_unlock(&dir->d_inode->i_mutex);
>>
>> if (error <= 0) {
>> @@ -3034,6 +3073,18 @@ finish_open_created:
>> goto stale_open;
>> goto out;
>> }
>> +
>> + if (IS_SHARELOCK(dir->d_inode)) {
>> + /*
>> + * Lock parent i_mutex to prevent races with sharelocks on
>> + * newly created files.
>> + */
>> + mutex_lock(&dir->d_inode->i_mutex);
>> + error = sharelock_lock_file(file);
>> + mutex_unlock(&dir->d_inode->i_mutex);
>> + if (error)
>> + goto exit_fput;
>> + }
>> opened:
>> error = open_check_o_direct(file);
>> if (error)
>> diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
>> index 439406e..dd374d4 100644
>> --- a/fs/proc_namespace.c
>> +++ b/fs/proc_namespace.c
>> @@ -44,6 +44,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb)
>> { MS_SYNCHRONOUS, ",sync" },
>> { MS_DIRSYNC, ",dirsync" },
>> { MS_MANDLOCK, ",mand" },
>> + { MS_SHARELOCK, ",sharelock" },
>> { 0, NULL }
>> };
>> const struct proc_fs_info *fs_infop;
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 121f11f..aa061ca 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -1029,6 +1029,7 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
>> extern int lease_modify(struct file_lock **, int);
>> extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
>> extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
>> +extern int sharelock_lock_file(struct file *);
>> #else /* !CONFIG_FILE_LOCKING */
>> static inline int fcntl_getlk(struct file *file, struct flock __user *user)
>> {
>> @@ -1169,6 +1170,12 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
>> {
>> return 1;
>> }
>> +
>> +static inline int sharelock_lock_file(struct file *filp)
>> +{
>> + return 0;
>> +}
>> +
>> #endif /* !CONFIG_FILE_LOCKING */
>>
>>
>> @@ -1675,6 +1682,7 @@ struct super_operations {
>> #define IS_PRIVATE(inode) ((inode)->i_flags & S_PRIVATE)
>> #define IS_IMA(inode) ((inode)->i_flags & S_IMA)
>> #define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT)
>> +#define IS_SHARELOCK(inode) __IS_FLG(inode, MS_SHARELOCK)
>> #define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC)
>>
>> /*
>> diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
>> index 1e1ea6e..aff869c 100644
>> --- a/include/uapi/asm-generic/errno.h
>> +++ b/include/uapi/asm-generic/errno.h
>> @@ -110,4 +110,6 @@
>>
>> #define EHWPOISON 133 /* Memory page has hardware error */
>>
>> +#define ESHAREDENIED 134 /* File is locked with a sharelock */
>> +
>> #endif
>> diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
>> index 95e46c8..9881cfe 100644
>> --- a/include/uapi/asm-generic/fcntl.h
>> +++ b/include/uapi/asm-generic/fcntl.h
>> @@ -92,6 +92,17 @@
>> #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
>> #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)
>>
>> +#ifndef O_DENYREAD
>> +#define O_DENYREAD 040000000 /* Do not permit read access */
>> +#endif
>> +/* FMODE_NONOTIFY 0100000000 */
>> +#ifndef O_DENYWRITE
>> +#define O_DENYWRITE 0200000000 /* Do not permit write access */
>> +#endif
>> +#ifndef O_DENYDELETE
>> +#define O_DENYDELETE 0400000000 /* Do not permit delete or rename */
>> +#endif
>> +
>
> One thing to consider: We found with the addition of O_TMPFILE that the
> open() api is not particularly helpful when it comes to informing
> appications when a flag isn't supported:
>
> http://lwn.net/Articles/562294/
>
> ...having a plan to cope with that here would be best. How can an
> application determine at runtime that O_DENY* actually *work*? It may
> be best to step back and consider a new syscall for this (open2() ?).
>

So, consider we added new syscall:

opendm(filename, flags, mode, deny_mode)
{
return open(filename, flags | denymode2openflags(deny_mode), mode)
}

where deny_mode can be DMODE_NONE (0), DMODE_READ (1), DMODE_WRITE(2)
and DMODE_RDWR(3) (similar to FMODE_* values).

We have open and opendm that act actually in the same manner for
mounts without MS_SHARELOCK. For mounts with MS_SHARELOCK open is like
opendm with DMODE_NONE. Open flags O_DENY* are for internal use only.

Is it what you suggest?

--
Best regards,
Pavel Shilovsky.

2014-02-04 12:21:32

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

On Tue, 4 Feb 2014 16:03:14 +0400
Pavel Shilovsky <[email protected]> wrote:

> 2014-02-01 Jeff Layton <[email protected]>:
> > On Fri, 17 Jan 2014 14:07:06 +0400
> > Pavel Shilovsky <[email protected]> wrote:
> >
> >> This patch adds 3 flags:
> >> 1) O_DENYREAD that doesn't permit read access,
> >> 2) O_DENYWRITE that doesn't permit write access,
> >> 3) O_DENYDELETE that doesn't permit delete or rename.
> >>
> >> Network filesystems CIFS, SMB2.0, SMB3.0 and NFSv4 have such flags -
> >> this change can benefit cifs and nfs modules as well as Samba and
> >> NFS file servers that export the same directory for Windows clients,
> >> or Wine applications that access the same files simultaneously.
> >>
> >> These flags are only take affect for opens on mounts with new sharelock
> >> option. They are translated to flock's flags:
> >>
> >> !O_DENYREAD -> LOCK_READ | LOCK_MAND
> >> !O_DENYWRITE -> LOCK_WRITE | LOCK_MAND
> >>
> >> and set through flock_lock_file on a file. If the file can't be locked
> >> due conflicts with another open with O_DENY* flags, a new -ESHAREDENIED
> >> error code is returned.
> >>
> >> Create codepath is slightly changed to prevent data races on newly
> >> created files: when open with O_CREAT can return -ESHAREDENIED error
> >> for successfully created files due to a sharelock set by another task.
> >>
> >> Temporary disable O_DENYDELETE support - will enable it in further
> >> patches.
> >>
> >> Signed-off-by: Pavel Shilovsky <[email protected]>
> >> ---
> >> arch/alpha/include/uapi/asm/errno.h | 2 +
> >> arch/alpha/include/uapi/asm/fcntl.h | 3 ++
> >> arch/mips/include/uapi/asm/errno.h | 2 +
> >> arch/parisc/include/uapi/asm/errno.h | 2 +
> >> arch/parisc/include/uapi/asm/fcntl.h | 3 ++
> >> arch/sparc/include/uapi/asm/errno.h | 2 +
> >> arch/sparc/include/uapi/asm/fcntl.h | 3 ++
> >> fs/fcntl.c | 5 +-
> >> fs/locks.c | 97 +++++++++++++++++++++++++++++++---
> >> fs/namei.c | 53 ++++++++++++++++++-
> >> fs/proc_namespace.c | 1 +
> >> include/linux/fs.h | 8 +++
> >> include/uapi/asm-generic/errno.h | 2 +
> >> include/uapi/asm-generic/fcntl.h | 11 ++++
> >> include/uapi/linux/fs.h | 1 +
> >> 15 files changed, 185 insertions(+), 10 deletions(-)
> >>
> >
> > You might consider breaking this patch into two. One patch that makes
> > LOCK_MAND locks actually work and that adds MS_SHARELOCK, and one patch
> > that hooks that up to open(). Given the locking involved with the
> > i_mutex it would be best to present this as a series of small,
> > incremental changes.
>
> Good point. So, we can break it into 2:
> 1) make flock actually work with LOCK_MAND on MS_SHARELOCK mounts,
> 2) replace flock+LOCK_MAND with open+O_DENY* flags.
>
>
> >> diff --git a/arch/alpha/include/uapi/asm/errno.h b/arch/alpha/include/uapi/asm/errno.h
> >> index 17f92aa..953a6d6 100644
> >> --- a/arch/alpha/include/uapi/asm/errno.h
> >> +++ b/arch/alpha/include/uapi/asm/errno.h
> >> @@ -124,4 +124,6 @@
> >>
> >> #define EHWPOISON 139 /* Memory page has hardware error */
> >>
> >> +#define ESHAREDENIED 140 /* File is locked with a sharelock */
> >> +
> >> #endif
> >> diff --git a/arch/alpha/include/uapi/asm/fcntl.h b/arch/alpha/include/uapi/asm/fcntl.h
> >> index 09f49a6..265344b 100644
> >> --- a/arch/alpha/include/uapi/asm/fcntl.h
> >> +++ b/arch/alpha/include/uapi/asm/fcntl.h
> >> @@ -33,6 +33,9 @@
> >>
> >> #define O_PATH 040000000
> >> #define __O_TMPFILE 0100000000
> >> +#define O_DENYREAD 0200000000 /* Do not permit read access */
> >> +#define O_DENYWRITE 0400000000 /* Do not permit write access */
> >> +#define O_DENYDELETE 01000000000 /* Do not permit delete or rename */
> >>
> >> #define F_GETLK 7
> >> #define F_SETLK 8
> >> diff --git a/arch/mips/include/uapi/asm/errno.h b/arch/mips/include/uapi/asm/errno.h
> >> index 02d645d..f1a4068 100644
> >> --- a/arch/mips/include/uapi/asm/errno.h
> >> +++ b/arch/mips/include/uapi/asm/errno.h
> >> @@ -123,6 +123,8 @@
> >>
> >> #define EHWPOISON 168 /* Memory page has hardware error */
> >>
> >> +#define ESHAREDENIED 169 /* File is locked with a sharelock */
> >> +
> >> #define EDQUOT 1133 /* Quota exceeded */
> >>
> >>
> >> diff --git a/arch/parisc/include/uapi/asm/errno.h b/arch/parisc/include/uapi/asm/errno.h
> >> index f3a8aa5..654c232 100644
> >> --- a/arch/parisc/include/uapi/asm/errno.h
> >> +++ b/arch/parisc/include/uapi/asm/errno.h
> >> @@ -124,4 +124,6 @@
> >>
> >> #define EHWPOISON 257 /* Memory page has hardware error */
> >>
> >> +#define ESHAREDENIED 258 /* File is locked with a sharelock */
> >> +
> >> #endif
> >> diff --git a/arch/parisc/include/uapi/asm/fcntl.h b/arch/parisc/include/uapi/asm/fcntl.h
> >> index 34a46cb..5865964 100644
> >> --- a/arch/parisc/include/uapi/asm/fcntl.h
> >> +++ b/arch/parisc/include/uapi/asm/fcntl.h
> >> @@ -21,6 +21,9 @@
> >>
> >> #define O_PATH 020000000
> >> #define __O_TMPFILE 040000000
> >> +#define O_DENYREAD 0200000000 /* Do not permit read access */
> >> +#define O_DENYWRITE 0400000000 /* Do not permit write access */
> >> +#define O_DENYDELETE 01000000000 /* Do not permit delete or rename */
> >>
> >> #define F_GETLK64 8
> >> #define F_SETLK64 9
> >> diff --git a/arch/sparc/include/uapi/asm/errno.h b/arch/sparc/include/uapi/asm/errno.h
> >> index 20423e17..fe339b5 100644
> >> --- a/arch/sparc/include/uapi/asm/errno.h
> >> +++ b/arch/sparc/include/uapi/asm/errno.h
> >> @@ -114,4 +114,6 @@
> >>
> >> #define EHWPOISON 135 /* Memory page has hardware error */
> >>
> >> +#define ESHAREDENIED 136 /* File is locked with a sharelock */
> >> +
> >> #endif
> >> diff --git a/arch/sparc/include/uapi/asm/fcntl.h b/arch/sparc/include/uapi/asm/fcntl.h
> >> index 7e8ace5..ab68170 100644
> >> --- a/arch/sparc/include/uapi/asm/fcntl.h
> >> +++ b/arch/sparc/include/uapi/asm/fcntl.h
> >> @@ -36,6 +36,9 @@
> >>
> >> #define O_PATH 0x1000000
> >> #define __O_TMPFILE 0x2000000
> >> +#define O_DENYREAD 0x4000000 /* Do not permit read access */
> >> +#define O_DENYWRITE 0x8000000 /* Do not permit write access */
> >> +#define O_DENYDELETE 0x10000000 /* Do not permit delete or rename */
> >>
> >
> > It'd probably be best to add O_DENYDELETE in a separate patch, rather
> > than disabling it temporarily.
>
> Agree.
>
> >
> >> #define F_GETOWN 5 /* for sockets. */
> >> #define F_SETOWN 6 /* for sockets. */
> >> diff --git a/fs/fcntl.c b/fs/fcntl.c
> >> index ef68665..3f85887 100644
> >> --- a/fs/fcntl.c
> >> +++ b/fs/fcntl.c
> >> @@ -729,14 +729,15 @@ static int __init fcntl_init(void)
> >> * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY
> >> * is defined as O_NONBLOCK on some platforms and not on others.
> >> */
> >> - BUILD_BUG_ON(20 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
> >> + BUILD_BUG_ON(23 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
> >> O_RDONLY | O_WRONLY | O_RDWR |
> >> O_CREAT | O_EXCL | O_NOCTTY |
> >> O_TRUNC | O_APPEND | /* O_NONBLOCK | */
> >> __O_SYNC | O_DSYNC | FASYNC |
> >> O_DIRECT | O_LARGEFILE | O_DIRECTORY |
> >> O_NOFOLLOW | O_NOATIME | O_CLOEXEC |
> >> - __FMODE_EXEC | O_PATH | __O_TMPFILE
> >> + __FMODE_EXEC | O_PATH | __O_TMPFILE |
> >> + O_DENYREAD | O_DENYWRITE | O_DENYDELETE
> >> ));
> >>
> >> fasync_cache = kmem_cache_create("fasync_cache",
> >> diff --git a/fs/locks.c b/fs/locks.c
> >> index 92a0f0a..ffde4d4 100644
> >> --- a/fs/locks.c
> >> +++ b/fs/locks.c
> >> @@ -708,20 +708,73 @@ static int posix_locks_conflict(struct file_lock *caller_fl, struct file_lock *s
> >> return (locks_conflict(caller_fl, sys_fl));
> >> }
> >>
> >> -/* Determine if lock sys_fl blocks lock caller_fl. FLOCK specific
> >> - * checking before calling the locks_conflict().
> >> +static unsigned int
> >> +deny_flags_to_cmd(unsigned int flags)
> >> +{
> >> + unsigned int cmd = LOCK_MAND;
> >> +
> >> + if (!(flags & O_DENYREAD))
> >> + cmd |= LOCK_READ;
> >> + if (!(flags & O_DENYWRITE))
> >> + cmd |= LOCK_WRITE;
> >> +
> >> + return cmd;
> >> +}
> >> +
> >> +/*
> >> + * locks_mand_conflict - Determine if there's a share reservation conflict
> >> + * @caller_fl: lock we're attempting to acquire
> >> + * @sys_fl: lock already present on system that we're checking against
> >> + *
> >> + * Check to see if there's a share_reservation conflict. LOCK_READ/LOCK_WRITE
> >> + * tell us whether the reservation allows other readers and writers.
> >> + */
> >> +static int
> >> +locks_mand_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
> >> +{
> >> + unsigned char caller_type = caller_fl->fl_type;
> >> + unsigned char sys_type = sys_fl->fl_type;
> >> + fmode_t caller_fmode = caller_fl->fl_file->f_mode;
> >> + fmode_t sys_fmode = sys_fl->fl_file->f_mode;
> >> +
> >> + /* they can only conflict if FS is mounted with MS_SHARELOCK */
> >> + if (!IS_SHARELOCK(caller_fl->fl_file->f_path.dentry->d_inode))
> >> + return 0;
> >> +
> >> + /* they can only conflict if they're both LOCK_MAND */
> >> + if (!(caller_type & LOCK_MAND) || !(sys_type & LOCK_MAND))
> >> + return 0;
> >> +
> >> + if (!(caller_type & LOCK_READ) && (sys_fmode & FMODE_READ))
> >> + return 1;
> >> + if (!(caller_type & LOCK_WRITE) && (sys_fmode & FMODE_WRITE))
> >> + return 1;
> >> + if (!(sys_type & LOCK_READ) && (caller_fmode & FMODE_READ))
> >> + return 1;
> >> + if (!(sys_type & LOCK_WRITE) && (caller_fmode & FMODE_WRITE))
> >> + return 1;
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +/*
> >> + * Determine if lock sys_fl blocks lock caller_fl. FLOCK specific checking
> >> + * before calling the locks_conflict().
> >> */
> >> static int flock_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
> >> {
> >> - /* FLOCK locks referring to the same filp do not conflict with
> >> + if (!IS_FLOCK(sys_fl))
> >> + return 0;
> >> + if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND))
> >> + return locks_mand_conflict(caller_fl, sys_fl);
> >
> > nit: Seems like the above could be optimized a little. You know that
> > locks_mand_conflict is only relevant if both are LOCK_MAND, and one of
> > the first things that locks_mand_conflict does is to check that both
> > have that set.
>
> ok.
>
> >
> >> + /*
> >> + * FLOCK locks referring to the same filp do not conflict with
> >> * each other.
> >> */
> >> - if (!IS_FLOCK(sys_fl) || (caller_fl->fl_file == sys_fl->fl_file))
> >> - return (0);
> >> - if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND))
> >> + if (caller_fl->fl_file == sys_fl->fl_file)
> >> return 0;
> >>
> >> - return (locks_conflict(caller_fl, sys_fl));
> >> + return locks_conflict(caller_fl, sys_fl);
> >> }
> >>
> >> void
> >> @@ -888,6 +941,36 @@ out:
> >> return error;
> >> }
> >>
> >> +/*
> >> + * Determine if a file is allowed to be opened with specified access and share
> >> + * modes. Lock the file and return 0 if checks passed, otherwise return
> >> + * -ESHAREDENIED.
> >> + */
> >> +int
> >> +sharelock_lock_file(struct file *filp)
> >> +{
> >> + struct file_lock *lock;
> >> + int error = 0;
> >> +
> >> + if (!IS_SHARELOCK(filp->f_path.dentry->d_inode))
> >> + return error;
> >> +
> >> + /* Disable O_DENYDELETE support for now */
> >> + if (filp->f_flags & O_DENYDELETE)
> >> + return -EINVAL;
> >> +
> >> + error = flock_make_lock(filp, &lock, deny_flags_to_cmd(filp->f_flags));
> >> + if (error)
> >> + return error;
> >> +
> >> + error = flock_lock_file(filp, lock);
> >> + if (error == -EAGAIN)
> >> + error = -ESHAREDENIED;
> >> +
> >> + locks_free_lock(lock);
> >> + return error;
> >> +}
> >> +
> >> static int __posix_lock_file(struct inode *inode, struct file_lock *request, struct file_lock *conflock)
> >> {
> >> struct file_lock *fl;
> >> diff --git a/fs/namei.c b/fs/namei.c
> >> index 3531dee..2b741a1 100644
> >> --- a/fs/namei.c
> >> +++ b/fs/namei.c
> >> @@ -2725,9 +2725,14 @@ static int atomic_open(struct nameidata *nd, struct dentry *dentry,
> >> acc_mode = MAY_OPEN;
> >> }
> >> error = may_open(&file->f_path, acc_mode, open_flag);
> >> - if (error)
> >> + if (error) {
> >> fput(file);
> >> + goto out;
> >> + }
> >>
> >> + error = sharelock_lock_file(file);
> >> + if (error)
> >> + fput(file);
> >> out:
> >> dput(dentry);
> >> return error;
> >> @@ -2919,6 +2924,40 @@ retry_lookup:
> >> }
> >> mutex_lock(&dir->d_inode->i_mutex);
> >> error = lookup_open(nd, path, file, op, got_write, opened);
> >> +
> >> + /*
> >> + * For sharelock mounts if a file was created but not opened, we need
> >> + * to keep parent i_mutex until we finish the open to prevent races when
> >> + * somebody opens newly created by us file and locks it with a sharelock
> >> + * before we open it.
> >> + */
> >> + if (IS_SHARELOCK(dir->d_inode) && error > 0 && *opened & FILE_CREATED) {
> >> + /* Don't check for write permission, don't truncate */
> >> + open_flag &= ~O_TRUNC;
> >> + will_truncate = false;
> >> + acc_mode = MAY_OPEN;
> >> + path_to_nameidata(path, nd);
> >> +
> >> + error = may_open(&nd->path, acc_mode, open_flag);
> >> + if (error) {
> >> + mutex_unlock(&dir->d_inode->i_mutex);
> >> + goto out;
> >> + }
> >> + file->f_path.mnt = nd->path.mnt;
> >> + error = finish_open(file, nd->path.dentry, NULL, opened);
> >> + if (error) {
> >> + mutex_unlock(&dir->d_inode->i_mutex);
> >> + if (error == -EOPENSTALE)
> >> + goto stale_open;
> >> + goto out;
> >> + }
> >> + error = sharelock_lock_file(file);
> >> + mutex_unlock(&dir->d_inode->i_mutex);
> >> + if (error)
> >> + goto exit_fput;
> >> + goto opened;
> >> + }
> >> +
> >> mutex_unlock(&dir->d_inode->i_mutex);
> >>
> >> if (error <= 0) {
> >> @@ -3034,6 +3073,18 @@ finish_open_created:
> >> goto stale_open;
> >> goto out;
> >> }
> >> +
> >> + if (IS_SHARELOCK(dir->d_inode)) {
> >> + /*
> >> + * Lock parent i_mutex to prevent races with sharelocks on
> >> + * newly created files.
> >> + */
> >> + mutex_lock(&dir->d_inode->i_mutex);
> >> + error = sharelock_lock_file(file);
> >> + mutex_unlock(&dir->d_inode->i_mutex);
> >> + if (error)
> >> + goto exit_fput;
> >> + }
> >> opened:
> >> error = open_check_o_direct(file);
> >> if (error)
> >> diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
> >> index 439406e..dd374d4 100644
> >> --- a/fs/proc_namespace.c
> >> +++ b/fs/proc_namespace.c
> >> @@ -44,6 +44,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb)
> >> { MS_SYNCHRONOUS, ",sync" },
> >> { MS_DIRSYNC, ",dirsync" },
> >> { MS_MANDLOCK, ",mand" },
> >> + { MS_SHARELOCK, ",sharelock" },
> >> { 0, NULL }
> >> };
> >> const struct proc_fs_info *fs_infop;
> >> diff --git a/include/linux/fs.h b/include/linux/fs.h
> >> index 121f11f..aa061ca 100644
> >> --- a/include/linux/fs.h
> >> +++ b/include/linux/fs.h
> >> @@ -1029,6 +1029,7 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
> >> extern int lease_modify(struct file_lock **, int);
> >> extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
> >> extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
> >> +extern int sharelock_lock_file(struct file *);
> >> #else /* !CONFIG_FILE_LOCKING */
> >> static inline int fcntl_getlk(struct file *file, struct flock __user *user)
> >> {
> >> @@ -1169,6 +1170,12 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
> >> {
> >> return 1;
> >> }
> >> +
> >> +static inline int sharelock_lock_file(struct file *filp)
> >> +{
> >> + return 0;
> >> +}
> >> +
> >> #endif /* !CONFIG_FILE_LOCKING */
> >>
> >>
> >> @@ -1675,6 +1682,7 @@ struct super_operations {
> >> #define IS_PRIVATE(inode) ((inode)->i_flags & S_PRIVATE)
> >> #define IS_IMA(inode) ((inode)->i_flags & S_IMA)
> >> #define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT)
> >> +#define IS_SHARELOCK(inode) __IS_FLG(inode, MS_SHARELOCK)
> >> #define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC)
> >>
> >> /*
> >> diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
> >> index 1e1ea6e..aff869c 100644
> >> --- a/include/uapi/asm-generic/errno.h
> >> +++ b/include/uapi/asm-generic/errno.h
> >> @@ -110,4 +110,6 @@
> >>
> >> #define EHWPOISON 133 /* Memory page has hardware error */
> >>
> >> +#define ESHAREDENIED 134 /* File is locked with a sharelock */
> >> +
> >> #endif
> >> diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
> >> index 95e46c8..9881cfe 100644
> >> --- a/include/uapi/asm-generic/fcntl.h
> >> +++ b/include/uapi/asm-generic/fcntl.h
> >> @@ -92,6 +92,17 @@
> >> #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
> >> #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)
> >>
> >> +#ifndef O_DENYREAD
> >> +#define O_DENYREAD 040000000 /* Do not permit read access */
> >> +#endif
> >> +/* FMODE_NONOTIFY 0100000000 */
> >> +#ifndef O_DENYWRITE
> >> +#define O_DENYWRITE 0200000000 /* Do not permit write access */
> >> +#endif
> >> +#ifndef O_DENYDELETE
> >> +#define O_DENYDELETE 0400000000 /* Do not permit delete or rename */
> >> +#endif
> >> +
> >
> > One thing to consider: We found with the addition of O_TMPFILE that the
> > open() api is not particularly helpful when it comes to informing
> > appications when a flag isn't supported:
> >
> > http://lwn.net/Articles/562294/
> >
> > ...having a plan to cope with that here would be best. How can an
> > application determine at runtime that O_DENY* actually *work*? It may
> > be best to step back and consider a new syscall for this (open2() ?).
> >
>
> So, consider we added new syscall:
>
> opendm(filename, flags, mode, deny_mode)
> {
> return open(filename, flags | denymode2openflags(deny_mode), mode)
> }
>
> where deny_mode can be DMODE_NONE (0), DMODE_READ (1), DMODE_WRITE(2)
> and DMODE_RDWR(3) (similar to FMODE_* values).
>
> We have open and opendm that act actually in the same manner for
> mounts without MS_SHARELOCK. For mounts with MS_SHARELOCK open is like
> opendm with DMODE_NONE. Open flags O_DENY* are for internal use only.
>
> Is it what you suggest?
>


Right, something that like that maybe...

...or possibly consider not making this specific to deny modes or
anything, and just consider adding a new generic "openat2()" syscall.

This one could have a larger (or maybe extensible) field for flags,
and well-defined behavior when presented with a flag that it doesn't
understand.

I realize that that's a large increase in scope, but it can often be
easier to get new features merged if you are simultaneously addressing
other problems that exist. ;)

--
Jeff Layton <[email protected]>