Hi Linus,
Please pull the following Ceph updates from
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git for-linus
This is a big batch. From Ilya we have:
- rbd support for more than ~250 mapped devices (now uses same scheme
that SCSI does for device major/minor numbering)
- crush updates for new mapping behaviors (will be needed for coming
erasure coding support, among other things)
- preliminary support for tiered storage pools
There is also a big series fixing a pile cephfs bugs with clustered MDSs
from Yan Zheng, ACL support for cephfs from Guangliang Zhao, ceph fscache
improvements from Li Wang, improved behavior when we get ENOSPC from Josh
Durgin, some readv/writev improvements from Majianpeng, and the usual mix
of small cleanups.
Thanks!
sage
----------------------------------------------------------------
Alex Elder (1):
MAINTAINERS: update an e-mail address
Guangliang Zhao (1):
ceph: add acl for cephfs
Ilya Dryomov (51):
rbd: rbd_device::dev_id is an int, format it as such
rbd: tweak "loaded" message and module description
rbd: refactor rbd_init() a bit
rbd: switch to ida for rbd id assignments
rbd: add 'minor' sysfs rbd device attribute
rbd: wire up is_visible() sysfs callback for rbd bus
rbd: add support for single-major device number allocation scheme
rbd: enable extended devt in single-major mode
rbd: introduce rbd_dev_header_unwatch_sync() and switch to it
rbd: tear down watch request if rbd_dev_device_setup() fails
libceph: all features fields must be u64
libceph: update ceph_features.h
crush: pass weight vector size to map function
crush: factor out (trivial) crush_destroy_rule()
crush: reduce scope of some local variables
crush: fix some comments
crush: eliminate CRUSH_MAX_SET result size limitation
crush: return CRUSH_ITEM_UNDEF for failed placements with indep
crush: use breadth-first search for indep mode
crush: add note about r in recursive choose
crush: strip firstn conditionals out of crush_choose, rename
crush: clarify numrep vs endpos
crush: pass parent r value for indep call
crush: new SET_CHOOSE_LEAF_TRIES command
crush: apply chooseleaf_tries to firstn mode too
crush: add SET_CHOOSE_TRIES rule step
crush: CHOOSE_LEAF -> CHOOSELEAF throughout
crush: generalize descend_once
crush: add set_choose_local_[fallback_]tries steps
crush: attempts -> tries
crush: fix crush_choose_firstn comment
crush: support new indep mode and SET_* steps (crush v2) by default
libceph: use CEPH_MON_PORT when the specified port is 0
libceph: rename ceph_msg::front_max to front_alloc_len
libceph: rename front to front_len in get_reply()
libceph: fix preallocation check in get_reply()
libceph: add ceph_kv{malloc,free}() and switch to them
libceph: dout() is missing a newline
libceph: start using oloc abstraction
libceph: move ceph_file_layout helpers to ceph_fs.h
libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN
libceph: introduce and start using oid abstraction
libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()
libceph: CEPH_OSD_FLAG_* enum update
libceph: add ceph_pg_pool_by_id()
libceph: follow {read,write}_tier fields on osd request submission
libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid}
libceph: follow redirect replies from osds
libceph: support CEPH_FEATURE_OSD_CACHEPOOL feature
ceph: fix dout() compile warnings in ceph_filemap_fault()
ceph: cast PAGE_SIZE to size_t in ceph_sync_write()
J. Bruce Fields (1):
ceph: trivial comment fix
Josh Durgin (2):
libceph: block I/O when PAUSE or FULL osd map flags are set
libceph: resend all writes after the osdmap loses the full flag
Li Wang (4):
ceph: Clean up if error occurred in finish_read()
ceph: Add necessary clean up if invalid reply received in handle_reply()
ceph fscache: Introduce a routine for uncaching single no data page from fscache
ceph fscache: Uncaching no data page from fscache in readpage()
Libo Chen (1):
fs: ceph: new helper: file_inode(file)
Yan, Zheng (13):
ceph: drop unconnected inodes
ceph: check caps in filemap_fault and page_mkwrite
ceph: handle cap export race in try_flush_caps()
ceph: use ceph_seq_cmp() to compare migrate_seq
ceph: fix cache revoke race
ceph: fix trim caps
ceph: handle -ESTALE reply
ceph: check inode caps in ceph_d_revalidate
ceph: handle session flush message
ceph: remove exported caps when handling cap import message
ceph: add open export target session helper
ceph: add imported caps when handling cap export message
libceph: support CEPH_FEATURE_EXPORT_PEER
majianpeng (2):
ceph: Implement writev/pwritev for sync operation.
ceph: implement readv/preadv for sync operation
Documentation/ABI/testing/sysfs-bus-rbd | 26 ++
MAINTAINERS | 2 +-
drivers/block/rbd.c | 303 ++++++++++++++--------
fs/ceph/Kconfig | 13 +
fs/ceph/Makefile | 1 +
fs/ceph/acl.c | 332 ++++++++++++++++++++++++
fs/ceph/addr.c | 93 ++++++-
fs/ceph/cache.h | 13 +
fs/ceph/caps.c | 338 +++++++++++++++---------
fs/ceph/dir.c | 16 +-
fs/ceph/file.c | 437 ++++++++++++++++++++++----------
fs/ceph/inode.c | 33 ++-
fs/ceph/ioctl.c | 8 +-
fs/ceph/mds_client.c | 132 ++++++----
fs/ceph/mds_client.h | 2 +
fs/ceph/strings.c | 2 +
fs/ceph/super.c | 9 +-
fs/ceph/super.h | 45 +++-
fs/ceph/xattr.c | 60 ++++-
include/linux/ceph/buffer.h | 1 -
include/linux/ceph/ceph_features.h | 101 +++++---
include/linux/ceph/ceph_fs.h | 36 ++-
include/linux/ceph/libceph.h | 19 +-
include/linux/ceph/messenger.h | 13 +-
include/linux/ceph/osd_client.h | 19 +-
include/linux/ceph/osdmap.h | 66 +++--
include/linux/ceph/rados.h | 4 +
include/linux/crush/crush.h | 20 +-
include/linux/crush/mapper.h | 3 +-
net/ceph/buffer.c | 22 +-
net/ceph/ceph_common.c | 24 +-
net/ceph/crush/crush.c | 7 +-
net/ceph/crush/mapper.c | 336 +++++++++++++++++++-----
net/ceph/debugfs.c | 3 +-
net/ceph/messenger.c | 32 +--
net/ceph/mon_client.c | 8 +-
net/ceph/osd_client.c | 283 +++++++++++++++++++--
net/ceph/osdmap.c | 78 ++++--
38 files changed, 2261 insertions(+), 679 deletions(-)
create mode 100644 fs/ceph/acl.c
On Tue, Jan 28, 2014 at 10:40:16AM -0800, Sage Weil wrote:
> Hi Linus,
>
> Please pull the following Ceph updates from
>
> git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git for-linus
>
> This is a big batch. From Ilya we have:
>
> - rbd support for more than ~250 mapped devices (now uses same scheme
> that SCSI does for device major/minor numbering)
> - crush updates for new mapping behaviors (will be needed for coming
> erasure coding support, among other things)
> - preliminary support for tiered storage pools
>
> There is also a big series fixing a pile cephfs bugs with clustered MDSs
> from Yan Zheng, ACL support for cephfs from Guangliang Zhao, ceph fscache
> improvements from Li Wang, improved behavior when we get ENOSPC from Josh
> Durgin, some readv/writev improvements from Majianpeng, and the usual mix
> of small cleanups.
This breaks the build for me.
fs/ceph/acl.c: In function ‘ceph_init_acl’:
fs/ceph/acl.c:216:3: warning: passing argument 1 of ‘posix_acl_create’ from incompatible pointer type [enabled by default]
ret = posix_acl_create(&acl, GFP_NOFS, &inode->i_mode);
^
In file included from include/linux/posix_acl_xattr.h:12:0,
from fs/ceph/acl.c:25:
include/linux/posix_acl.h:96:12: note: expected ‘struct inode *’ but argument is of type ‘struct posix_acl **’
extern int posix_acl_create(struct inode *, umode_t *, struct posix_acl **,
^
fs/ceph/acl.c:216:3: warning: passing argument 2 of ‘posix_acl_create’ makes pointer from integer without a cast [enabled by default]
ret = posix_acl_create(&acl, GFP_NOFS, &inode->i_mode);
^
In file included from include/linux/posix_acl_xattr.h:12:0,
from fs/ceph/acl.c:25:
include/linux/posix_acl.h:96:12: note: expected ‘umode_t *’ but argument is of type ‘unsigned int’
extern int posix_acl_create(struct inode *, umode_t *, struct posix_acl **,
^
fs/ceph/acl.c:216:3: warning: passing argument 3 of ‘posix_acl_create’ from incompatible pointer type [enabled by default]
ret = posix_acl_create(&acl, GFP_NOFS, &inode->i_mode);
^
In file included from include/linux/posix_acl_xattr.h:12:0,
from fs/ceph/acl.c:25:
include/linux/posix_acl.h:96:12: note: expected ‘struct posix_acl **’ but argument is of type ‘umode_t *’
extern int posix_acl_create(struct inode *, umode_t *, struct posix_acl **,
^
fs/ceph/acl.c:216:3: error: too few arguments to function ‘posix_acl_create’
ret = posix_acl_create(&acl, GFP_NOFS, &inode->i_mode);
^
In file included from include/linux/posix_acl_xattr.h:12:0,
from fs/ceph/acl.c:25:
include/linux/posix_acl.h:96:12: note: declared here
extern int posix_acl_create(struct inode *, umode_t *, struct posix_acl **,
^
fs/ceph/acl.c: In function ‘ceph_acl_chmod’:
fs/ceph/acl.c:252:2: warning: passing argument 1 of ‘posix_acl_chmod’ from incompatible pointer type [enabled by default]
ret = posix_acl_chmod(&acl, GFP_KERNEL, inode->i_mode);
^
In file included from include/linux/posix_acl_xattr.h:12:0,
from fs/ceph/acl.c:25:
include/linux/posix_acl.h:95:12: note: expected ‘struct inode *’ but argument is of type ‘struct posix_acl **’
extern int posix_acl_chmod(struct inode *, umode_t);
^
fs/ceph/acl.c:252:2: error: too many arguments to function ‘posix_acl_chmod’
ret = posix_acl_chmod(&acl, GFP_KERNEL, inode->i_mode);
^
In file included from include/linux/posix_acl_xattr.h:12:0,
from fs/ceph/acl.c:25:
include/linux/posix_acl.h:95:12: note: declared here
extern int posix_acl_chmod(struct inode *, umode_t);
On Tue, Jan 28, 2014 at 1:10 PM, Dave Jones <[email protected]> wrote:
>
> This breaks the build for me.
It is my merge (Christoph's ACL changes came in today through the VFS
tree from Al).
I was doing the merges today on my laptop (I had jury duty yesterday
and today), and so I didn't do the allmodconfig build I would normally
do on my (much faster) desktop. Well, actually I did do the full fs
builds for the earlier pulls that actually had some conflicts, but not
for the ceph pull. The conflict was hidden by the fact that the whole
cifs ACL support is new, so there was no data conflict, just a silent
semantic conflict between the new smarter ACL helpers and the new ACL
use in CIFS.
I'm back home now (yay, all the afternoon cases got settled), and I
see the problem now. I should have done an allmodconfig build
immediately after coming home, but I never even thought of it.
Anyway, here's an *untested* conversion to the new posix acl helper
infrastructure. I do wonder if ceph_init_acl() could be converted to
posix_acl_create(), right now that part is a "non-conversion" - it's
just made to use __posix_acl_create() that implements the old
interface.
Al, Christoph, can you please check my conversion for sanity from a
generic posix-acl standpoint?
Sage, Guangliang, Li, can you check the actual cifs usage/sanity of
the attached patch?
Sorry about the messed-up merge. Although I also blame Al, because
he's horrible about having his changes in linux-next, so nobody was
ever really aware of this semantic conflict.
Al. Bad, bad boy. Consider yourself hit with a rolled-up newspaper.
Linus
Hi Linus,
On Tue, 28 Jan 2014, Linus Torvalds wrote:
> On Tue, Jan 28, 2014 at 1:10 PM, Dave Jones <[email protected]> wrote:
> >
> > This breaks the build for me.
>
> It is my merge (Christoph's ACL changes came in today through the VFS
> tree from Al).
>
> I was doing the merges today on my laptop (I had jury duty yesterday
> and today), and so I didn't do the allmodconfig build I would normally
> do on my (much faster) desktop. Well, actually I did do the full fs
> builds for the earlier pulls that actually had some conflicts, but not
> for the ceph pull. The conflict was hidden by the fact that the whole
> cifs ACL support is new, so there was no data conflict, just a silent
> semantic conflict between the new smarter ACL helpers and the new ACL
> use in CIFS.
s/cifs/ceph/ :)
> I'm back home now (yay, all the afternoon cases got settled), and I
> see the problem now. I should have done an allmodconfig build
> immediately after coming home, but I never even thought of it.
>
> Anyway, here's an *untested* conversion to the new posix acl helper
> infrastructure. I do wonder if ceph_init_acl() could be converted to
> posix_acl_create(), right now that part is a "non-conversion" - it's
> just made to use __posix_acl_create() that implements the old
> interface.
>
> Al, Christoph, can you please check my conversion for sanity from a
> generic posix-acl standpoint?
>
> Sage, Guangliang, Li, can you check the actual cifs usage/sanity of
> the attached patch?
Superficially at least the conversion looks okay to me, but it's not
passing my smoke test (it's giving me EOPNOTSUPP for chmod and setxattr
when setting an ACL). I'll look at it tomorrow if Guangliang, Li, or Yan
don't get there first.
I should have caught this before--I knew the ACL changes were coming and
forgot to check the merged build beforehand!
sage
On Tue, 28 Jan 2014, Sage Weil wrote:
> Hi Linus,
>
> On Tue, 28 Jan 2014, Linus Torvalds wrote:
> > On Tue, Jan 28, 2014 at 1:10 PM, Dave Jones <[email protected]> wrote:
> > >
> > > This breaks the build for me.
> >
> > It is my merge (Christoph's ACL changes came in today through the VFS
> > tree from Al).
> >
> > I was doing the merges today on my laptop (I had jury duty yesterday
> > and today), and so I didn't do the allmodconfig build I would normally
> > do on my (much faster) desktop. Well, actually I did do the full fs
> > builds for the earlier pulls that actually had some conflicts, but not
> > for the ceph pull. The conflict was hidden by the fact that the whole
> > cifs ACL support is new, so there was no data conflict, just a silent
> > semantic conflict between the new smarter ACL helpers and the new ACL
> > use in CIFS.
>
> s/cifs/ceph/ :)
>
> > I'm back home now (yay, all the afternoon cases got settled), and I
> > see the problem now. I should have done an allmodconfig build
> > immediately after coming home, but I never even thought of it.
> >
> > Anyway, here's an *untested* conversion to the new posix acl helper
> > infrastructure. I do wonder if ceph_init_acl() could be converted to
> > posix_acl_create(), right now that part is a "non-conversion" - it's
> > just made to use __posix_acl_create() that implements the old
> > interface.
> >
> > Al, Christoph, can you please check my conversion for sanity from a
> > generic posix-acl standpoint?
> >
> > Sage, Guangliang, Li, can you check the actual cifs usage/sanity of
> > the attached patch?
>
> Superficially at least the conversion looks okay to me, but it's not
> passing my smoke test (it's giving me EOPNOTSUPP for chmod and setxattr
> when setting an ACL). I'll look at it tomorrow if Guangliang, Li, or Yan
> don't get there first.
The set_acl inode_operation wasn't getting set, and the prototype needed
to be adjusted a bit (it doesn't take a dentry anymore). All seems to be
well with the below patch.
Thanks!
sage
>From 01baa54e113060eb9147548fe7beb572522a645a Mon Sep 17 00:00:00 2001
From: Sage Weil <[email protected]>
Date: Wed, 29 Jan 2014 06:22:25 -0800
Subject: [PATCH] ceph: fix posix ACL hooks
The merge of 7221fe4c2 raced with upstream changes in the generic POSIX
ACL code (2aeccbe95). Update Ceph to use the new helpers as well by
dropping the now-generic functions and setting the set_acl inode op.
Signed-off-by: Sage Weil <[email protected]>
---
fs/ceph/acl.c | 112 +++-----------------------------------------------------
fs/ceph/dir.c | 1 +
fs/ceph/inode.c | 5 ++-
fs/ceph/super.h | 5 +--
fs/ceph/xattr.c | 5 ++-
5 files changed, 15 insertions(+), 113 deletions(-)
diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index 64fddbc..66d377a 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -107,14 +107,14 @@ struct posix_acl *ceph_get_acl(struct inode *inode, int type)
return acl;
}
-static int ceph_set_acl(struct dentry *dentry, struct inode *inode,
- struct posix_acl *acl, int type)
+int ceph_set_acl(struct inode *inode, struct posix_acl *acl, int type)
{
int ret = 0, size = 0;
const char *name = NULL;
char *value = NULL;
struct iattr newattrs;
umode_t new_mode = inode->i_mode, old_mode = inode->i_mode;
+ struct dentry *dentry = d_find_alias(inode);
if (acl) {
ret = posix_acl_valid(acl);
@@ -208,16 +208,15 @@ int ceph_init_acl(struct dentry *dentry, struct inode *inode, struct inode *dir)
if (IS_POSIXACL(dir) && acl) {
if (S_ISDIR(inode->i_mode)) {
- ret = ceph_set_acl(dentry, inode, acl,
- ACL_TYPE_DEFAULT);
+ ret = ceph_set_acl(inode, acl, ACL_TYPE_DEFAULT);
if (ret)
goto out_release;
}
- ret = posix_acl_create(&acl, GFP_NOFS, &inode->i_mode);
+ ret = __posix_acl_create(&acl, GFP_NOFS, &inode->i_mode);
if (ret < 0)
goto out;
else if (ret > 0)
- ret = ceph_set_acl(dentry, inode, acl, ACL_TYPE_ACCESS);
+ ret = ceph_set_acl(inode, acl, ACL_TYPE_ACCESS);
else
cache_no_acl(inode);
} else {
@@ -229,104 +228,3 @@ out_release:
out:
return ret;
}
-
-int ceph_acl_chmod(struct dentry *dentry, struct inode *inode)
-{
- struct posix_acl *acl;
- int ret = 0;
-
- if (S_ISLNK(inode->i_mode)) {
- ret = -EOPNOTSUPP;
- goto out;
- }
-
- if (!IS_POSIXACL(inode))
- goto out;
-
- acl = ceph_get_acl(inode, ACL_TYPE_ACCESS);
- if (IS_ERR_OR_NULL(acl)) {
- ret = PTR_ERR(acl);
- goto out;
- }
-
- ret = posix_acl_chmod(&acl, GFP_KERNEL, inode->i_mode);
- if (ret)
- goto out;
- ret = ceph_set_acl(dentry, inode, acl, ACL_TYPE_ACCESS);
- posix_acl_release(acl);
-out:
- return ret;
-}
-
-static int ceph_xattr_acl_get(struct dentry *dentry, const char *name,
- void *value, size_t size, int type)
-{
- struct posix_acl *acl;
- int ret = 0;
-
- if (!IS_POSIXACL(dentry->d_inode))
- return -EOPNOTSUPP;
-
- acl = ceph_get_acl(dentry->d_inode, type);
- if (IS_ERR(acl))
- return PTR_ERR(acl);
- if (acl == NULL)
- return -ENODATA;
-
- ret = posix_acl_to_xattr(&init_user_ns, acl, value, size);
- posix_acl_release(acl);
-
- return ret;
-}
-
-static int ceph_xattr_acl_set(struct dentry *dentry, const char *name,
- const void *value, size_t size, int flags, int type)
-{
- int ret = 0;
- struct posix_acl *acl = NULL;
-
- if (!inode_owner_or_capable(dentry->d_inode)) {
- ret = -EPERM;
- goto out;
- }
-
- if (!IS_POSIXACL(dentry->d_inode)) {
- ret = -EOPNOTSUPP;
- goto out;
- }
-
- if (value) {
- acl = posix_acl_from_xattr(&init_user_ns, value, size);
- if (IS_ERR(acl)) {
- ret = PTR_ERR(acl);
- goto out;
- }
-
- if (acl) {
- ret = posix_acl_valid(acl);
- if (ret)
- goto out_release;
- }
- }
-
- ret = ceph_set_acl(dentry, dentry->d_inode, acl, type);
-
-out_release:
- posix_acl_release(acl);
-out:
- return ret;
-}
-
-const struct xattr_handler ceph_xattr_acl_default_handler = {
- .prefix = POSIX_ACL_XATTR_DEFAULT,
- .flags = ACL_TYPE_DEFAULT,
- .get = ceph_xattr_acl_get,
- .set = ceph_xattr_acl_set,
-};
-
-const struct xattr_handler ceph_xattr_acl_access_handler = {
- .prefix = POSIX_ACL_XATTR_ACCESS,
- .flags = ACL_TYPE_ACCESS,
- .get = ceph_xattr_acl_get,
- .set = ceph_xattr_acl_set,
-};
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 619616d..6da4df8 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1303,6 +1303,7 @@ const struct inode_operations ceph_dir_iops = {
.listxattr = ceph_listxattr,
.removexattr = ceph_removexattr,
.get_acl = ceph_get_acl,
+ .set_acl = ceph_set_acl,
.mknod = ceph_mknod,
.symlink = ceph_symlink,
.mkdir = ceph_mkdir,
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 6fc10a7..32d519d 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -9,6 +9,7 @@
#include <linux/namei.h>
#include <linux/writeback.h>
#include <linux/vmalloc.h>
+#include <linux/posix_acl.h>
#include "super.h"
#include "mds_client.h"
@@ -96,6 +97,7 @@ const struct inode_operations ceph_file_iops = {
.listxattr = ceph_listxattr,
.removexattr = ceph_removexattr,
.get_acl = ceph_get_acl,
+ .set_acl = ceph_set_acl,
};
@@ -1615,6 +1617,7 @@ static const struct inode_operations ceph_symlink_iops = {
.listxattr = ceph_listxattr,
.removexattr = ceph_removexattr,
.get_acl = ceph_get_acl,
+ .set_acl = ceph_set_acl,
};
/*
@@ -1805,7 +1808,7 @@ int ceph_setattr(struct dentry *dentry, struct iattr *attr)
__mark_inode_dirty(inode, inode_dirty_flags);
if (ia_valid & ATTR_MODE) {
- err = ceph_acl_chmod(dentry, inode);
+ err = posix_acl_chmod(inode, attr->ia_mode);
if (err)
goto out_put;
}
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index c299f7d..eabf601 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -718,6 +718,7 @@ extern void ceph_queue_writeback(struct inode *inode);
extern int ceph_do_getattr(struct inode *inode, int mask);
extern int ceph_permission(struct inode *inode, int mask);
extern int ceph_setattr(struct dentry *dentry, struct iattr *attr);
+extern int ceph_setattr(struct dentry *dentry, struct iattr *attr);
extern int ceph_getattr(struct vfsmount *mnt, struct dentry *dentry,
struct kstat *stat);
@@ -736,15 +737,13 @@ extern void __init ceph_xattr_init(void);
extern void ceph_xattr_exit(void);
/* acl.c */
-extern const struct xattr_handler ceph_xattr_acl_access_handler;
-extern const struct xattr_handler ceph_xattr_acl_default_handler;
extern const struct xattr_handler *ceph_xattr_handlers[];
#ifdef CONFIG_CEPH_FS_POSIX_ACL
struct posix_acl *ceph_get_acl(struct inode *, int);
+int ceph_set_acl(struct inode *inode, struct posix_acl *acl, int type);
int ceph_init_acl(struct dentry *, struct inode *, struct inode *);
-int ceph_acl_chmod(struct dentry *, struct inode *);
void ceph_forget_all_cached_acls(struct inode *inode);
#else
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index c7581f37..898b656 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -6,6 +6,7 @@
#include <linux/ceph/decode.h>
#include <linux/xattr.h>
+#include <linux/posix_acl_xattr.h>
#include <linux/slab.h>
#define XATTR_CEPH_PREFIX "ceph."
@@ -17,8 +18,8 @@
*/
const struct xattr_handler *ceph_xattr_handlers[] = {
#ifdef CONFIG_CEPH_FS_POSIX_ACL
- &ceph_xattr_acl_access_handler,
- &ceph_xattr_acl_default_handler,
+ &posix_acl_access_xattr_handler,
+ &posix_acl_default_xattr_handler,
#endif
NULL,
};
--
1.8.5.1
On Wed, Jan 29, 2014 at 4:30 PM, Sage Weil <[email protected]> wrote:
> On Tue, 28 Jan 2014, Sage Weil wrote:
>> Hi Linus,
>>
>> On Tue, 28 Jan 2014, Linus Torvalds wrote:
>> > On Tue, Jan 28, 2014 at 1:10 PM, Dave Jones <[email protected]> wrote:
>> > >
>> > > This breaks the build for me.
>> >
>> > It is my merge (Christoph's ACL changes came in today through the VFS
>> > tree from Al).
>> >
>> > I was doing the merges today on my laptop (I had jury duty yesterday
>> > and today), and so I didn't do the allmodconfig build I would normally
>> > do on my (much faster) desktop. Well, actually I did do the full fs
>> > builds for the earlier pulls that actually had some conflicts, but not
>> > for the ceph pull. The conflict was hidden by the fact that the whole
>> > cifs ACL support is new, so there was no data conflict, just a silent
>> > semantic conflict between the new smarter ACL helpers and the new ACL
>> > use in CIFS.
>>
>> s/cifs/ceph/ :)
>>
>> > I'm back home now (yay, all the afternoon cases got settled), and I
>> > see the problem now. I should have done an allmodconfig build
>> > immediately after coming home, but I never even thought of it.
>> >
>> > Anyway, here's an *untested* conversion to the new posix acl helper
>> > infrastructure. I do wonder if ceph_init_acl() could be converted to
>> > posix_acl_create(), right now that part is a "non-conversion" - it's
>> > just made to use __posix_acl_create() that implements the old
>> > interface.
>> >
>> > Al, Christoph, can you please check my conversion for sanity from a
>> > generic posix-acl standpoint?
>> >
>> > Sage, Guangliang, Li, can you check the actual cifs usage/sanity of
>> > the attached patch?
>>
>> Superficially at least the conversion looks okay to me, but it's not
>> passing my smoke test (it's giving me EOPNOTSUPP for chmod and setxattr
>> when setting an ACL). I'll look at it tomorrow if Guangliang, Li, or Yan
>> don't get there first.
>
> The set_acl inode_operation wasn't getting set, and the prototype needed
> to be adjusted a bit (it doesn't take a dentry anymore). All seems to be
> well with the below patch.
>
> Thanks!
> sage
>
>
> From 01baa54e113060eb9147548fe7beb572522a645a Mon Sep 17 00:00:00 2001
> From: Sage Weil <[email protected]>
> Date: Wed, 29 Jan 2014 06:22:25 -0800
> Subject: [PATCH] ceph: fix posix ACL hooks
>
> The merge of 7221fe4c2 raced with upstream changes in the generic POSIX
> ACL code (2aeccbe95). Update Ceph to use the new helpers as well by
> dropping the now-generic functions and setting the set_acl inode op.
>
> Signed-off-by: Sage Weil <[email protected]>
> ---
> fs/ceph/acl.c | 112 +++-----------------------------------------------------
> fs/ceph/dir.c | 1 +
> fs/ceph/inode.c | 5 ++-
> fs/ceph/super.h | 5 +--
> fs/ceph/xattr.c | 5 ++-
> 5 files changed, 15 insertions(+), 113 deletions(-)
>
> diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
> index 64fddbc..66d377a 100644
> --- a/fs/ceph/acl.c
> +++ b/fs/ceph/acl.c
> @@ -107,14 +107,14 @@ struct posix_acl *ceph_get_acl(struct inode *inode, int type)
> return acl;
> }
>
> -static int ceph_set_acl(struct dentry *dentry, struct inode *inode,
> - struct posix_acl *acl, int type)
> +int ceph_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> {
> int ret = 0, size = 0;
> const char *name = NULL;
> char *value = NULL;
> struct iattr newattrs;
> umode_t new_mode = inode->i_mode, old_mode = inode->i_mode;
> + struct dentry *dentry = d_find_alias(inode);
>
> if (acl) {
> ret = posix_acl_valid(acl);
> @@ -208,16 +208,15 @@ int ceph_init_acl(struct dentry *dentry, struct inode *inode, struct inode *dir)
>
> if (IS_POSIXACL(dir) && acl) {
> if (S_ISDIR(inode->i_mode)) {
> - ret = ceph_set_acl(dentry, inode, acl,
> - ACL_TYPE_DEFAULT);
> + ret = ceph_set_acl(inode, acl, ACL_TYPE_DEFAULT);
> if (ret)
> goto out_release;
> }
> - ret = posix_acl_create(&acl, GFP_NOFS, &inode->i_mode);
> + ret = __posix_acl_create(&acl, GFP_NOFS, &inode->i_mode);
> if (ret < 0)
> goto out;
> else if (ret > 0)
> - ret = ceph_set_acl(dentry, inode, acl, ACL_TYPE_ACCESS);
> + ret = ceph_set_acl(inode, acl, ACL_TYPE_ACCESS);
> else
> cache_no_acl(inode);
> } else {
> @@ -229,104 +228,3 @@ out_release:
> out:
> return ret;
> }
> -
> -int ceph_acl_chmod(struct dentry *dentry, struct inode *inode)
> -{
> - struct posix_acl *acl;
> - int ret = 0;
> -
> - if (S_ISLNK(inode->i_mode)) {
> - ret = -EOPNOTSUPP;
> - goto out;
> - }
> -
> - if (!IS_POSIXACL(inode))
> - goto out;
> -
> - acl = ceph_get_acl(inode, ACL_TYPE_ACCESS);
> - if (IS_ERR_OR_NULL(acl)) {
> - ret = PTR_ERR(acl);
> - goto out;
> - }
> -
> - ret = posix_acl_chmod(&acl, GFP_KERNEL, inode->i_mode);
> - if (ret)
> - goto out;
> - ret = ceph_set_acl(dentry, inode, acl, ACL_TYPE_ACCESS);
> - posix_acl_release(acl);
> -out:
> - return ret;
> -}
> -
> -static int ceph_xattr_acl_get(struct dentry *dentry, const char *name,
> - void *value, size_t size, int type)
> -{
> - struct posix_acl *acl;
> - int ret = 0;
> -
> - if (!IS_POSIXACL(dentry->d_inode))
> - return -EOPNOTSUPP;
> -
> - acl = ceph_get_acl(dentry->d_inode, type);
> - if (IS_ERR(acl))
> - return PTR_ERR(acl);
> - if (acl == NULL)
> - return -ENODATA;
> -
> - ret = posix_acl_to_xattr(&init_user_ns, acl, value, size);
> - posix_acl_release(acl);
> -
> - return ret;
> -}
> -
> -static int ceph_xattr_acl_set(struct dentry *dentry, const char *name,
> - const void *value, size_t size, int flags, int type)
> -{
> - int ret = 0;
> - struct posix_acl *acl = NULL;
> -
> - if (!inode_owner_or_capable(dentry->d_inode)) {
> - ret = -EPERM;
> - goto out;
> - }
> -
> - if (!IS_POSIXACL(dentry->d_inode)) {
> - ret = -EOPNOTSUPP;
> - goto out;
> - }
> -
> - if (value) {
> - acl = posix_acl_from_xattr(&init_user_ns, value, size);
> - if (IS_ERR(acl)) {
> - ret = PTR_ERR(acl);
> - goto out;
> - }
> -
> - if (acl) {
> - ret = posix_acl_valid(acl);
> - if (ret)
> - goto out_release;
> - }
> - }
> -
> - ret = ceph_set_acl(dentry, dentry->d_inode, acl, type);
> -
> -out_release:
> - posix_acl_release(acl);
> -out:
> - return ret;
> -}
> -
> -const struct xattr_handler ceph_xattr_acl_default_handler = {
> - .prefix = POSIX_ACL_XATTR_DEFAULT,
> - .flags = ACL_TYPE_DEFAULT,
> - .get = ceph_xattr_acl_get,
> - .set = ceph_xattr_acl_set,
> -};
> -
> -const struct xattr_handler ceph_xattr_acl_access_handler = {
> - .prefix = POSIX_ACL_XATTR_ACCESS,
> - .flags = ACL_TYPE_ACCESS,
> - .get = ceph_xattr_acl_get,
> - .set = ceph_xattr_acl_set,
> -};
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index 619616d..6da4df8 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -1303,6 +1303,7 @@ const struct inode_operations ceph_dir_iops = {
> .listxattr = ceph_listxattr,
> .removexattr = ceph_removexattr,
> .get_acl = ceph_get_acl,
> + .set_acl = ceph_set_acl,
> .mknod = ceph_mknod,
> .symlink = ceph_symlink,
> .mkdir = ceph_mkdir,
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 6fc10a7..32d519d 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -9,6 +9,7 @@
> #include <linux/namei.h>
> #include <linux/writeback.h>
> #include <linux/vmalloc.h>
> +#include <linux/posix_acl.h>
>
> #include "super.h"
> #include "mds_client.h"
> @@ -96,6 +97,7 @@ const struct inode_operations ceph_file_iops = {
> .listxattr = ceph_listxattr,
> .removexattr = ceph_removexattr,
> .get_acl = ceph_get_acl,
> + .set_acl = ceph_set_acl,
> };
>
>
> @@ -1615,6 +1617,7 @@ static const struct inode_operations ceph_symlink_iops = {
> .listxattr = ceph_listxattr,
> .removexattr = ceph_removexattr,
> .get_acl = ceph_get_acl,
> + .set_acl = ceph_set_acl,
> };
>
> /*
> @@ -1805,7 +1808,7 @@ int ceph_setattr(struct dentry *dentry, struct iattr *attr)
> __mark_inode_dirty(inode, inode_dirty_flags);
>
> if (ia_valid & ATTR_MODE) {
> - err = ceph_acl_chmod(dentry, inode);
> + err = posix_acl_chmod(inode, attr->ia_mode);
> if (err)
> goto out_put;
> }
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index c299f7d..eabf601 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -718,6 +718,7 @@ extern void ceph_queue_writeback(struct inode *inode);
> extern int ceph_do_getattr(struct inode *inode, int mask);
> extern int ceph_permission(struct inode *inode, int mask);
> extern int ceph_setattr(struct dentry *dentry, struct iattr *attr);
> +extern int ceph_setattr(struct dentry *dentry, struct iattr *attr);
> extern int ceph_getattr(struct vfsmount *mnt, struct dentry *dentry,
> struct kstat *stat);
>
> @@ -736,15 +737,13 @@ extern void __init ceph_xattr_init(void);
> extern void ceph_xattr_exit(void);
>
> /* acl.c */
> -extern const struct xattr_handler ceph_xattr_acl_access_handler;
> -extern const struct xattr_handler ceph_xattr_acl_default_handler;
> extern const struct xattr_handler *ceph_xattr_handlers[];
>
> #ifdef CONFIG_CEPH_FS_POSIX_ACL
>
> struct posix_acl *ceph_get_acl(struct inode *, int);
> +int ceph_set_acl(struct inode *inode, struct posix_acl *acl, int type);
> int ceph_init_acl(struct dentry *, struct inode *, struct inode *);
> -int ceph_acl_chmod(struct dentry *, struct inode *);
> void ceph_forget_all_cached_acls(struct inode *inode);
>
> #else
Sage missed the '#define ceph_set_acl NULL' for the !CEPH_FS_POSIX_ACL
case. v2 should be in-reply-to this message.
Thanks,
Ilya
From: Sage Weil <[email protected]>
The merge of 7221fe4c2 raced with upstream changes in the generic POSIX
ACL code (2aeccbe95). Update Ceph to use the new helpers as well by
dropping the now-generic functions and setting the set_acl inode op.
Signed-off-by: Sage Weil <[email protected]>
---
v2: fixed the !CONFIG_CEPH_FS_POSIX_ACL case
fs/ceph/acl.c | 112 +++----------------------------------------------------
fs/ceph/dir.c | 1 +
fs/ceph/inode.c | 5 ++-
fs/ceph/super.h | 6 +--
fs/ceph/xattr.c | 5 ++-
5 files changed, 16 insertions(+), 113 deletions(-)
diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index 64fddbc1d17b..66d377a12f7c 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -107,14 +107,14 @@ struct posix_acl *ceph_get_acl(struct inode *inode, int type)
return acl;
}
-static int ceph_set_acl(struct dentry *dentry, struct inode *inode,
- struct posix_acl *acl, int type)
+int ceph_set_acl(struct inode *inode, struct posix_acl *acl, int type)
{
int ret = 0, size = 0;
const char *name = NULL;
char *value = NULL;
struct iattr newattrs;
umode_t new_mode = inode->i_mode, old_mode = inode->i_mode;
+ struct dentry *dentry = d_find_alias(inode);
if (acl) {
ret = posix_acl_valid(acl);
@@ -208,16 +208,15 @@ int ceph_init_acl(struct dentry *dentry, struct inode *inode, struct inode *dir)
if (IS_POSIXACL(dir) && acl) {
if (S_ISDIR(inode->i_mode)) {
- ret = ceph_set_acl(dentry, inode, acl,
- ACL_TYPE_DEFAULT);
+ ret = ceph_set_acl(inode, acl, ACL_TYPE_DEFAULT);
if (ret)
goto out_release;
}
- ret = posix_acl_create(&acl, GFP_NOFS, &inode->i_mode);
+ ret = __posix_acl_create(&acl, GFP_NOFS, &inode->i_mode);
if (ret < 0)
goto out;
else if (ret > 0)
- ret = ceph_set_acl(dentry, inode, acl, ACL_TYPE_ACCESS);
+ ret = ceph_set_acl(inode, acl, ACL_TYPE_ACCESS);
else
cache_no_acl(inode);
} else {
@@ -229,104 +228,3 @@ out_release:
out:
return ret;
}
-
-int ceph_acl_chmod(struct dentry *dentry, struct inode *inode)
-{
- struct posix_acl *acl;
- int ret = 0;
-
- if (S_ISLNK(inode->i_mode)) {
- ret = -EOPNOTSUPP;
- goto out;
- }
-
- if (!IS_POSIXACL(inode))
- goto out;
-
- acl = ceph_get_acl(inode, ACL_TYPE_ACCESS);
- if (IS_ERR_OR_NULL(acl)) {
- ret = PTR_ERR(acl);
- goto out;
- }
-
- ret = posix_acl_chmod(&acl, GFP_KERNEL, inode->i_mode);
- if (ret)
- goto out;
- ret = ceph_set_acl(dentry, inode, acl, ACL_TYPE_ACCESS);
- posix_acl_release(acl);
-out:
- return ret;
-}
-
-static int ceph_xattr_acl_get(struct dentry *dentry, const char *name,
- void *value, size_t size, int type)
-{
- struct posix_acl *acl;
- int ret = 0;
-
- if (!IS_POSIXACL(dentry->d_inode))
- return -EOPNOTSUPP;
-
- acl = ceph_get_acl(dentry->d_inode, type);
- if (IS_ERR(acl))
- return PTR_ERR(acl);
- if (acl == NULL)
- return -ENODATA;
-
- ret = posix_acl_to_xattr(&init_user_ns, acl, value, size);
- posix_acl_release(acl);
-
- return ret;
-}
-
-static int ceph_xattr_acl_set(struct dentry *dentry, const char *name,
- const void *value, size_t size, int flags, int type)
-{
- int ret = 0;
- struct posix_acl *acl = NULL;
-
- if (!inode_owner_or_capable(dentry->d_inode)) {
- ret = -EPERM;
- goto out;
- }
-
- if (!IS_POSIXACL(dentry->d_inode)) {
- ret = -EOPNOTSUPP;
- goto out;
- }
-
- if (value) {
- acl = posix_acl_from_xattr(&init_user_ns, value, size);
- if (IS_ERR(acl)) {
- ret = PTR_ERR(acl);
- goto out;
- }
-
- if (acl) {
- ret = posix_acl_valid(acl);
- if (ret)
- goto out_release;
- }
- }
-
- ret = ceph_set_acl(dentry, dentry->d_inode, acl, type);
-
-out_release:
- posix_acl_release(acl);
-out:
- return ret;
-}
-
-const struct xattr_handler ceph_xattr_acl_default_handler = {
- .prefix = POSIX_ACL_XATTR_DEFAULT,
- .flags = ACL_TYPE_DEFAULT,
- .get = ceph_xattr_acl_get,
- .set = ceph_xattr_acl_set,
-};
-
-const struct xattr_handler ceph_xattr_acl_access_handler = {
- .prefix = POSIX_ACL_XATTR_ACCESS,
- .flags = ACL_TYPE_ACCESS,
- .get = ceph_xattr_acl_get,
- .set = ceph_xattr_acl_set,
-};
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 619616d585b0..6da4df84ba30 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1303,6 +1303,7 @@ const struct inode_operations ceph_dir_iops = {
.listxattr = ceph_listxattr,
.removexattr = ceph_removexattr,
.get_acl = ceph_get_acl,
+ .set_acl = ceph_set_acl,
.mknod = ceph_mknod,
.symlink = ceph_symlink,
.mkdir = ceph_mkdir,
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 6fc10a7d7c59..32d519d8a2e2 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -9,6 +9,7 @@
#include <linux/namei.h>
#include <linux/writeback.h>
#include <linux/vmalloc.h>
+#include <linux/posix_acl.h>
#include "super.h"
#include "mds_client.h"
@@ -96,6 +97,7 @@ const struct inode_operations ceph_file_iops = {
.listxattr = ceph_listxattr,
.removexattr = ceph_removexattr,
.get_acl = ceph_get_acl,
+ .set_acl = ceph_set_acl,
};
@@ -1615,6 +1617,7 @@ static const struct inode_operations ceph_symlink_iops = {
.listxattr = ceph_listxattr,
.removexattr = ceph_removexattr,
.get_acl = ceph_get_acl,
+ .set_acl = ceph_set_acl,
};
/*
@@ -1805,7 +1808,7 @@ int ceph_setattr(struct dentry *dentry, struct iattr *attr)
__mark_inode_dirty(inode, inode_dirty_flags);
if (ia_valid & ATTR_MODE) {
- err = ceph_acl_chmod(dentry, inode);
+ err = posix_acl_chmod(inode, attr->ia_mode);
if (err)
goto out_put;
}
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index c299f7d19bf3..aa260590f615 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -718,6 +718,7 @@ extern void ceph_queue_writeback(struct inode *inode);
extern int ceph_do_getattr(struct inode *inode, int mask);
extern int ceph_permission(struct inode *inode, int mask);
extern int ceph_setattr(struct dentry *dentry, struct iattr *attr);
+extern int ceph_setattr(struct dentry *dentry, struct iattr *attr);
extern int ceph_getattr(struct vfsmount *mnt, struct dentry *dentry,
struct kstat *stat);
@@ -736,20 +737,19 @@ extern void __init ceph_xattr_init(void);
extern void ceph_xattr_exit(void);
/* acl.c */
-extern const struct xattr_handler ceph_xattr_acl_access_handler;
-extern const struct xattr_handler ceph_xattr_acl_default_handler;
extern const struct xattr_handler *ceph_xattr_handlers[];
#ifdef CONFIG_CEPH_FS_POSIX_ACL
struct posix_acl *ceph_get_acl(struct inode *, int);
+int ceph_set_acl(struct inode *inode, struct posix_acl *acl, int type);
int ceph_init_acl(struct dentry *, struct inode *, struct inode *);
-int ceph_acl_chmod(struct dentry *, struct inode *);
void ceph_forget_all_cached_acls(struct inode *inode);
#else
#define ceph_get_acl NULL
+#define ceph_set_acl NULL
static inline int ceph_init_acl(struct dentry *dentry, struct inode *inode,
struct inode *dir)
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index c7581f3733c1..898b6565ad3e 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -6,6 +6,7 @@
#include <linux/ceph/decode.h>
#include <linux/xattr.h>
+#include <linux/posix_acl_xattr.h>
#include <linux/slab.h>
#define XATTR_CEPH_PREFIX "ceph."
@@ -17,8 +18,8 @@
*/
const struct xattr_handler *ceph_xattr_handlers[] = {
#ifdef CONFIG_CEPH_FS_POSIX_ACL
- &ceph_xattr_acl_access_handler,
- &ceph_xattr_acl_default_handler,
+ &posix_acl_access_xattr_handler,
+ &posix_acl_default_xattr_handler,
#endif
NULL,
};
--
1.7.10.4
On Wed, Jan 29, 2014 at 8:37 AM, Ilya Dryomov <[email protected]> wrote:
> From: Sage Weil <[email protected]>
>
> The merge of 7221fe4c2 raced with upstream changes in the generic POSIX
> ACL code (2aeccbe95). Update Ceph to use the new helpers as well by
> dropping the now-generic functions and setting the set_acl inode op.
>
> Signed-off-by: Sage Weil <[email protected]>
Ok, I already committed my untested (and broken) patch yesterday,
because even a broken tree is better than a non-*compiling* tree for
testing.
I created a diff from my current tree to this, and will happily apply
it, but the sign-off chain is now broken (Ilya didn't sign off on his
changes to Sage's patch. Not that it really matters for what is really
a one-liner change, but I thought I'd mention it, since another issue
came up with this patch..
Ceph now does
struct dentry *dentry = d_find_alias(inode);
in its ceph_set_acl() function, and that's because the VFS layer
doesn't pass down the dentry to the acl code any more.
Christoph - that sounds like a misfeature in the new interfaces. I
realize that for traditional unix filesystems, the path is irrelevant
for an inode operation, but the thing is, from a Linux VFS standpoint
and a conceptual standpoint, the dentry really is the more important
and unique one, and some filesystems want the dentry because they are
*correctly* designed to be about pathnames.
Network filesystems that are pass file descriptors around (ie NFS etc)
are not the "RightWay(tm)" of doing things, so I think that it is
quite reasonable for ceph to want the *path* to the inode and not just
the inode.
So attached is the incremental diff of the patch by Sage and Ilya, and
I'll apply it (delayed a bit to see if I can get the sign-off from
Ilya), but I also think we should fix the (non-cached) ACL functions
that call down to the filesystem layer to also get the dentry.
We already do that for the xattr functions, just not for
get_acl()/set_acl()/acl_chmod().
Christoph, Al? Yes, most filesystems will ignore the dentry pointer, but still..
Linus
On Wed, Jan 29, 2014 at 6:37 PM, Ilya Dryomov <[email protected]> wrote:
> From: Sage Weil <[email protected]>
>
> The merge of 7221fe4c2 raced with upstream changes in the generic POSIX
> ACL code (2aeccbe95). Update Ceph to use the new helpers as well by
> dropping the now-generic functions and setting the set_acl inode op.
>
> Signed-off-by: Sage Weil <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
Thanks,
Ilya
On Wed, Jan 29, 2014 at 11:09:18AM -0800, Linus Torvalds wrote:
> So attached is the incremental diff of the patch by Sage and Ilya, and
> I'll apply it (delayed a bit to see if I can get the sign-off from
> Ilya), but I also think we should fix the (non-cached) ACL functions
> that call down to the filesystem layer to also get the dentry.
For ->set_acl that's fairly easily doable and I actually had a version
doing that to be able to convert 9p. But for ->get_acl the path walking
caller didn't seem easily feasible. ->get_acl actually is an invention
of yours, so if you got a good idea to get the dentry to it I'd love
to be able to pass it.
On Wed, Jan 29, 2014 at 06:30:00AM -0800, Sage Weil wrote:
> The set_acl inode_operation wasn't getting set, and the prototype needed
> to be adjusted a bit (it doesn't take a dentry anymore). All seems to be
> well with the below patch.
Btw, there's a few minor bits that should go on top of yours:
- ->get_acl only gets called after we checked for a cached ACL, so no
need to call get_cached_acl again.
- no need to check IS_POSIXACL in ->get_acl, without that it should
never get set as all the callers that set it already have the check.
- you should be able to use the full posix_acl_create in CEPH
Untested patch below:
diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index 66d377a..9ab312e 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -66,13 +66,6 @@ struct posix_acl *ceph_get_acl(struct inode *inode, int type)
char *value = NULL;
struct posix_acl *acl;
- if (!IS_POSIXACL(inode))
- return NULL;
-
- acl = ceph_get_cached_acl(inode, type);
- if (acl != ACL_NOT_CACHED)
- return acl;
-
switch (type) {
case ACL_TYPE_ACCESS:
name = POSIX_ACL_XATTR_ACCESS;
@@ -190,41 +183,24 @@ out:
int ceph_init_acl(struct dentry *dentry, struct inode *inode, struct inode *dir)
{
- struct posix_acl *acl = NULL;
- int ret = 0;
-
- if (!S_ISLNK(inode->i_mode)) {
- if (IS_POSIXACL(dir)) {
- acl = ceph_get_acl(dir, ACL_TYPE_DEFAULT);
- if (IS_ERR(acl)) {
- ret = PTR_ERR(acl);
- goto out;
- }
- }
+ struct posix_acl *default_acl, *acl;
+ int error;
- if (!acl)
- inode->i_mode &= ~current_umask();
- }
+ error = posix_acl_create(dir, &inode->i_mode, &default_acl, &acl);
+ if (error)
+ return error;
- if (IS_POSIXACL(dir) && acl) {
- if (S_ISDIR(inode->i_mode)) {
- ret = ceph_set_acl(inode, acl, ACL_TYPE_DEFAULT);
- if (ret)
- goto out_release;
- }
- ret = __posix_acl_create(&acl, GFP_NOFS, &inode->i_mode);
- if (ret < 0)
- goto out;
- else if (ret > 0)
- ret = ceph_set_acl(inode, acl, ACL_TYPE_ACCESS);
- else
- cache_no_acl(inode);
- } else {
+ if (!default_acl && !acl)
cache_no_acl(inode);
- }
-out_release:
- posix_acl_release(acl);
-out:
- return ret;
+ if (default_acl) {
+ error = ceph_set_acl(inode, default_acl, ACL_TYPE_DEFAULT);
+ posix_acl_release(default_acl);
+ }
+ if (acl) {
+ if (!error)
+ error = ceph_set_acl(inode, acl, ACL_TYPE_ACCESS);
+ posix_acl_release(acl);
+ }
+ return error;
}
On Wed, Jan 29, 2014 at 11:54 PM, Christoph Hellwig <[email protected]> wrote:
>
> For ->set_acl that's fairly easily doable and I actually had a version
> doing that to be able to convert 9p. But for ->get_acl the path walking
> caller didn't seem easily feasible. ->get_acl actually is an invention
> of yours, so if you got a good idea to get the dentry to it I'd love
> to be able to pass it.
Yeah, that's pretty annoying, largely because that path is also
RCU-walk aware, which does *not* need this all (because it will never
call down into the filesystem - if the acl isn't found in the cached
acl's, we just abort).
And we're going through that very common "generic_permission()" thing
that in turn is also often called from the low-level filesystens, and
it's all fairly tightly integrated with __inode_permission() etc.
In the end, all the original call-sites should have a dentry, and none
of this is "fundamental". But you're right, it looks like an absolute
nightmare to add the dentry pointer through the whole chain. Damn.
So I'm not thrilled about it, but maybe that "d_find_alias(inode)" to
find the dentry is good enough in practice. It feels very much
incorrect (it could find a dentry with a path that you cannot actually
access on the server, and result in user-visible errors), but I
definitely see your argument. It may just not be worth the pain for
this odd ceph case.
That said, if the ceph people decide to try to bite the bullet and do
the required conversions to pass the dentry to the permissions
functions, I think I'd take it unless it ends up being *too* horribly
messy.
Linus
On Thu, 30 Jan 2014, Linus Torvalds wrote:
> On Wed, Jan 29, 2014 at 11:54 PM, Christoph Hellwig <[email protected]> wrote:
> >
> > For ->set_acl that's fairly easily doable and I actually had a version
> > doing that to be able to convert 9p. But for ->get_acl the path walking
> > caller didn't seem easily feasible. ->get_acl actually is an invention
> > of yours, so if you got a good idea to get the dentry to it I'd love
> > to be able to pass it.
>
> Yeah, that's pretty annoying, largely because that path is also
> RCU-walk aware, which does *not* need this all (because it will never
> call down into the filesystem - if the acl isn't found in the cached
> acl's, we just abort).
>
> And we're going through that very common "generic_permission()" thing
> that in turn is also often called from the low-level filesystens, and
> it's all fairly tightly integrated with __inode_permission() etc.
>
> In the end, all the original call-sites should have a dentry, and none
> of this is "fundamental". But you're right, it looks like an absolute
> nightmare to add the dentry pointer through the whole chain. Damn.
>
> So I'm not thrilled about it, but maybe that "d_find_alias(inode)" to
> find the dentry is good enough in practice. It feels very much
> incorrect (it could find a dentry with a path that you cannot actually
> access on the server, and result in user-visible errors), but I
> definitely see your argument. It may just not be worth the pain for
> this odd ceph case.
>
> That said, if the ceph people decide to try to bite the bullet and do
> the required conversions to pass the dentry to the permissions
> functions, I think I'd take it unless it ends up being *too* horribly
> messy.
FWIW the dentry isn't useful in the get case; it's only on put that it is
currently used. And now that I look closely, it is only being used by
ceph_setattr to associate the update with the parent directory for the
purposes of fsync(dirfd)... which is, I think, incorrect anyway (that
should only flush out/wait for namespace modifications, not inode attr
updates).
So I think it's fine as is, and we'll clean this up later.
I do have a couple patches on top of what's in your tree, though, that
clean up a couple duplicated lines in your fix and apply Christoph's
cleanup:
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git for-linus
Thanks!
sage
On Thu, Jan 30, 2014 at 02:01:38PM -0800, Linus Torvalds wrote:
> In the end, all the original call-sites should have a dentry, and none
> of this is "fundamental". But you're right, it looks like an absolute
> nightmare to add the dentry pointer through the whole chain. Damn.
>
> So I'm not thrilled about it, but maybe that "d_find_alias(inode)" to
> find the dentry is good enough in practice. It feels very much
> incorrect (it could find a dentry with a path that you cannot actually
> access on the server, and result in user-visible errors), but I
> definitely see your argument. It may just not be worth the pain for
> this odd ceph case.
It's not just ceph. 9p fundamentally needs it and I really want to
convert 9p to the new code so that we can get rid of the lower level
interfaces entirely and eventually move ACL dispatching entirely
into the VFS. The same d_find_alias hack should work for 9p as well,
although spreading this even more gets uglier and uglier. Similarly
for CIFS which pretends to understand the Posix ACL xattrs, but doesn't
use any of the infrastructure as it seems to rely on server side
enforcement.
On Mon, Feb 03, 2014 at 02:29:43AM -0800, Christoph Hellwig wrote:
> On Thu, Jan 30, 2014 at 02:01:38PM -0800, Linus Torvalds wrote:
> > In the end, all the original call-sites should have a dentry, and none
> > of this is "fundamental". But you're right, it looks like an absolute
> > nightmare to add the dentry pointer through the whole chain. Damn.
> >
> > So I'm not thrilled about it, but maybe that "d_find_alias(inode)" to
> > find the dentry is good enough in practice. It feels very much
> > incorrect (it could find a dentry with a path that you cannot actually
> > access on the server, and result in user-visible errors), but I
> > definitely see your argument. It may just not be worth the pain for
> > this odd ceph case.
>
> It's not just ceph. 9p fundamentally needs it and I really want to
> convert 9p to the new code so that we can get rid of the lower level
> interfaces entirely and eventually move ACL dispatching entirely
> into the VFS. The same d_find_alias hack should work for 9p as well,
> although spreading this even more gets uglier and uglier. Similarly
> for CIFS which pretends to understand the Posix ACL xattrs, but doesn't
> use any of the infrastructure as it seems to rely on server side
> enforcement.
9P is going to be fun to deal with; that's why I've ended up abandoning
vfs.git#experimental-xattr last year. We probably want to move FIDs
from dentries to inodes there, and rely in ->getxattr() et.al. upon
having already done ->d_revalidate() on some dentry for that inode.
Another pile of fun is fsnotify_xattr() call in __vfs_setxattr_noperm()
and the whole misbegotten IMA/EVM mess ;-/
See #experimental-xattr - a lot of stuff in that direction is sitting there;
might turn out to be useful.
On Mon, Feb 3, 2014 at 2:29 AM, Christoph Hellwig <[email protected]> wrote:
> On Thu, Jan 30, 2014 at 02:01:38PM -0800, Linus Torvalds wrote:
>> In the end, all the original call-sites should have a dentry, and none
>> of this is "fundamental". But you're right, it looks like an absolute
>> nightmare to add the dentry pointer through the whole chain. Damn.
>
> It's not just ceph. 9p fundamentally needs it and I really want to
> convert 9p to the new code so that we can get rid of the lower level
> interfaces entirely and eventually move ACL dispatching entirely
> into the VFS.
Hmm. I spent some time actually seeing how painful it would be, and I
think we can do it.
The attached patch seems to work for me - it does *not* make any
actual semantic changes, but it pushes the dentry pointer into the
filesystems for the permission checking, and the
vfs_{create,mkdir,mknot,symlink,link,rmdir,unlink,rename} helper
functions.
NOTE! It does *not* actually push them into the ACL functions, and in
fact it doesn't even push it down to "generic_permission()" yet. But
it would be a prerequisite for doing so.
And interestingly, replacing the inode pointer with a dentry pointer
ended up actually simplifying several of the call-sites, so while
doing this patch I got the feeling that this was the better interface
anyway, and we should have done this long ago.
Now, to be honest, pushing it down one more level (to
generic_permission()) will actually start causing some trouble. In
particular, gfs2_permission() fundamentally does not have a dentry for
several of the callers.
But aside from being pretty large, this patch looks pretty good to me.
And as I mentioned, I suspect it would actually be the right thing to
do even if we don't go any futher. And modulo gfs2, it would make it
trivial to push down the dentry pointer to generic_permission and then
to the acl lookup code.
What do you think? I guess this patch could be split up into two: one
that does the "vfs_xyz()" helper functions, and another that does the
inode_permission() change. I tied them together mainly because I
started with the inode_permission() change, and that required the
vfs_xyz() change.
Linus
On Mon, Feb 03, 2014 at 01:03:32PM -0800, Linus Torvalds wrote:
> What do you think? I guess this patch could be split up into two: one
> that does the "vfs_xyz()" helper functions, and another that does the
> inode_permission() change. I tied them together mainly because I
> started with the inode_permission() change, and that required the
> vfs_xyz() change.
Please, don't do that. Half of that is pointless (e.g. what you are
doing with vfs_rmdir() - if anything, we could get rid of the first
argument completely, it's always victim->d_parent->d_inode and we are
holding enough locks for that to be stable) and ->permission() is
just plain wrong.
Result *is* a function of inode alone; the problem with 9P is that we
are caching FIDs in the wrong place.
What really happens is that protocol refers to objects by 32bit tokens,
given by server out to client. Many of those can correspond to the
same file; think of those as file descriptors. We are associating them
with dentries, but if you have several links to the same file a FID
acquired for either of them will do.
What we ought to do, AFAICS, is moving these guys from dentry to inode;
sure, to *get* one in the first place we need some dentry. But by the
time ->setxattr() and friends get to see the inode, the caller has already
done things that make sure that some FID of his exists.
IOW, NAK.
On Mon, Feb 03, 2014 at 01:03:32PM -0800, Linus Torvalds wrote:
> Now, to be honest, pushing it down one more level (to
> generic_permission()) will actually start causing some trouble. In
> particular, gfs2_permission() fundamentally does not have a dentry for
> several of the callers.
Looking over the gfs2 code the problem seems to be that it duplicates
permissions checks from the may_{lookup,create,linkat,delete}, most
likely because it needs cluster locking in place for them. The right
fix seems to be to optionally call the filesystem from those. That
being said I wonder how ocfs2 or network filesystems get away without
that.
> What do you think? I guess this patch could be split up into two: one
> that does the "vfs_xyz()" helper functions, and another that does the
> inode_permission() change. I tied them together mainly because I
> started with the inode_permission() change, and that required the
> vfs_xyz() change.
The changes look good to me, and yes I think they should be split.
I'll see if I can take this further, but doing something non-hacky
in GFS2 would be the first step here.
On Mon, Feb 03, 2014 at 09:19:55PM +0000, Al Viro wrote:
> Result *is* a function of inode alone; the problem with 9P is that we
> are caching FIDs in the wrong place.
I don't think that's true for CIFS unfortunately, which is path based.
On Mon, Feb 3, 2014 at 1:19 PM, Al Viro <[email protected]> wrote:
>
> Please, don't do that. Half of that is pointless (e.g. what you are
> doing with vfs_rmdir() - if anything, we could get rid of the first
> argument completely, it's always victim->d_parent->d_inode and we are
> holding enough locks for that to be stable) and ->permission() is
> just plain wrong.
>
> Result *is* a function of inode alone; the problem with 9P is that we
> are caching FIDs in the wrong place.
No, Al. It's *not* just a function of the inode alone. That's the point.
Some network filesystems pass the *path* to the server. Any operation
that needs to check something on the server needs the *dentry*, not
the inode.
This whole "the inode describes the file" mentality comes from old
broken UNIX semantics. It's fundamentally true for local Unix
filesystems, but that's it. It's not true in general.
Sure, many network filesystems then emulate the local Unix filesystem
behavior, so in practice, you get the unix semantics quite often. But
it really is wrong.
If the protocol is path-based (and it happens, and it's actually the
*correct* thing to do for a network filesystem, rather than the
idiotic "file handle" crap that tries to emulate the unix inode
semantics in the protocol), then the inode is simply not sufficient.
And no, d_find_alias() is not correct or sufficient either. It can
work in practice (and probably does perfectly fine 99.9% of the time),
but it can equally well give the *wrong* dentry: yes, the dentry it
returns would have been a valid dentry for somebody at some time, but
it might be a stale dentry *now*, and it might be the wrong dentry for
the current user (because the current user may not have permissions to
that particular path, even if the user has permissions through his
*own* path).
So I really think you're *fundamentally* incorrect when you say
"result *is* a function of inode alone".
Linus
On Mon, Feb 03, 2014 at 01:24:47PM -0800, Christoph Hellwig wrote:
> On Mon, Feb 03, 2014 at 09:19:55PM +0000, Al Viro wrote:
> > Result *is* a function of inode alone; the problem with 9P is that we
> > are caching FIDs in the wrong place.
>
> I don't think that's true for CIFS unfortunately, which is path based.
Yes, and...? CIFS also doesn't have hardlinks, so _there_ d_find_alias()
is just fine.
9P is actually trickier; there we need some massage, but for CIFS it's
literally a matter of one function call.
The problem with 9P is that the way we do it right now, you might pick
the wrong dentry. _Some_ alias already bears the FID you are after,
but if you end up picking the one that doesn't, you might very well
end up being unable to obtain one either.
On Mon, Feb 03, 2014 at 09:31:53PM +0000, Al Viro wrote:
> Yes, and...? CIFS also doesn't have hardlinks, so _there_ d_find_alias()
> is just fine.
It does have hardlinks, look at cifs_hardlink and functions called from
it.
On Mon, Feb 3, 2014 at 1:31 PM, Al Viro <[email protected]> wrote:
>
> Yes, and...? CIFS also doesn't have hardlinks, so _there_ d_find_alias()
> is just fine.
Hmm? I'm pretty sure cifs can actually have hardlinks.
Linus
On Mon, Feb 03, 2014 at 01:31:19PM -0800, Linus Torvalds wrote:
> If the protocol is path-based (and it happens, and it's actually the
> *correct* thing to do for a network filesystem, rather than the
> idiotic "file handle" crap that tries to emulate the unix inode
> semantics in the protocol), then the inode is simply not sufficient.
>
> And no, d_find_alias() is not correct or sufficient either. It can
> work in practice (and probably does perfectly fine 99.9% of the time),
> but it can equally well give the *wrong* dentry: yes, the dentry it
> returns would have been a valid dentry for somebody at some time, but
> it might be a stale dentry *now*, and it might be the wrong dentry for
> the current user (because the current user may not have permissions to
> that particular path, even if the user has permissions through his
> *own* path).
>
> So I really think you're *fundamentally* incorrect when you say
> "result *is* a function of inode alone".
Which fs are you talking about? For 9P it *is* a function of inode alone.
For CIFS there's no wrong dentry to pick - it doesn't have links to start
with.
If we really have hardlinks, the result of permission check would better
be a function of inode itself - as in, "if it gives different results
for two pathnames reachable for the same user, we have a bug".
On Mon, Feb 03, 2014 at 01:37:03PM -0800, Linus Torvalds wrote:
> On Mon, Feb 3, 2014 at 1:31 PM, Al Viro <[email protected]> wrote:
> >
> > Yes, and...? CIFS also doesn't have hardlinks, so _there_ d_find_alias()
> > is just fine.
>
> Hmm? I'm pretty sure cifs can actually have hardlinks.
*UGH*
How the hell does it... Oh, right - samba on Unix server. I really wonder
how well do Windows clients deal with those...
Crap... I reall hate that prototype change ;-/
On Mon, Feb 03, 2014 at 09:39:26PM +0000, Al Viro wrote:
> Which fs are you talking about? For 9P it *is* a function of inode alone.
> For CIFS there's no wrong dentry to pick - it doesn't have links to start
> with.
Except that apparently it does ;-/ Bugger...
On Mon, Feb 3, 2014 at 1:39 PM, Al Viro <[email protected]> wrote:
>
> If we really have hardlinks, the result of permission check would better
> be a function of inode itself - as in, "if it gives different results
> for two pathnames reachable for the same user, we have a bug".
No. You're wrong.
EVEN ON A UNIX FILESYSTEM THE PATH IS MEANINGFUL.
Do this: create a hardlink in two different directories. Make the
*directory* permissions for one of the directories be something you
cannot traverse. Now try to check the permissions of the *same* inode
through those two paths. Notice how you get *different* results.
Really.
Now, imagine that you are doing the same thing over a network. On the
server, there may be a single inode for the file, but when the client
gives the server a pathname, the two pathnames to that single inode
ARE NOT EQUIVALENT.
And the fact is, filesystems with hardlinks and path-name-based
operations do exist. cifs with the unix extensions is one of them.
Al, face it, you're wrong this time.
Linus
On Mon, Feb 03, 2014 at 01:03:32PM -0800, Linus Torvalds wrote:
> - err = vfs_mkdir(path.dentry->d_inode, dentry, mode);
> + err = vfs_mkdir(path.dentry, dentry, mode);
Pointless - path.dentry == dentry->d_parent anyway.
> - err = vfs_mknod(path.dentry->d_inode, dentry, mode, dev->devt);
> + err = vfs_mknod(path.dentry, dentry, mode, dev->devt);
Ditto.
> @@ -237,7 +237,7 @@ static int dev_rmdir(const char *name)
> return PTR_ERR(dentry);
> if (dentry->d_inode) {
> if (dentry->d_inode->i_private == &thread)
> - err = vfs_rmdir(parent.dentry->d_inode, dentry);
> + err = vfs_rmdir(parent.dentry, dentry);
Ditto, with s/path/parent/
> @@ -324,7 +324,7 @@ static int handle_remove(const char *nodename, struct device *dev)
> mutex_lock(&dentry->d_inode->i_mutex);
> notify_change(dentry, &newattrs, NULL);
> mutex_unlock(&dentry->d_inode->i_mutex);
> - err = vfs_unlink(parent.dentry->d_inode, dentry, NULL);
> + err = vfs_unlink(parent.dentry, dentry, NULL);
> if (!err || err == -ENOENT)
> deleted = 1;
> }
And here as well.
> +++ b/drivers/staging/lustre/lustre/lvfs/lvfs_linux.c
> @@ -222,8 +222,8 @@ int lustre_rename(struct dentry *dir, struct vfsmount *mnt,
> if (IS_ERR(dchild_new))
> GOTO(put_old, err = PTR_ERR(dchild_new));
>
> - err = ll_vfs_rename(dir->d_inode, dchild_old, mnt,
> - dir->d_inode, dchild_new, mnt, NULL);
> + err = ll_vfs_rename(dir, dchild_old, mnt,
> + dir, dchild_new, mnt, NULL);
... and again, that's completely pointless.
> -int afs_permission(struct inode *inode, int mask)
> +int afs_permission(struct dentry *dentry, struct inode *inode, int mask)
Oh, _lovely_. So not only do we pass dentry, the arguments are redundant
as well.
> -static inline int btrfs_may_create(struct inode *dir, struct dentry *child)
> +static inline int btrfs_may_create(struct dentry *parent, struct dentry *child)
I'm fairly sure that it's also pointless, because parent is going to be, well,
the parent. Of child.
> +static int gfs2_vfs_permission(struct dentry *dentry, struct inode *inode, int mask)
> +{
> + return gfs2_permission(inode, mask);
> +}
Er... You do realize that callers of gfs2_permission() tend to have
the dentry in question, either directly or as ->d_parent of something
they have?
I really hate the whole thing... ;-/
On Mon, Feb 3, 2014 at 1:03 PM, Linus Torvalds
<[email protected]> wrote:
>
> What do you think? I guess this patch could be split up into two: one
> that does the "vfs_xyz()" helper functions, and another that does the
> inode_permission() change. I tied them together mainly because I
> started with the inode_permission() change, and that required the
> vfs_xyz() change.
Ok, this is the split-up. I haven't completed the "make allmodconfig"
for the first patch yet, so I might have split this wrong, but it was
fairly well separated and I'm pretty sure that this is fine.
I do actually agree that the second patch isn't exactly pretty.
Passing in both dentry and inode is redundant, and calling the
function "inode_permission()" is now a misnomer. But it makes it a lot
easier to see the differences this way in the diff (particularly
word-diff), so I think a renaming and/or dropping the inode parameter
would better be done as a separate patch anyway.
And no, as far as the first patch is concerned, I certainly wouldn't
hate dropping the first 'parent' argument from the vfs_xyzzy()
functions either, since that *should* be redundant, but quite frankly,
I didn't want to think too much about that, and this was the simple
conversions (and all callers were quite happy with dropping the inode
and replacing it with the dentry).
Linus
On Mon, Feb 3, 2014 at 1:59 PM, Al Viro <[email protected]> wrote:
> On Mon, Feb 03, 2014 at 01:03:32PM -0800, Linus Torvalds wrote:
>
>> - err = vfs_mkdir(path.dentry->d_inode, dentry, mode);
>> + err = vfs_mkdir(path.dentry, dentry, mode);
>
> Pointless - path.dentry == dentry->d_parent anyway.
Heh. It's no less redundant than it used to be.
But if you want to clean up the vfs_xyzzy() ones further, I'm
perfectly fine with that.
>> - err = ll_vfs_rename(dir->d_inode, dchild_old, mnt,
>> - dir->d_inode, dchild_new, mnt, NULL);
>> + err = ll_vfs_rename(dir, dchild_old, mnt,
>> + dir, dchild_new, mnt, NULL);
>
>
> ... and again, that's completely pointless.
Minimal patch.. I really didn't want to check what the heck lustre
does with the insane ll_vfs thing.
>> -int afs_permission(struct inode *inode, int mask)
>> +int afs_permission(struct dentry *dentry, struct inode *inode, int mask)
>
> Oh, _lovely_. So not only do we pass dentry, the arguments are redundant
> as well.
Note that *not* passing in inode would make the patch much bigger,
because now every filesystem would have to add the
struct inode *inode = dentry->d_inode;
at the top.
Also, I'm not actually convinced it is redundant at all. Remember the
RCU lookup case? dentry->d_inode is not safe.
The RCU case actually does
inode_permission(nd->path.dentry, nd->inode, ..)
and here the difference between nd->inode and dentry->d_inode are
relevant, I think.
>> +static int gfs2_vfs_permission(struct dentry *dentry, struct inode *inode, int mask)
>> +{
>> + return gfs2_permission(inode, mask);
>> +}
>
> Er... You do realize that callers of gfs2_permission() tend to have
> the dentry in question, either directly or as ->d_parent of something
> they have?
Not true. Look closer.
Look at gfs2_lookupi() in particular, and check how it is called.
I did hate that part of the patch, and I did mention the kinds of
problems this will cause if the next phase passes in dentry to
"generic_permission()".
Linus
On Mon, Feb 03, 2014 at 01:44:22PM -0800, Linus Torvalds wrote:
> On Mon, Feb 3, 2014 at 1:39 PM, Al Viro <[email protected]> wrote:
> >
> > If we really have hardlinks, the result of permission check would better
> > be a function of inode itself - as in, "if it gives different results
> > for two pathnames reachable for the same user, we have a bug".
^^^^^^^^^
> No. You're wrong.
>
> EVEN ON A UNIX FILESYSTEM THE PATH IS MEANINGFUL.
>
> Do this: create a hardlink in two different directories. Make the
> *directory* permissions for one of the directories be something you
> cannot traverse. Now try to check the permissions of the *same* inode
> through those two paths. Notice how you get *different* results.
>
> Really.
Yes. In one case we won't get to looking at the permissions of that thing
at all.
> Now, imagine that you are doing the same thing over a network. On the
> server, there may be a single inode for the file, but when the client
> gives the server a pathname, the two pathnames to that single inode
> ARE NOT EQUIVALENT.
Why do we pretend that those are links, in the first place?
> And the fact is, filesystems with hardlinks and path-name-based
> operations do exist. cifs with the unix extensions is one of them.
Pox on Tridge...
I really, really hate that change; I can buy "->getxattr() has inconvenient
interface because of lousy protocol design", but spreading the same to
->permission(), with everything that will fall out of that... <shudder>
On Mon, Feb 03, 2014 at 02:12:00PM -0800, Linus Torvalds wrote:
> >> -int afs_permission(struct inode *inode, int mask)
> >> +int afs_permission(struct dentry *dentry, struct inode *inode, int mask)
> >
> > Oh, _lovely_. So not only do we pass dentry, the arguments are redundant
> > as well.
>
> Note that *not* passing in inode would make the patch much bigger,
> because now every filesystem would have to add the
>
> struct inode *inode = dentry->d_inode;
>
> at the top.
>
> Also, I'm not actually convinced it is redundant at all. Remember the
> RCU lookup case? dentry->d_inode is not safe.
Umm... Point, but that actually means that we get an extra pitfall for
filesystem writers here. foo_permission() passes dentry (now that it
has one) to foo_wank_a_lot(), with the latter using dentry->d_inode at
some point...
> >> +static int gfs2_vfs_permission(struct dentry *dentry, struct inode *inode, int mask)
> >> +{
> >> + return gfs2_permission(inode, mask);
> >> +}
> >
> > Er... You do realize that callers of gfs2_permission() tend to have
> > the dentry in question, either directly or as ->d_parent of something
> > they have?
>
> Not true. Look closer.
>
> Look at gfs2_lookupi() in particular, and check how it is called.
Yeowch... gfs2_ok_to_move() is particulary nasty... WTF do we need
it for and why is it not racy as hell?
On Mon, Feb 3, 2014 at 2:40 PM, Al Viro <[email protected]> wrote:
>
> Umm... Point, but that actually means that we get an extra pitfall for
> filesystem writers here. foo_permission() passes dentry (now that it
> has one) to foo_wank_a_lot(), with the latter using dentry->d_inode at
> some point...
I agree.
The good news, though, is that in the RCU lookup case, we have that
MAY_NOT_BLOCK thing, and most filesystems will have errored out for
any complex operations.
RCU lookup is special, and complicated, and sadly, the permissions
checking is very much part of that. But for the really complex cases,
at least we can punt.
>> Look at gfs2_lookupi() in particular, and check how it is called.
>
> Yeowch... gfs2_ok_to_move() is particulary nasty... WTF do we need
> it for and why is it not racy as hell?
I don't know. And I suspect that for things like the journal index
file lookup (which is actually worse - see gfs2_jindex_hold()) we
don't really about the permissions, since this is just done at
init_journal() time.
So I think all of this is quite solvable for gfs2, it just wasn't the
obvious kind of "we already have the dentry" case that every single
other case I looked at was.
Linus
On Mon, 2014-02-03 at 22:40 +0000, Al Viro wrote:
> > >> +static int gfs2_vfs_permission(struct dentry *dentry, struct inode *inode, int mask)
> > >> +{
> > >> + return gfs2_permission(inode, mask);
> > >> +}
> > >
> > > Er... You do realize that callers of gfs2_permission() tend to have
> > > the dentry in question, either directly or as ->d_parent of something
> > > they have?
> >
> > Not true. Look closer.
> >
> > Look at gfs2_lookupi() in particular, and check how it is called.
>
> Yeowch... gfs2_ok_to_move() is particulary nasty... WTF do we need
> it for and why is it not racy as hell?
Well, the original intent was to prevent us from moving a directory into
one of its subdirectories, and thus creating a loop. It is only called
when the rename is moving a directory, and when the parent directories
(source and destination) are different.
I know there is a problem there, recently reported by Bruce Fields which
he came across when looking at the d_splice_alias issue. The bug that
Bruce found only shows up in the clustered case, and not in the single
node case though.
Which particular race are you worried about? This check is covered by
the rename glock, which is basically a cluster wide version of the vfs
level ->s_vfs_rename_mutex.
To diverge from that topic for a moment, this thread has also brought
together some discussion on another issue which I've been pondering
recently.... that of whether the inode operations for get/set_xattr
should take a dentry or not. I had thought that we'd come to the
conclusion that 9p made it impossible to swap the current dentry
argument for an inode, and I was about to send a patch for selinux
support on clustered fs on that basis. However the discussion in this
thread has made me wonder whether that really is the case or not.... Al,
can you confirm whether your xattr-experimental patches are still under
active consideration?
The other question that I have relating to that side of things, is why
security_inode_permission() is called from __inode_permission() rather
than from generic_permission() ? Maybe there is a good reason, but I
can't immediately see what it is at the moment.
In response to the question elsewhere about GFS2 calling
gfs2_permission() after the vfs has already done its checks, that is
indeed down to needing to ensure that we have the cluster locks when
this check is called. More importantly to know that things haven't
changed since the VFS called the same function in case we've raced with
another node changing the permissions, for example. There are a number
of cases where we redo vfs level checks for this reason,
Steve.
On Tue, Feb 04, 2014 at 11:33:35AM +0000, Steven Whitehouse wrote:
> To diverge from that topic for a moment, this thread has also brought
> together some discussion on another issue which I've been pondering
> recently.... that of whether the inode operations for get/set_xattr
> should take a dentry or not. I had thought that we'd come to the
> conclusion that 9p made it impossible to swap the current dentry
> argument for an inode, and I was about to send a patch for selinux
> support on clustered fs on that basis. However the discussion in this
> thread has made me wonder whether that really is the case or not.... Al,
> can you confirm whether your xattr-experimental patches are still under
> active consideration?
My plan was to work on the 9p and cifs conversions using the
d_find_alias hack we have in ceph right now. That means the base work
could switch to passed in dentries or in case of 9p the per-inode fids
easily.
> The other question that I have relating to that side of things, is why
> security_inode_permission() is called from __inode_permission() rather
> than from generic_permission() ? Maybe there is a good reason, but I
> can't immediately see what it is at the moment.
Seems like almost everything of the security_* family is called from the
VFS instead of the filesystem. There's also some very odd other
behaviour in there, e.g. for the xattrs sets are handed to the
filesystem first, and then the xattr layer calls into the security
layer, which for reads the filesystems is never reached at all.
> In response to the question elsewhere about GFS2 calling
> gfs2_permission() after the vfs has already done its checks, that is
> indeed down to needing to ensure that we have the cluster locks when
> this check is called. More importantly to know that things haven't
> changed since the VFS called the same function in case we've raced with
> another node changing the permissions, for example. There are a number
> of cases where we redo vfs level checks for this reason,
Seems like we should be able to grab a cluster lock where we grab
i_mutex in the namespace code to avoid having to redo all these checks.
On Tue, Feb 4, 2014 at 3:33 AM, Steven Whitehouse <[email protected]> wrote:
>
> The other question that I have relating to that side of things, is why
> security_inode_permission() is called from __inode_permission() rather
> than from generic_permission() ? Maybe there is a good reason, but I
> can't immediately see what it is at the moment.
"generic_permission()" is just a helper that implements the default
UNIX permissions, and won't necessarily even be called. A filesystem
could decide not to call it at all, and in fact there are cases that
don't (eg coda or the bad_inode case).
The inode_permission() class of helpers, in contrast, is what gets
called by the VFS layer itself. So if you want to catch all permission
checks (and that would be security_inode_permission()) then you need
to catch it there.
Linus
On Mon, Feb 03, 2014 at 10:31:27PM +0000, Al Viro wrote:
>
> > And the fact is, filesystems with hardlinks and path-name-based
> > operations do exist. cifs with the unix extensions is one of them.
>
> Pox on Tridge...
Actually you have to blame me for that. Tridge always
*HATED* the UNIX extensions :-).