2023-08-03 16:09:37

by Aleksandr Mikhalitsyn

Subject: [PATCH v8 00/12] ceph: support idmapped mounts

Dear friends,

This patchset was originally developed by Christian Brauner but I'll continue
to push it forward. Christian allowed me to do that :)

This feature is already actively used and tested with the LXD/LXC project.

Git tree (based on https://github.com/ceph/ceph-client.git testing):
v7: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v7
current: https://github.com/mihalicyn/linux/tree/fs.idmapped.ceph

In version 3 I changed only two commits:
- fs: export mnt_idmap_get/mnt_idmap_put
- ceph: allow idmapped setattr inode op
and added a new one:
- ceph: pass idmap to __ceph_setattr

In version 4 I reworked the ("ceph: stash idmapping in mdsc request")
commit. Now we take the idmap reference right where req->r_mnt_idmap
is filled. It's a safer approach and prevents a possible refcount underflow
on error paths where __register_request wasn't called but
ceph_mdsc_release_request is called.
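
The refcounting rule described for version 4 can be sketched in userspace C.
This is only a model under stated assumptions: idmap_model, request_model and
the helper names are hypothetical stand-ins, not the kernel types or the code
from the patch. It shows why taking the reference at the assignment keeps the
release path balanced even when the request was never registered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are the kernel types. */
struct idmap_model {
	int refcount;
};

struct request_model {
	struct idmap_model *r_mnt_idmap; /* NULL until an inode op fills it */
};

/* Take a reference and return the same pointer, like a *_get() helper. */
static struct idmap_model *idmap_model_get(struct idmap_model *idmap)
{
	idmap->refcount++;
	return idmap;
}

/* Drop a reference; a refcount underflow would trip the assertion. */
static void idmap_model_put(struct idmap_model *idmap)
{
	assert(idmap->refcount > 0);
	idmap->refcount--;
}

/* The reference is taken at the assignment, so the request owns it. */
static void request_model_set_idmap(struct request_model *req,
				    struct idmap_model *idmap)
{
	req->r_mnt_idmap = idmap_model_get(idmap);
}

/* Release drops only what the request actually holds. */
static void request_model_release(struct request_model *req)
{
	if (req->r_mnt_idmap) {
		idmap_model_put(req->r_mnt_idmap);
		req->r_mnt_idmap = NULL;
	}
}
```

In this model, releasing a request that never had an idmapping assigned leaves
the counter untouched, which corresponds to the error path mentioned above.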

Changelog for version 5:
- a few commits were squashed into one (as suggested by Xiubo Li)
- started passing an idmapping everywhere (if possible), so the caller's
UID/GID will be mapped almost everywhere (as suggested by Xiubo Li)

Changelog for version 6:
- rebased on top of testing branch
- passed an idmapping in a few places (readdir, ceph_netfs_issue_op_inline)

Changelog for version 7:
- rebased on top of testing branch
- this series now requires a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
https://github.com/ceph/ceph/pull/52575

Changelog for version 8:
- rebased on top of testing branch
- added the enable_unsafe_idmap module parameter to make idmapped mounts
work with old MDS server versions
- properly handled the case when an old MDS is used with a new kernel client

I can confirm that this version passes xfstests and has been
tested with an old MDS (without CEPHFS_FEATURE_HAS_OWNER_UIDGID)
and with a recent MDS version.

Links to previous versions:
v1: https://lore.kernel.org/all/[email protected]/
v2: https://lore.kernel.org/lkml/[email protected]/
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v2
v3: https://lore.kernel.org/lkml/[email protected]/#t
v4: https://lore.kernel.org/lkml/[email protected]/#t
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v4
v5: https://lore.kernel.org/lkml/[email protected]/#t
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v5
v6: https://lore.kernel.org/lkml/[email protected]/
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v6

Kind regards,
Alex

Original description from Christian:
========================================================================
This patch series enables cephfs to support idmapped mounts, i.e. the
ability to alter ownership information on a per-mount basis.

Container managers such as LXD support sharing data via cephfs between
the host and unprivileged containers and between unprivileged containers.
They may all use different idmappings. Idmapped mounts can be used to
create mounts with the idmapping used for the container (or a different
one specific to the use-case).

There are in fact more use-cases such as remapping ownership for
mountpoints on the host itself to grant or restrict access to different
users or to make it possible to enforce that programs running as root
will write with a non-zero {g,u}id to disk.
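
As an illustration of the last use-case, here is a minimal userspace sketch of
a single-extent idmapping. It is an assumption-laden model, not the kernel's
struct mnt_idmap: the struct, the helper names and INVALID_ID_MODEL are all
hypothetical, and the filesystem is assumed to live in the initial user
namespace.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_ID_MODEL ((uint32_t)-1)

/* One contiguous range: ids [fs_first, fs_first + count) on disk appear
 * as [mnt_first, mnt_first + count) when seen through the mount. */
struct idmap_extent_model {
	uint32_t fs_first;
	uint32_t mnt_first;
	uint32_t count;
};

/* Read direction: on-disk id -> id visible through the mount. */
static uint32_t to_mount_id(const struct idmap_extent_model *m, uint32_t fs_id)
{
	assert(m->count > 0);
	if (fs_id < m->fs_first || fs_id - m->fs_first >= m->count)
		return INVALID_ID_MODEL;
	return m->mnt_first + (fs_id - m->fs_first);
}

/* Write direction: caller's id at the mount -> id stored on disk. */
static uint32_t to_fs_id(const struct idmap_extent_model *m, uint32_t mnt_id)
{
	assert(m->count > 0);
	if (mnt_id < m->mnt_first || mnt_id - m->mnt_first >= m->count)
		return INVALID_ID_MODEL;
	return m->fs_first + (mnt_id - m->mnt_first);
}
```

With fs_first = 1000, mnt_first = 0 and count = 1, a program running as root
(id 0) at the mount would have its files stored with id 1000, and any id
outside the extent has no mapping at all.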

The patch series is simple overall and few changes are needed to cephfs.
There is one cephfs specific issue that I would like to discuss and
solve which I explain in detail in:

[PATCH 02/12] ceph: handle idmapped mounts in create_request_message()

It has to do with how to handle MDS servers which have id-based access
restrictions configured. I would ask you to please take a look at the
explanation in the aforementioned patch.

The patch series passes the vfs and idmapped mount testsuite as part of
xfstests. To run it you will need a config like:

[ceph]
export FSTYP=ceph
export TEST_DIR=/mnt/test
export TEST_DEV=10.103.182.10:6789:/
export TEST_FS_MOUNT_OPTS="-o name=admin,secret=$password"

and then simply call

sudo ./check -g idmapped

========================================================================

Alexander Mikhalitsyn (3):
fs: export mnt_idmap_get/mnt_idmap_put
ceph: add enable_unsafe_idmap module parameter
ceph: pass idmap to __ceph_setattr

Christian Brauner (9):
ceph: stash idmapping in mdsc request
ceph: handle idmapped mounts in create_request_message()
ceph: pass an idmapping to mknod/symlink/mkdir
ceph: allow idmapped getattr inode op
ceph: allow idmapped permission inode op
ceph: allow idmapped setattr inode op
ceph/acl: allow idmapped set_acl inode op
ceph/file: allow idmapped atomic_open inode op
ceph: allow idmapped mounts

fs/ceph/acl.c | 6 +--
fs/ceph/crypto.c | 2 +-
fs/ceph/dir.c | 3 ++
fs/ceph/file.c | 10 ++++-
fs/ceph/inode.c | 29 +++++++++------
fs/ceph/mds_client.c | 69 ++++++++++++++++++++++++++++++++---
fs/ceph/mds_client.h | 8 +++-
fs/ceph/super.c | 7 +++-
fs/ceph/super.h | 3 +-
fs/mnt_idmapping.c | 2 +
include/linux/ceph/ceph_fs.h | 4 +-
include/linux/mnt_idmapping.h | 3 ++
12 files changed, 119 insertions(+), 27 deletions(-)

--
2.34.1



2023-08-03 16:09:41

by Aleksandr Mikhalitsyn

Subject: [PATCH v8 04/12] ceph: add enable_unsafe_idmap module parameter

This parameter decides whether we allow
IO on an idmapped mount when the MDS lacks
support for the CEPHFS_FEATURE_HAS_OWNER_UIDGID feature.

In this case we can't properly handle MDS permission
checks, and if UID/GID-based restrictions are enabled
on the MDS side then IO requests which go through an
idmapped mount may fail with -EACCES/-EPERM.
Fortunately, for most users this is not the case and
everything should work fine. But we put the word "unsafe"
in the module parameter name to warn users about
possible problems with this feature and to encourage
updating the cephfs MDS.

Cc: Xiubo Li <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: Ilya Dryomov <[email protected]>
Cc: [email protected]
Suggested-by: Stéphane Graber <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
---
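For readers who prefer a truth table to a diff, the check this patch adds to
create_request_message() can be modelled roughly as below. This is a userspace
sketch, not the kernel code: idmap_request_policy() is a hypothetical name, and
the three flags stand in for "r_mnt_idmap != &nop_mnt_idmap", the session
feature bit and the module parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns 0 if the request may be sent, -EIO if it must be failed.
 * *map_caller_ids is set to true only when caller_{u,g}id have to carry
 * ids mapped through the mount's idmapping (the "unsafe" fallback).
 */
static int idmap_request_policy(bool idmapped, bool mds_has_owner_uidgid,
				bool unsafe_idmap_enabled, bool *map_caller_ids)
{
	assert(map_caller_ids != NULL);
	*map_caller_ids = false;

	/* Not an idmapped mount, or the MDS understands owner_{u,g}id. */
	if (!idmapped || mds_has_owner_uidgid)
		return 0;

	/* Default: refuse IO through the idmapped mount. */
	if (!unsafe_idmap_enabled)
		return -EIO;

	/* Old MDS and the admin opted in: map the caller ids instead. */
	*map_caller_ids = true;
	return 0;
}
```

Since the parameter is registered with mode 0644, it should also be
changeable at runtime through the module's parameters directory in sysfs.
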
fs/ceph/mds_client.c | 28 +++++++++++++++++++++-------
fs/ceph/mds_client.h | 2 ++
fs/ceph/super.c | 5 +++++
3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 7d3106d3b726..d8097e84a5ee 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2949,6 +2949,8 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
int ret;
bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
u16 request_head_version = mds_supported_head_version(session);
+ kuid_t caller_fsuid = req->r_cred->fsuid;
+ kgid_t caller_fsgid = req->r_cred->fsgid;

ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
req->r_parent, req->r_path1, req->r_ino1.ino,
@@ -3044,12 +3046,24 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,

if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
- pr_err_ratelimited_client(cl,
- "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
- " is not supported by MDS. Fail request with -EIO.\n");
+ if (enable_unsafe_idmap) {
+ pr_warn_once_client(cl,
+ "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
+ " is not supported by MDS. UID/GID-based restrictions may"
+ " not work properly.\n");
+
+ caller_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
+ VFSUIDT_INIT(req->r_cred->fsuid));
+ caller_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
+ VFSGIDT_INIT(req->r_cred->fsgid));
+ } else {
+ pr_err_ratelimited_client(cl,
+ "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
+ " is not supported by MDS. Fail request with -EIO.\n");

- ret = -EIO;
- goto out_err;
+ ret = -EIO;
+ goto out_err;
+ }
}

/*
@@ -3094,9 +3108,9 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
lhead->mdsmap_epoch = cpu_to_le32(mdsc->mdsmap->m_epoch);
lhead->op = cpu_to_le32(req->r_op);
lhead->caller_uid = cpu_to_le32(from_kuid(&init_user_ns,
- req->r_cred->fsuid));
+ caller_fsuid));
lhead->caller_gid = cpu_to_le32(from_kgid(&init_user_ns,
- req->r_cred->fsgid));
+ caller_fsgid));
lhead->ino = cpu_to_le64(req->r_deleg_ino);
lhead->args = req->r_args;

diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 8f683e8203bd..0945ae4cf3c5 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -619,4 +619,6 @@ static inline int ceph_wait_on_async_create(struct inode *inode)
extern int ceph_wait_on_conflict_unlink(struct dentry *dentry);
extern u64 ceph_get_deleg_ino(struct ceph_mds_session *session);
extern int ceph_restore_deleg_ino(struct ceph_mds_session *session, u64 ino);
+
+extern bool enable_unsafe_idmap;
#endif
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 49fd17fbba9f..18bfdfd48cef 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1680,6 +1680,11 @@ static const struct kernel_param_ops param_ops_mount_syntax = {
module_param_cb(mount_syntax_v1, &param_ops_mount_syntax, &mount_support, 0444);
module_param_cb(mount_syntax_v2, &param_ops_mount_syntax, &mount_support, 0444);

+bool enable_unsafe_idmap = false;
+module_param(enable_unsafe_idmap, bool, 0644);
+MODULE_PARM_DESC(enable_unsafe_idmap,
+ "Allow to use idmapped mounts with MDS without CEPHFS_FEATURE_HAS_OWNER_UIDGID");
+
module_init(init_ceph);
module_exit(exit_ceph);

--
2.34.1


2023-08-03 16:17:29

by Aleksandr Mikhalitsyn

Subject: [PATCH v8 05/12] ceph: pass an idmapping to mknod/symlink/mkdir

From: Christian Brauner <[email protected]>

Enable mknod/symlink/mkdir iops to handle idmapped mounts.
This is just a matter of passing down the mount's idmapping.

Cc: Xiubo Li <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: Ilya Dryomov <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
---
v4:
- call mnt_idmap_get
v7:
- don't pass idmapping for ceph_rename (no need)
---
fs/ceph/dir.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index b752ed3ccdf0..397656ae7787 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -952,6 +952,7 @@ static int ceph_mknod(struct mnt_idmap *idmap, struct inode *dir,
req->r_parent = dir;
ihold(dir);
set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
+ req->r_mnt_idmap = mnt_idmap_get(idmap);
req->r_args.mknod.mode = cpu_to_le32(mode);
req->r_args.mknod.rdev = cpu_to_le32(rdev);
req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL |
@@ -1067,6 +1068,7 @@ static int ceph_symlink(struct mnt_idmap *idmap, struct inode *dir,
}

set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
+ req->r_mnt_idmap = mnt_idmap_get(idmap);
req->r_dentry = dget(dentry);
req->r_num_caps = 2;
req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL |
@@ -1146,6 +1148,7 @@ static int ceph_mkdir(struct mnt_idmap *idmap, struct inode *dir,
req->r_parent = dir;
ihold(dir);
set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
+ req->r_mnt_idmap = mnt_idmap_get(idmap);
req->r_args.mkdir.mode = cpu_to_le32(mode);
req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL |
CEPH_CAP_XATTR_EXCL;
--
2.34.1


2023-08-03 16:19:50

by Aleksandr Mikhalitsyn

Subject: [PATCH v8 08/12] ceph: pass idmap to __ceph_setattr

Just pass down the mount's idmapping to __ceph_setattr,
because we will need it later.

Cc: Xiubo Li <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: Ilya Dryomov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
---
fs/ceph/acl.c | 4 ++--
fs/ceph/crypto.c | 2 +-
fs/ceph/inode.c | 5 +++--
fs/ceph/super.h | 3 ++-
4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index 32b26deb1741..89280c168acb 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -142,7 +142,7 @@ int ceph_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
newattrs.ia_ctime = current_time(inode);
newattrs.ia_mode = new_mode;
newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
- ret = __ceph_setattr(inode, &newattrs, NULL);
+ ret = __ceph_setattr(idmap, inode, &newattrs, NULL);
if (ret)
goto out_free;
}
@@ -153,7 +153,7 @@ int ceph_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
newattrs.ia_ctime = old_ctime;
newattrs.ia_mode = old_mode;
newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
- __ceph_setattr(inode, &newattrs, NULL);
+ __ceph_setattr(idmap, inode, &newattrs, NULL);
}
goto out_free;
}
diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index b9071bba3b08..8cf32e7f59bf 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -112,7 +112,7 @@ static int ceph_crypt_set_context(struct inode *inode, const void *ctx, size_t l

cia.fscrypt_auth = cfa;

- ret = __ceph_setattr(inode, &attr, &cia);
+ ret = __ceph_setattr(&nop_mnt_idmap, inode, &attr, &cia);
if (ret == 0)
inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
kfree(cia.fscrypt_auth);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 9b50861bd2b5..6c4cc009d819 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2466,7 +2466,8 @@ static int fill_fscrypt_truncate(struct inode *inode,
return ret;
}

-int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia)
+int __ceph_setattr(struct mnt_idmap *idmap, struct inode *inode,
+ struct iattr *attr, struct ceph_iattr *cia)
{
struct ceph_inode_info *ci = ceph_inode(inode);
unsigned int ia_valid = attr->ia_valid;
@@ -2818,7 +2819,7 @@ int ceph_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
ceph_quota_is_max_bytes_exceeded(inode, attr->ia_size))
return -EDQUOT;

- err = __ceph_setattr(inode, attr, NULL);
+ err = __ceph_setattr(idmap, inode, attr, NULL);

if (err >= 0 && (attr->ia_valid & ATTR_MODE))
err = posix_acl_chmod(&nop_mnt_idmap, dentry, attr->ia_mode);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 4e78de1be23e..e729cde7b4a0 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1101,7 +1101,8 @@ struct ceph_iattr {
struct ceph_fscrypt_auth *fscrypt_auth;
};

-extern int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia);
+extern int __ceph_setattr(struct mnt_idmap *idmap, struct inode *inode,
+ struct iattr *attr, struct ceph_iattr *cia);
extern int ceph_setattr(struct mnt_idmap *idmap,
struct dentry *dentry, struct iattr *attr);
extern int ceph_getattr(struct mnt_idmap *idmap,
--
2.34.1


2023-08-03 16:28:10

by Aleksandr Mikhalitsyn

Subject: [PATCH v8 10/12] ceph/acl: allow idmapped set_acl inode op

From: Christian Brauner <[email protected]>

Enable ceph_set_acl() to handle idmapped mounts. This is just a matter
of passing down the mount's idmapping.

Cc: Xiubo Li <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: Ilya Dryomov <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
---
fs/ceph/acl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index 89280c168acb..ffc6a1c02388 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -107,7 +107,7 @@ int ceph_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
case ACL_TYPE_ACCESS:
name = XATTR_NAME_POSIX_ACL_ACCESS;
if (acl) {
- ret = posix_acl_update_mode(&nop_mnt_idmap, inode,
+ ret = posix_acl_update_mode(idmap, inode,
&new_mode, &acl);
if (ret)
goto out;
--
2.34.1


2023-08-03 17:05:53

by Aleksandr Mikhalitsyn

Subject: [PATCH v8 09/12] ceph: allow idmapped setattr inode op

From: Christian Brauner <[email protected]>

Enable __ceph_setattr() to handle idmapped mounts. This is just a matter
of passing down the mount's idmapping.

Cc: Xiubo Li <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: Ilya Dryomov <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
[ adapted to b27c82e12965 ("attr: port attribute changes to new types") ]
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
---
v4:
- introduced fsuid/fsgid local variables
v3:
- reworked as Christian suggested here:
https://lore.kernel.org/lkml/20230602-vorzeichen-praktikum-f17931692301@brauner/
---
fs/ceph/inode.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 6c4cc009d819..0a8cc0327f85 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2553,33 +2553,37 @@ int __ceph_setattr(struct mnt_idmap *idmap, struct inode *inode,
#endif /* CONFIG_FS_ENCRYPTION */

if (ia_valid & ATTR_UID) {
+ kuid_t fsuid = from_vfsuid(idmap, i_user_ns(inode), attr->ia_vfsuid);
+
doutc(cl, "%p %llx.%llx uid %d -> %d\n", inode,
ceph_vinop(inode),
from_kuid(&init_user_ns, inode->i_uid),
from_kuid(&init_user_ns, attr->ia_uid));
if (issued & CEPH_CAP_AUTH_EXCL) {
- inode->i_uid = attr->ia_uid;
+ inode->i_uid = fsuid;
dirtied |= CEPH_CAP_AUTH_EXCL;
} else if ((issued & CEPH_CAP_AUTH_SHARED) == 0 ||
- !uid_eq(attr->ia_uid, inode->i_uid)) {
+ !uid_eq(fsuid, inode->i_uid)) {
req->r_args.setattr.uid = cpu_to_le32(
- from_kuid(&init_user_ns, attr->ia_uid));
+ from_kuid(&init_user_ns, fsuid));
mask |= CEPH_SETATTR_UID;
release |= CEPH_CAP_AUTH_SHARED;
}
}
if (ia_valid & ATTR_GID) {
+ kgid_t fsgid = from_vfsgid(idmap, i_user_ns(inode), attr->ia_vfsgid);
+
doutc(cl, "%p %llx.%llx gid %d -> %d\n", inode,
ceph_vinop(inode),
from_kgid(&init_user_ns, inode->i_gid),
from_kgid(&init_user_ns, attr->ia_gid));
if (issued & CEPH_CAP_AUTH_EXCL) {
- inode->i_gid = attr->ia_gid;
+ inode->i_gid = fsgid;
dirtied |= CEPH_CAP_AUTH_EXCL;
} else if ((issued & CEPH_CAP_AUTH_SHARED) == 0 ||
- !gid_eq(attr->ia_gid, inode->i_gid)) {
+ !gid_eq(fsgid, inode->i_gid)) {
req->r_args.setattr.gid = cpu_to_le32(
- from_kgid(&init_user_ns, attr->ia_gid));
+ from_kgid(&init_user_ns, fsgid));
mask |= CEPH_SETATTR_GID;
release |= CEPH_CAP_AUTH_SHARED;
}
@@ -2807,7 +2811,7 @@ int ceph_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
if (err)
return err;

- err = setattr_prepare(&nop_mnt_idmap, dentry, attr);
+ err = setattr_prepare(idmap, dentry, attr);
if (err != 0)
return err;

@@ -2822,7 +2826,7 @@ int ceph_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
err = __ceph_setattr(idmap, inode, attr, NULL);

if (err >= 0 && (attr->ia_valid & ATTR_MODE))
- err = posix_acl_chmod(&nop_mnt_idmap, dentry, attr->ia_mode);
+ err = posix_acl_chmod(idmap, dentry, attr->ia_mode);

return err;
}
--
2.34.1


2023-08-03 17:10:42

by Aleksandr Mikhalitsyn

Subject: [PATCH v8 03/12] ceph: handle idmapped mounts in create_request_message()

From: Christian Brauner <[email protected]>

Inode operations that create a new filesystem object such as ->mknod,
->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
Instead the caller's fs{g,u}id is used for the {g,u}id of the new
filesystem object.

In order to ensure that the correct {g,u}id is used map the caller's
fs{g,u}id for creation requests. This doesn't require complex changes.
It suffices to pass in the relevant idmapping recorded in the request
message. If this request message was triggered from an inode operation
that creates filesystem objects it will have passed down the relevant
idmapping. If this is a request message that was triggered from an inode
operation that doesn't need to take idmappings into account the initial
idmapping is passed down which is an identity mapping.

This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
which adds two new fields (owner_{u,g}id) to the request head structure.
So, we need to ensure that the MDS supports it; otherwise we need to fail
any IO that comes through an idmapped mount because we can't process it
properly. An MDS without this extension will use the caller_{u,g}id
fields to set a new inode's owner UID/GID, which is incorrect because the
caller_{u,g}id values are unmapped. At the same time we can't map these
fields with an idmapping as it can break the UID/GID-based permission check
logic on the MDS side. This problem is described in detail at [1], [2].

[1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
[2] https://lore.kernel.org/all/[email protected]/

https://github.com/ceph/ceph/pull/52575
https://tracker.ceph.com/issues/62217

Cc: Xiubo Li <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: Ilya Dryomov <[email protected]>
Cc: [email protected]
Co-Developed-by: Alexander Mikhalitsyn <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
---
v7:
- reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
v8:
- properly handled the case when an old MDS is used with a new kernel client
---
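The head-version negotiation in this patch can be sketched in userspace as
follows. The layout of head_model is a deliberately shortened assumption (the
real struct ceph_mds_request_head has more fields), and all *_model and MODEL_*
names are hypothetical; only the selection logic mirrors the diff below.
Version 1 uses a different struct in the patch and is left out of the length
helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_HEAD_VERSION 3

/* Userspace equivalent of the kernel's offsetofend() helper. */
#define MODEL_OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Shortened, hypothetical request head: only the tail fields matter here. */
struct head_model {
	uint16_t version;
	uint32_t ext_num_retry;
	uint32_t ext_num_fwd;
	uint32_t owner_uid, owner_gid;	/* only sent with version 3 */
} __attribute__((packed));

/* Pick the newest head version the MDS session advertises support for. */
static uint16_t head_version_model(bool has_32bits_retry_fwd,
				   bool has_owner_uidgid)
{
	if (!has_32bits_retry_fwd)
		return 1;
	if (!has_owner_uidgid)
		return 2;
	return MODEL_HEAD_VERSION;
}

/* Bytes of the head to put on the wire for versions 2 and 3. */
static size_t head_len_model(uint16_t version)
{
	assert(version >= 2 && version <= MODEL_HEAD_VERSION);
	if (version == 2)
		return MODEL_OFFSETOFEND(struct head_model, ext_num_fwd);
	return sizeof(struct head_model);
}
```

The point of the version 2 branch is that the head is truncated right after
ext_num_fwd, so an MDS without CEPHFS_FEATURE_HAS_OWNER_UIDGID never sees the
two trailing owner fields.
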
fs/ceph/mds_client.c | 46 +++++++++++++++++++++++++++++++++---
fs/ceph/mds_client.h | 5 +++-
include/linux/ceph/ceph_fs.h | 4 +++-
3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 8829f55103da..7d3106d3b726 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
}
}

+static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
+{
+ if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
+ return 1;
+
+ if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
+ return 2;
+
+ return CEPH_MDS_REQUEST_HEAD_VERSION;
+}
+
static struct ceph_mds_request_head_legacy *
find_legacy_request_head(void *p, u64 features)
{
@@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
{
int mds = session->s_mds;
struct ceph_mds_client *mdsc = session->s_mdsc;
+ struct ceph_client *cl = mdsc->fsc->client;
struct ceph_msg *msg;
struct ceph_mds_request_head_legacy *lhead;
const char *path1 = NULL;
@@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
void *p, *end;
int ret;
bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
- bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
+ u16 request_head_version = mds_supported_head_version(session);

ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
req->r_parent, req->r_path1, req->r_ino1.ino,
@@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
*/
if (legacy)
len = sizeof(struct ceph_mds_request_head_legacy);
- else if (old_version)
+ else if (request_head_version == 1)
len = sizeof(struct ceph_mds_request_head_old);
+ else if (request_head_version == 2)
+ len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
else
len = sizeof(struct ceph_mds_request_head);

@@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
lhead = find_legacy_request_head(msg->front.iov_base,
session->s_con.peer_features);

+ if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
+ !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
+ pr_err_ratelimited_client(cl,
+ "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
+ " is not supported by MDS. Fail request with -EIO.\n");
+
+ ret = -EIO;
+ goto out_err;
+ }
+
/*
* The ceph_mds_request_head_legacy didn't contain a version field, and
* one was added when we moved the message version from 3->4.
@@ -3035,17 +3059,33 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
if (legacy) {
msg->hdr.version = cpu_to_le16(3);
p = msg->front.iov_base + sizeof(*lhead);
- } else if (old_version) {
+ } else if (request_head_version == 1) {
struct ceph_mds_request_head_old *ohead = msg->front.iov_base;

msg->hdr.version = cpu_to_le16(4);
ohead->version = cpu_to_le16(1);
p = msg->front.iov_base + sizeof(*ohead);
+ } else if (request_head_version == 2) {
+ struct ceph_mds_request_head *nhead = msg->front.iov_base;
+
+ msg->hdr.version = cpu_to_le16(6);
+ nhead->version = cpu_to_le16(2);
+
+ p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
} else {
struct ceph_mds_request_head *nhead = msg->front.iov_base;
+ kuid_t owner_fsuid;
+ kgid_t owner_fsgid;

msg->hdr.version = cpu_to_le16(6);
nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
+
+ owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
+ VFSUIDT_INIT(req->r_cred->fsuid));
+ owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
+ VFSGIDT_INIT(req->r_cred->fsgid));
+ nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
+ nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
p = msg->front.iov_base + sizeof(*nhead);
}

diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index e3bbf3ba8ee8..8f683e8203bd 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -33,8 +33,10 @@ enum ceph_feature_type {
CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
CEPHFS_FEATURE_OP_GETVXATTR,
CEPHFS_FEATURE_32BITS_RETRY_FWD,
+ CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
+ CEPHFS_FEATURE_HAS_OWNER_UIDGID,

- CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
+ CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
};

#define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
@@ -49,6 +51,7 @@ enum ceph_feature_type {
CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
CEPHFS_FEATURE_OP_GETVXATTR, \
CEPHFS_FEATURE_32BITS_RETRY_FWD, \
+ CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
}

/*
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index 5f2301ee88bc..6eb83a51341c 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
union ceph_mds_request_args args;
} __attribute__ ((packed));

-#define CEPH_MDS_REQUEST_HEAD_VERSION 2
+#define CEPH_MDS_REQUEST_HEAD_VERSION 3

struct ceph_mds_request_head_old {
__le16 version; /* struct version */
@@ -530,6 +530,8 @@ struct ceph_mds_request_head {

__le32 ext_num_retry; /* new count retry attempts */
__le32 ext_num_fwd; /* new count fwd attempts */
+
+ __le32 owner_uid, owner_gid; /* used for OPs which create inodes */
} __attribute__ ((packed));

/* cap/lease release record */
--
2.34.1


2023-08-04 03:44:26

by Xiubo Li

Subject: Re: [PATCH v8 03/12] ceph: handle idmapped mounts in create_request_message()


On 8/3/23 21:59, Alexander Mikhalitsyn wrote:
> From: Christian Brauner <[email protected]>
>
> Inode operations that create a new filesystem object such as ->mknod,
> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> filesystem object.
>
> In order to ensure that the correct {g,u}id is used map the caller's
> fs{g,u}id for creation requests. This doesn't require complex changes.
> It suffices to pass in the relevant idmapping recorded in the request
> message. If this request message was triggered from an inode operation
> that creates filesystem objects it will have passed down the relevant
> idmapping. If this is a request message that was triggered from an inode
> operation that doesn't need to take idmappings into account the initial
> idmapping is passed down which is an identity mapping.
>
> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> which adds two new fields (owner_{u,g}id) to the request head structure.
> So, we need to ensure that MDS supports it otherwise we need to fail
> any IO that comes through an idmapped mount because we can't process it
> in a proper way. MDS server without such an extension will use caller_{u,g}id
> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> values are unmapped. At the same time we can't map these fields with an
> idmapping as it can break UID/GID-based permission checks logic on the
> MDS side. This problem was described with a lot of details at [1], [2].
>
> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> [2] https://lore.kernel.org/all/[email protected]/
>
> https://github.com/ceph/ceph/pull/52575
> https://tracker.ceph.com/issues/62217
>
> Cc: Xiubo Li <[email protected]>
> Cc: Jeff Layton <[email protected]>
> Cc: Ilya Dryomov <[email protected]>
> Cc: [email protected]
> Co-Developed-by: Alexander Mikhalitsyn <[email protected]>
> Signed-off-by: Christian Brauner <[email protected]>
> Signed-off-by: Alexander Mikhalitsyn <[email protected]>
> ---
> v7:
> - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> v8:
> - properly handled case when old MDS used with new kernel client
> ---
> fs/ceph/mds_client.c | 46 +++++++++++++++++++++++++++++++++---
> fs/ceph/mds_client.h | 5 +++-
> include/linux/ceph/ceph_fs.h | 4 +++-
> 3 files changed, 50 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 8829f55103da..7d3106d3b726 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
> }
> }
>
> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
> +{
> + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
> + return 1;
> +
> + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> + return 2;
> +
> + return CEPH_MDS_REQUEST_HEAD_VERSION;
> +}
> +
> static struct ceph_mds_request_head_legacy *
> find_legacy_request_head(void *p, u64 features)
> {
> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> {
> int mds = session->s_mds;
> struct ceph_mds_client *mdsc = session->s_mdsc;
> + struct ceph_client *cl = mdsc->fsc->client;
> struct ceph_msg *msg;
> struct ceph_mds_request_head_legacy *lhead;
> const char *path1 = NULL;
> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> void *p, *end;
> int ret;
> bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> + u16 request_head_version = mds_supported_head_version(session);
>
> ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> req->r_parent, req->r_path1, req->r_ino1.ino,
> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> */
> if (legacy)
> len = sizeof(struct ceph_mds_request_head_legacy);
> - else if (old_version)
> + else if (request_head_version == 1)
> len = sizeof(struct ceph_mds_request_head_old);
> + else if (request_head_version == 2)
> + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> else
> len = sizeof(struct ceph_mds_request_head);
>

This is not what we are supposed to do. If we do this again and again when
adding new members, it will make the code very complicated to maintain.

Once CEPHFS_FEATURE_32BITS_RETRY_FWD is supported, ceph should decode it
correctly, and if CEPHFS_FEATURE_HAS_OWNER_UIDGID is not supported the
decoder should skip it directly.

Is the MDS side buggy? Why didn't your last version work?

Thanks

- Xiubo

> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> lhead = find_legacy_request_head(msg->front.iov_base,
> session->s_con.peer_features);
>
> + if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> + pr_err_ratelimited_client(cl,
> + "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> + " is not supported by MDS. Fail request with -EIO.\n");
> +
> + ret = -EIO;
> + goto out_err;
> + }
> +
> /*
> * The ceph_mds_request_head_legacy didn't contain a version field, and
> * one was added when we moved the message version from 3->4.
> @@ -3035,17 +3059,33 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> if (legacy) {
> msg->hdr.version = cpu_to_le16(3);
> p = msg->front.iov_base + sizeof(*lhead);
> - } else if (old_version) {
> + } else if (request_head_version == 1) {
> struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
>
> msg->hdr.version = cpu_to_le16(4);
> ohead->version = cpu_to_le16(1);
> p = msg->front.iov_base + sizeof(*ohead);
> + } else if (request_head_version == 2) {
> + struct ceph_mds_request_head *nhead = msg->front.iov_base;
> +
> + msg->hdr.version = cpu_to_le16(6);
> + nhead->version = cpu_to_le16(2);
> +
> + p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> } else {
> struct ceph_mds_request_head *nhead = msg->front.iov_base;
> + kuid_t owner_fsuid;
> + kgid_t owner_fsgid;
>
> msg->hdr.version = cpu_to_le16(6);
> nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> +
> + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> + VFSUIDT_INIT(req->r_cred->fsuid));
> + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> + VFSGIDT_INIT(req->r_cred->fsgid));
> + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
> p = msg->front.iov_base + sizeof(*nhead);
> }
>
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index e3bbf3ba8ee8..8f683e8203bd 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -33,8 +33,10 @@ enum ceph_feature_type {
> CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> CEPHFS_FEATURE_OP_GETVXATTR,
> CEPHFS_FEATURE_32BITS_RETRY_FWD,
> + CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> + CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>
> - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> };
>
> #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
> @@ -49,6 +51,7 @@ enum ceph_feature_type {
> CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
> CEPHFS_FEATURE_OP_GETVXATTR, \
> CEPHFS_FEATURE_32BITS_RETRY_FWD, \
> + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
> }
>
> /*
> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> index 5f2301ee88bc..6eb83a51341c 100644
> --- a/include/linux/ceph/ceph_fs.h
> +++ b/include/linux/ceph/ceph_fs.h
> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> union ceph_mds_request_args args;
> } __attribute__ ((packed));
>
> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
>
> struct ceph_mds_request_head_old {
> __le16 version; /* struct version */
> @@ -530,6 +530,8 @@ struct ceph_mds_request_head {
>
> __le32 ext_num_retry; /* new count retry attempts */
> __le32 ext_num_fwd; /* new count fwd attempts */
> +
> + __le32 owner_uid, owner_gid; /* used for OPs which create inodes */
> } __attribute__ ((packed));
>
> /* cap/lease release record */


2023-08-04 03:45:48

by Xiubo Li

[permalink] [raw]
Subject: Re: [PATCH v8 03/12] ceph: handle idmapped mounts in create_request_message()


On 8/4/23 10:26, Xiubo Li wrote:
>
> On 8/3/23 21:59, Alexander Mikhalitsyn wrote:
>> From: Christian Brauner <[email protected]>
>>
>> Inode operations that create a new filesystem object such as ->mknod,
>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
>> filesystem object.
>>
>> In order to ensure that the correct {g,u}id is used map the caller's
>> fs{g,u}id for creation requests. This doesn't require complex changes.
>> It suffices to pass in the relevant idmapping recorded in the request
>> message. If this request message was triggered from an inode operation
>> that creates filesystem objects it will have passed down the relevant
>> idmaping. If this is a request message that was triggered from an inode
>> operation that doens't need to take idmappings into account the initial
>> idmapping is passed down which is an identity mapping.
>>
>> This change uses a new cephfs protocol extension
>> CEPHFS_FEATURE_HAS_OWNER_UIDGID
>> which adds two new fields (owner_{u,g}id) to the request head structure.
>> So, we need to ensure that MDS supports it otherwise we need to fail
>> any IO that comes through an idmapped mount because we can't process it
>> in a proper way. MDS server without such an extension will use
>> caller_{u,g}id
>> fields to set a new inode owner UID/GID which is incorrect because
>> caller_{u,g}id
>> values are unmapped. At the same time we can't map these fields with an
>> idmapping as it can break UID/GID-based permission checks logic on the
>> MDS side. This problem was described with a lot of details at [1], [2].
>>
>> [1]
>> https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
>> [2]
>> https://lore.kernel.org/all/[email protected]/
>>
>> https://github.com/ceph/ceph/pull/52575
>> https://tracker.ceph.com/issues/62217
>>
>> Cc: Xiubo Li <[email protected]>
>> Cc: Jeff Layton <[email protected]>
>> Cc: Ilya Dryomov <[email protected]>
>> Cc: [email protected]
>> Co-Developed-by: Alexander Mikhalitsyn
>> <[email protected]>
>> Signed-off-by: Christian Brauner <[email protected]>
>> Signed-off-by: Alexander Mikhalitsyn
>> <[email protected]>
>> ---
>> v7:
>>     - reworked to use two new fields for owner UID/GID
>> (https://github.com/ceph/ceph/pull/52575)
>> v8:
>>     - properly handled case when old MDS used with new kernel client
>> ---
>>   fs/ceph/mds_client.c         | 46 +++++++++++++++++++++++++++++++++---
>>   fs/ceph/mds_client.h         |  5 +++-
>>   include/linux/ceph/ceph_fs.h |  4 +++-
>>   3 files changed, 50 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index 8829f55103da..7d3106d3b726 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void
>> **p, const struct ceph_mds_request *
>>       }
>>   }
>>   +static inline u16 mds_supported_head_version(struct
>> ceph_mds_session *session)
>> +{
>> +    if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD,
>> &session->s_features))
>> +        return 1;
>> +
>> +    if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>> &session->s_features))
>> +        return 2;
>> +
>> +    return CEPH_MDS_REQUEST_HEAD_VERSION;
>> +}
>> +
>>   static struct ceph_mds_request_head_legacy *
>>   find_legacy_request_head(void *p, u64 features)
>>   {
>> @@ -2923,6 +2934,7 @@ static struct ceph_msg
>> *create_request_message(struct ceph_mds_session *session,
>>   {
>>       int mds = session->s_mds;
>>       struct ceph_mds_client *mdsc = session->s_mdsc;
>> +    struct ceph_client *cl = mdsc->fsc->client;
>>       struct ceph_msg *msg;
>>       struct ceph_mds_request_head_legacy *lhead;
>>       const char *path1 = NULL;
>> @@ -2936,7 +2948,7 @@ static struct ceph_msg
>> *create_request_message(struct ceph_mds_session *session,
>>       void *p, *end;
>>       int ret;
>>       bool legacy = !(session->s_con.peer_features &
>> CEPH_FEATURE_FS_BTIME);
>> -    bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD,
>> &session->s_features);
>> +    u16 request_head_version = mds_supported_head_version(session);
>>         ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
>>                     req->r_parent, req->r_path1, req->r_ino1.ino,
>> @@ -2977,8 +2989,10 @@ static struct ceph_msg
>> *create_request_message(struct ceph_mds_session *session,
>>        */
>>       if (legacy)
>>           len = sizeof(struct ceph_mds_request_head_legacy);
>> -    else if (old_version)
>> +    else if (request_head_version == 1)
>>           len = sizeof(struct ceph_mds_request_head_old);
>> +    else if (request_head_version == 2)
>> +        len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>>       else
>>           len = sizeof(struct ceph_mds_request_head);
>
> This is not what we suppose to. If we do this again and again when
> adding new members it will make the code very complicated to maintain.
>
> Once the CEPHFS_FEATURE_32BITS_RETRY_FWD has been supported the ceph
> should correctly decode it and if CEPHFS_FEATURE_HAS_OWNER_UIDGID is
> not supported the decoder should skip it directly.
>
> Is the MDS side buggy ? Why you last version didn't work ?
>

I think the ceph side is buggy. Possibly we should add a new `length`
member to `struct ceph_mds_request_head` and just skip the extra
bytes when decoding it.
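
A rough sketch of that idea, assuming a hypothetical wire layout in which a
little-endian u32 `length` precedes the head fields. None of the `toy_*`
names or the layout come from the ceph tree; they only illustrate how an
older decoder could skip bytes it does not understand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/* The fields an older decoder knows about: two u32 values, 8 bytes. */
struct toy_head {
	uint32_t ext_num_retry;
	uint32_t ext_num_fwd;
};
#define TOY_KNOWN_LEN 8u

/* Newer sender: length prefix, the known fields, then owner uid/gid. */
static size_t toy_encode_head_v3(uint8_t *buf, uint32_t retry, uint32_t fwd,
				 uint32_t uid, uint32_t gid)
{
	assert(buf != NULL);
	put_le32(buf, 16);        /* length of the fields that follow */
	put_le32(buf + 4, retry);
	put_le32(buf + 8, fwd);
	put_le32(buf + 12, uid);
	put_le32(buf + 16, gid);
	return 20;
}

/*
 * Older receiver: reads the fields it knows, then uses the length
 * prefix to step over anything appended by a newer sender.
 * Returns 0 and sets *consumed on success, -1 on malformed input.
 */
static int toy_decode_head(const uint8_t *buf, size_t buf_len,
			   struct toy_head *out, size_t *consumed)
{
	uint32_t length;

	if (buf_len < 4)
		return -1;
	length = get_le32(buf);
	if (length < TOY_KNOWN_LEN || length > buf_len - 4)
		return -1;

	out->ext_num_retry = get_le32(buf + 4);
	out->ext_num_fwd = get_le32(buf + 8);
	*consumed = (size_t)length + 4; /* skip the unknown trailing bytes */
	return 0;
}
```

The sender no longer has to trim the head per receiver version, but the
length prefix itself is a change to the wire format.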

Could you fix it together with your ceph PR?

Thanks

- Xiubo


> Thanks
>
> - Xiubo
>
>> @@ -3028,6 +3042,16 @@ static struct ceph_msg
>> *create_request_message(struct ceph_mds_session *session,
>>       lhead = find_legacy_request_head(msg->front.iov_base,
>>                        session->s_con.peer_features);
>>   +    if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
>> +        !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>> &session->s_features)) {
>> +        pr_err_ratelimited_client(cl,
>> +            "idmapped mount is used and
>> CEPHFS_FEATURE_HAS_OWNER_UIDGID"
>> +            " is not supported by MDS. Fail request with -EIO.\n");
>> +
>> +        ret = -EIO;
>> +        goto out_err;
>> +    }
>> +
>>       /*
>>        * The ceph_mds_request_head_legacy didn't contain a version
>> field, and
>>        * one was added when we moved the message version from 3->4.
>> @@ -3035,17 +3059,33 @@ static struct ceph_msg
>> *create_request_message(struct ceph_mds_session *session,
>>       if (legacy) {
>>           msg->hdr.version = cpu_to_le16(3);
>>           p = msg->front.iov_base + sizeof(*lhead);
>> -    } else if (old_version) {
>> +    } else if (request_head_version == 1) {
>>           struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
>>             msg->hdr.version = cpu_to_le16(4);
>>           ohead->version = cpu_to_le16(1);
>>           p = msg->front.iov_base + sizeof(*ohead);
>> +    } else if (request_head_version == 2) {
>> +        struct ceph_mds_request_head *nhead = msg->front.iov_base;
>> +
>> +        msg->hdr.version = cpu_to_le16(6);
>> +        nhead->version = cpu_to_le16(2);
>> +
>> +        p = msg->front.iov_base + offsetofend(struct
>> ceph_mds_request_head, ext_num_fwd);
>>       } else {
>>           struct ceph_mds_request_head *nhead = msg->front.iov_base;
>> +        kuid_t owner_fsuid;
>> +        kgid_t owner_fsgid;
>>             msg->hdr.version = cpu_to_le16(6);
>>           nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
>> +
>> +        owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
>> +                      VFSUIDT_INIT(req->r_cred->fsuid));
>> +        owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
>> +                      VFSGIDT_INIT(req->r_cred->fsgid));
>> +        nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns,
>> owner_fsuid));
>> +        nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns,
>> owner_fsgid));
>>           p = msg->front.iov_base + sizeof(*nhead);
>>       }
>>   diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>> index e3bbf3ba8ee8..8f683e8203bd 100644
>> --- a/fs/ceph/mds_client.h
>> +++ b/fs/ceph/mds_client.h
>> @@ -33,8 +33,10 @@ enum ceph_feature_type {
>>       CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
>>       CEPHFS_FEATURE_OP_GETVXATTR,
>>       CEPHFS_FEATURE_32BITS_RETRY_FWD,
>> +    CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
>> +    CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>   -    CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
>> +    CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>   };
>>     #define CEPHFS_FEATURES_CLIENT_SUPPORTED {    \
>> @@ -49,6 +51,7 @@ enum ceph_feature_type {
>>       CEPHFS_FEATURE_NOTIFY_SESSION_STATE,    \
>>       CEPHFS_FEATURE_OP_GETVXATTR,        \
>>       CEPHFS_FEATURE_32BITS_RETRY_FWD,    \
>> +    CEPHFS_FEATURE_HAS_OWNER_UIDGID,    \
>>   }
>>     /*
>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
>> index 5f2301ee88bc..6eb83a51341c 100644
>> --- a/include/linux/ceph/ceph_fs.h
>> +++ b/include/linux/ceph/ceph_fs.h
>> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
>>       union ceph_mds_request_args args;
>>   } __attribute__ ((packed));
>>   -#define CEPH_MDS_REQUEST_HEAD_VERSION  2
>> +#define CEPH_MDS_REQUEST_HEAD_VERSION  3
>>     struct ceph_mds_request_head_old {
>>       __le16 version;                /* struct version */
>> @@ -530,6 +530,8 @@ struct ceph_mds_request_head {
>>         __le32 ext_num_retry;          /* new count retry attempts */
>>       __le32 ext_num_fwd;            /* new count fwd attempts */
>> +
>> +    __le32 owner_uid, owner_gid;   /* used for OPs which create
>> inodes */
>>   } __attribute__ ((packed));
>>     /* cap/lease release record */


2023-08-04 07:36:52

by Aleksandr Mikhalitsyn

[permalink] [raw]
Subject: Re: [PATCH v8 03/12] ceph: handle idmapped mounts in create_request_message()

On Fri, Aug 4, 2023 at 8:35 AM Aleksandr Mikhalitsyn
<[email protected]> wrote:
>
> On Fri, Aug 4, 2023 at 5:24 AM Xiubo Li <[email protected]> wrote:
> >
> >
> > On 8/4/23 10:26, Xiubo Li wrote:
> > >
> > > On 8/3/23 21:59, Alexander Mikhalitsyn wrote:
> > >> From: Christian Brauner <[email protected]>
> > >>
> > >> Inode operations that create a new filesystem object such as ->mknod,
> > >> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> > >> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> > >> filesystem object.
> > >>
> > >> In order to ensure that the correct {g,u}id is used map the caller's
> > >> fs{g,u}id for creation requests. This doesn't require complex changes.
> > >> It suffices to pass in the relevant idmapping recorded in the request
> > >> message. If this request message was triggered from an inode operation
> > >> that creates filesystem objects it will have passed down the relevant
> > >> idmaping. If this is a request message that was triggered from an inode
> > >> operation that doens't need to take idmappings into account the initial
> > >> idmapping is passed down which is an identity mapping.
> > >>
> > >> This change uses a new cephfs protocol extension
> > >> CEPHFS_FEATURE_HAS_OWNER_UIDGID
> > >> which adds two new fields (owner_{u,g}id) to the request head structure.
> > >> So, we need to ensure that MDS supports it otherwise we need to fail
> > >> any IO that comes through an idmapped mount because we can't process it
> > >> in a proper way. MDS server without such an extension will use
> > >> caller_{u,g}id
> > >> fields to set a new inode owner UID/GID which is incorrect because
> > >> caller_{u,g}id
> > >> values are unmapped. At the same time we can't map these fields with an
> > >> idmapping as it can break UID/GID-based permission checks logic on the
> > >> MDS side. This problem was described with a lot of details at [1], [2].
> > >>
> > >> [1]
> > >> https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> > >> [2]
> > >> https://lore.kernel.org/all/[email protected]/
> > >>
> > >> https://github.com/ceph/ceph/pull/52575
> > >> https://tracker.ceph.com/issues/62217
> > >>
> > >> Cc: Xiubo Li <[email protected]>
> > >> Cc: Jeff Layton <[email protected]>
> > >> Cc: Ilya Dryomov <[email protected]>
> > >> Cc: [email protected]
> > >> Co-Developed-by: Alexander Mikhalitsyn
> > >> <[email protected]>
> > >> Signed-off-by: Christian Brauner <[email protected]>
> > >> Signed-off-by: Alexander Mikhalitsyn
> > >> <[email protected]>
> > >> ---
> > >> v7:
> > >> - reworked to use two new fields for owner UID/GID
> > >> (https://github.com/ceph/ceph/pull/52575)
> > >> v8:
> > >> - properly handled case when old MDS used with new kernel client
> > >> ---
> > >> fs/ceph/mds_client.c | 46 +++++++++++++++++++++++++++++++++---
> > >> fs/ceph/mds_client.h | 5 +++-
> > >> include/linux/ceph/ceph_fs.h | 4 +++-
> > >> 3 files changed, 50 insertions(+), 5 deletions(-)
> > >>
> > >> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > >> index 8829f55103da..7d3106d3b726 100644
> > >> --- a/fs/ceph/mds_client.c
> > >> +++ b/fs/ceph/mds_client.c
> > >> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void
> > >> **p, const struct ceph_mds_request *
> > >> }
> > >> }
> > >> +static inline u16 mds_supported_head_version(struct
> > >> ceph_mds_session *session)
> > >> +{
> > >> + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > >> &session->s_features))
> > >> + return 1;
> > >> +
> > >> + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> > >> &session->s_features))
> > >> + return 2;
> > >> +
> > >> + return CEPH_MDS_REQUEST_HEAD_VERSION;
> > >> +}
> > >> +
> > >> static struct ceph_mds_request_head_legacy *
> > >> find_legacy_request_head(void *p, u64 features)
> > >> {
> > >> @@ -2923,6 +2934,7 @@ static struct ceph_msg
> > >> *create_request_message(struct ceph_mds_session *session,
> > >> {
> > >> int mds = session->s_mds;
> > >> struct ceph_mds_client *mdsc = session->s_mdsc;
> > >> + struct ceph_client *cl = mdsc->fsc->client;
> > >> struct ceph_msg *msg;
> > >> struct ceph_mds_request_head_legacy *lhead;
> > >> const char *path1 = NULL;
> > >> @@ -2936,7 +2948,7 @@ static struct ceph_msg
> > >> *create_request_message(struct ceph_mds_session *session,
> > >> void *p, *end;
> > >> int ret;
> > >> bool legacy = !(session->s_con.peer_features &
> > >> CEPH_FEATURE_FS_BTIME);
> > >> - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > >> &session->s_features);
> > >> + u16 request_head_version = mds_supported_head_version(session);
> > >> ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> > >> req->r_parent, req->r_path1, req->r_ino1.ino,
> > >> @@ -2977,8 +2989,10 @@ static struct ceph_msg
> > >> *create_request_message(struct ceph_mds_session *session,
> > >> */
> > >> if (legacy)
> > >> len = sizeof(struct ceph_mds_request_head_legacy);
> > >> - else if (old_version)
> > >> + else if (request_head_version == 1)
> > >> len = sizeof(struct ceph_mds_request_head_old);
> > >> + else if (request_head_version == 2)
> > >> + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> > >> else
> > >> len = sizeof(struct ceph_mds_request_head);
> > >
> > > This is not what we suppose to. If we do this again and again when
> > > adding new members it will make the code very complicated to maintain.
> > >
> > > Once the CEPHFS_FEATURE_32BITS_RETRY_FWD has been supported the ceph
> > > should correctly decode it and if CEPHFS_FEATURE_HAS_OWNER_UIDGID is
> > > not supported the decoder should skip it directly.
> > >
> > > Is the MDS side buggy ? Why you last version didn't work ?
> > >
> >
> > I think the ceph side is buggy. Possibly we should add one new `length`
> > member in struct `struct ceph_mds_request_head` and just skip the extra
> > bytes when decoding it.
>
> Hm, I think I found something suspicious. In cephfs code we have many
> places that
> call the DECODE_FINISH macro, but in our decoder we don't have it.
>
> From documentation it follows that DECODE_FINISH purpose is precisely
> about this problem.
>
> What do you think?

Upd: this approach also changes the on-wire format, since it adds a field
to store the length. That would be a massive and incompatible protocol
change, and I don't think we want to do it in the scope of this task.

>
> >
> > Could you fix it together with your ceph PR ?
> >
> > Thanks
> >
> > - Xiubo
> >
> >
> > > Thanks
> > >
> > > - Xiubo
> > >
> > >> @@ -3028,6 +3042,16 @@ static struct ceph_msg
> > >> *create_request_message(struct ceph_mds_session *session,
> > >> lhead = find_legacy_request_head(msg->front.iov_base,
> > >> session->s_con.peer_features);
> > >> + if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> > >> + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> > >> &session->s_features)) {
> > >> + pr_err_ratelimited_client(cl,
> > >> + "idmapped mount is used and
> > >> CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> > >> + " is not supported by MDS. Fail request with -EIO.\n");
> > >> +
> > >> + ret = -EIO;
> > >> + goto out_err;
> > >> + }
> > >> +
> > >> /*
> > >> * The ceph_mds_request_head_legacy didn't contain a version
> > >> field, and
> > >> * one was added when we moved the message version from 3->4.
> > >> @@ -3035,17 +3059,33 @@ static struct ceph_msg
> > >> *create_request_message(struct ceph_mds_session *session,
> > >> if (legacy) {
> > >> msg->hdr.version = cpu_to_le16(3);
> > >> p = msg->front.iov_base + sizeof(*lhead);
> > >> - } else if (old_version) {
> > >> + } else if (request_head_version == 1) {
> > >> struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> > >> msg->hdr.version = cpu_to_le16(4);
> > >> ohead->version = cpu_to_le16(1);
> > >> p = msg->front.iov_base + sizeof(*ohead);
> > >> + } else if (request_head_version == 2) {
> > >> + struct ceph_mds_request_head *nhead = msg->front.iov_base;
> > >> +
> > >> + msg->hdr.version = cpu_to_le16(6);
> > >> + nhead->version = cpu_to_le16(2);
> > >> +
> > >> + p = msg->front.iov_base + offsetofend(struct
> > >> ceph_mds_request_head, ext_num_fwd);
> > >> } else {
> > >> struct ceph_mds_request_head *nhead = msg->front.iov_base;
> > >> + kuid_t owner_fsuid;
> > >> + kgid_t owner_fsgid;
> > >> msg->hdr.version = cpu_to_le16(6);
> > >> nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> > >> +
> > >> + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> > >> + VFSUIDT_INIT(req->r_cred->fsuid));
> > >> + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> > >> + VFSGIDT_INIT(req->r_cred->fsgid));
> > >> + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns,
> > >> owner_fsuid));
> > >> + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns,
> > >> owner_fsgid));
> > >> p = msg->front.iov_base + sizeof(*nhead);
> > >> }
> > >> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > >> index e3bbf3ba8ee8..8f683e8203bd 100644
> > >> --- a/fs/ceph/mds_client.h
> > >> +++ b/fs/ceph/mds_client.h
> > >> @@ -33,8 +33,10 @@ enum ceph_feature_type {
> > >> CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> > >> CEPHFS_FEATURE_OP_GETVXATTR,
> > >> CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > >> + CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> > >> + CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> > >> - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > >> + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> > >> };
> > >> #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
> > >> @@ -49,6 +51,7 @@ enum ceph_feature_type {
> > >> CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
> > >> CEPHFS_FEATURE_OP_GETVXATTR, \
> > >> CEPHFS_FEATURE_32BITS_RETRY_FWD, \
> > >> + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
> > >> }
> > >> /*
> > >> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> > >> index 5f2301ee88bc..6eb83a51341c 100644
> > >> --- a/include/linux/ceph/ceph_fs.h
> > >> +++ b/include/linux/ceph/ceph_fs.h
> > >> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> > >> union ceph_mds_request_args args;
> > >> } __attribute__ ((packed));
> > >> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
> > >> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
> > >> struct ceph_mds_request_head_old {
> > >> __le16 version; /* struct version */
> > >> @@ -530,6 +530,8 @@ struct ceph_mds_request_head {
> > >> __le32 ext_num_retry; /* new count retry attempts */
> > >> __le32 ext_num_fwd; /* new count fwd attempts */
> > >> +
> > >> + __le32 owner_uid, owner_gid; /* used for OPs which create
> > >> inodes */
> > >> } __attribute__ ((packed));
> > >> /* cap/lease release record */
> >

2023-08-04 07:51:33

by Aleksandr Mikhalitsyn

[permalink] [raw]
Subject: Re: [PATCH v8 03/12] ceph: handle idmapped mounts in create_request_message()

On Fri, Aug 4, 2023 at 5:24 AM Xiubo Li <[email protected]> wrote:
>
>
> On 8/4/23 10:26, Xiubo Li wrote:
> >
> > On 8/3/23 21:59, Alexander Mikhalitsyn wrote:
> >> From: Christian Brauner <[email protected]>
> >>
> >> Inode operations that create a new filesystem object such as ->mknod,
> >> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> >> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> >> filesystem object.
> >>
> >> In order to ensure that the correct {g,u}id is used map the caller's
> >> fs{g,u}id for creation requests. This doesn't require complex changes.
> >> It suffices to pass in the relevant idmapping recorded in the request
> >> message. If this request message was triggered from an inode operation
> >> that creates filesystem objects it will have passed down the relevant
> >> idmaping. If this is a request message that was triggered from an inode
> >> operation that doens't need to take idmappings into account the initial
> >> idmapping is passed down which is an identity mapping.
> >>
> >> This change uses a new cephfs protocol extension
> >> CEPHFS_FEATURE_HAS_OWNER_UIDGID
> >> which adds two new fields (owner_{u,g}id) to the request head structure.
> >> So, we need to ensure that MDS supports it otherwise we need to fail
> >> any IO that comes through an idmapped mount because we can't process it
> >> in a proper way. MDS server without such an extension will use
> >> caller_{u,g}id
> >> fields to set a new inode owner UID/GID which is incorrect because
> >> caller_{u,g}id
> >> values are unmapped. At the same time we can't map these fields with an
> >> idmapping as it can break UID/GID-based permission checks logic on the
> >> MDS side. This problem was described with a lot of details at [1], [2].
> >>
> >> [1]
> >> https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> >> [2]
> >> https://lore.kernel.org/all/[email protected]/
> >>
> >> https://github.com/ceph/ceph/pull/52575
> >> https://tracker.ceph.com/issues/62217
> >>
> >> Cc: Xiubo Li <[email protected]>
> >> Cc: Jeff Layton <[email protected]>
> >> Cc: Ilya Dryomov <[email protected]>
> >> Cc: [email protected]
> >> Co-Developed-by: Alexander Mikhalitsyn
> >> <[email protected]>
> >> Signed-off-by: Christian Brauner <[email protected]>
> >> Signed-off-by: Alexander Mikhalitsyn
> >> <[email protected]>
> >> ---
> >> v7:
> >> - reworked to use two new fields for owner UID/GID
> >> (https://github.com/ceph/ceph/pull/52575)
> >> v8:
> >> - properly handled case when old MDS used with new kernel client
> >> ---
> >> fs/ceph/mds_client.c | 46 +++++++++++++++++++++++++++++++++---
> >> fs/ceph/mds_client.h | 5 +++-
> >> include/linux/ceph/ceph_fs.h | 4 +++-
> >> 3 files changed, 50 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> >> index 8829f55103da..7d3106d3b726 100644
> >> --- a/fs/ceph/mds_client.c
> >> +++ b/fs/ceph/mds_client.c
> >> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void
> >> **p, const struct ceph_mds_request *
> >> }
> >> }
> >> +static inline u16 mds_supported_head_version(struct
> >> ceph_mds_session *session)
> >> +{
> >> + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >> &session->s_features))
> >> + return 1;
> >> +
> >> + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >> &session->s_features))
> >> + return 2;
> >> +
> >> + return CEPH_MDS_REQUEST_HEAD_VERSION;
> >> +}
> >> +
> >> static struct ceph_mds_request_head_legacy *
> >> find_legacy_request_head(void *p, u64 features)
> >> {
> >> @@ -2923,6 +2934,7 @@ static struct ceph_msg
> >> *create_request_message(struct ceph_mds_session *session,
> >> {
> >> int mds = session->s_mds;
> >> struct ceph_mds_client *mdsc = session->s_mdsc;
> >> + struct ceph_client *cl = mdsc->fsc->client;
> >> struct ceph_msg *msg;
> >> struct ceph_mds_request_head_legacy *lhead;
> >> const char *path1 = NULL;
> >> @@ -2936,7 +2948,7 @@ static struct ceph_msg
> >> *create_request_message(struct ceph_mds_session *session,
> >> void *p, *end;
> >> int ret;
> >> bool legacy = !(session->s_con.peer_features &
> >> CEPH_FEATURE_FS_BTIME);
> >> - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >> &session->s_features);
> >> + u16 request_head_version = mds_supported_head_version(session);
> >> ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> >> req->r_parent, req->r_path1, req->r_ino1.ino,
> >> @@ -2977,8 +2989,10 @@ static struct ceph_msg
> >> *create_request_message(struct ceph_mds_session *session,
> >> */
> >> if (legacy)
> >> len = sizeof(struct ceph_mds_request_head_legacy);
> >> - else if (old_version)
> >> + else if (request_head_version == 1)
> >> len = sizeof(struct ceph_mds_request_head_old);
> >> + else if (request_head_version == 2)
> >> + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >> else
> >> len = sizeof(struct ceph_mds_request_head);
> >
> > This is not what we suppose to. If we do this again and again when
> > adding new members it will make the code very complicated to maintain.
> >
> > Once the CEPHFS_FEATURE_32BITS_RETRY_FWD has been supported the ceph
> > should correctly decode it and if CEPHFS_FEATURE_HAS_OWNER_UIDGID is
> > not supported the decoder should skip it directly.
> >
> > Is the MDS side buggy ? Why you last version didn't work ?
> >
>
> I think the ceph side is buggy. Possibly we should add one new `length`
> member in struct `struct ceph_mds_request_head` and just skip the extra
> bytes when decoding it.

Hm, I think I found something suspicious. In the cephfs code many places
call the DECODE_FINISH macro, but our decoder doesn't.

According to the documentation, DECODE_FINISH is meant to handle
precisely this problem.
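
A rough C analogue of what I understand such a finish step to do, assuming
the encoded struct carries its own length so the decoder knows where it
ends. `toy_cursor` and `toy_decode_finish` are hypothetical names; this is
not the actual macro from the ceph tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read cursor over a received buffer. */
struct toy_cursor {
	size_t pos;        /* offset of the next byte to read */
	size_t struct_end; /* offset where the current encoded struct ends */
};

/*
 * Call after the last known field has been decoded. If the sender
 * appended fields this decoder does not know, step over them; if the
 * decoder already read past the end, report malformed input.
 */
static int toy_decode_finish(struct toy_cursor *c)
{
	assert(c != NULL);

	if (c->pos > c->struct_end)
		return -1;          /* overran the struct: leave pos alone */
	c->pos = c->struct_end;     /* skip any unknown trailing bytes */
	return 0;
}
```

With something like this an old decoder would stop after ext_num_fwd and
the finish step would move it past owner_uid/owner_gid.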

What do you think?

>
> Could you fix it together with your ceph PR ?
>
> Thanks
>
> - Xiubo
>
>
> > Thanks
> >
> > - Xiubo
> >
> >> @@ -3028,6 +3042,16 @@ static struct ceph_msg
> >> *create_request_message(struct ceph_mds_session *session,
> >> lhead = find_legacy_request_head(msg->front.iov_base,
> >> session->s_con.peer_features);
> >> + if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> >> + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >> &session->s_features)) {
> >> + pr_err_ratelimited_client(cl,
> >> + "idmapped mount is used and
> >> CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> >> + " is not supported by MDS. Fail request with -EIO.\n");
> >> +
> >> + ret = -EIO;
> >> + goto out_err;
> >> + }
> >> +
> >> /*
> >> * The ceph_mds_request_head_legacy didn't contain a version
> >> field, and
> >> * one was added when we moved the message version from 3->4.
> >> @@ -3035,17 +3059,33 @@ static struct ceph_msg
> >> *create_request_message(struct ceph_mds_session *session,
> >> if (legacy) {
> >> msg->hdr.version = cpu_to_le16(3);
> >> p = msg->front.iov_base + sizeof(*lhead);
> >> - } else if (old_version) {
> >> + } else if (request_head_version == 1) {
> >> struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> >> msg->hdr.version = cpu_to_le16(4);
> >> ohead->version = cpu_to_le16(1);
> >> p = msg->front.iov_base + sizeof(*ohead);
> >> + } else if (request_head_version == 2) {
> >> + struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >> +
> >> + msg->hdr.version = cpu_to_le16(6);
> >> + nhead->version = cpu_to_le16(2);
> >> +
> >> + p = msg->front.iov_base + offsetofend(struct
> >> ceph_mds_request_head, ext_num_fwd);
> >> } else {
> >> struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >> + kuid_t owner_fsuid;
> >> + kgid_t owner_fsgid;
> >> msg->hdr.version = cpu_to_le16(6);
> >> nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> >> +
> >> + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> >> + VFSUIDT_INIT(req->r_cred->fsuid));
> >> + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> >> + VFSGIDT_INIT(req->r_cred->fsgid));
> >> + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns,
> >> owner_fsuid));
> >> + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns,
> >> owner_fsgid));
> >> p = msg->front.iov_base + sizeof(*nhead);
> >> }
> >> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> >> index e3bbf3ba8ee8..8f683e8203bd 100644
> >> --- a/fs/ceph/mds_client.h
> >> +++ b/fs/ceph/mds_client.h
> >> @@ -33,8 +33,10 @@ enum ceph_feature_type {
> >> CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> >> CEPHFS_FEATURE_OP_GETVXATTR,
> >> CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >> + CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> >> + CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >> - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >> + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >> };
> >> #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
> >> @@ -49,6 +51,7 @@ enum ceph_feature_type {
> >> CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
> >> CEPHFS_FEATURE_OP_GETVXATTR, \
> >> CEPHFS_FEATURE_32BITS_RETRY_FWD, \
> >> + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
> >> }
> >> /*
> >> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> >> index 5f2301ee88bc..6eb83a51341c 100644
> >> --- a/include/linux/ceph/ceph_fs.h
> >> +++ b/include/linux/ceph/ceph_fs.h
> >> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> >> union ceph_mds_request_args args;
> >> } __attribute__ ((packed));
> >> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
> >> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
> >> struct ceph_mds_request_head_old {
> >> __le16 version; /* struct version */
> >> @@ -530,6 +530,8 @@ struct ceph_mds_request_head {
> >> __le32 ext_num_retry; /* new count retry attempts */
> >> __le32 ext_num_fwd; /* new count fwd attempts */
> >> +
> >> + __le32 owner_uid, owner_gid; /* used for OPs which create
> >> inodes */
> >> } __attribute__ ((packed));
> >> /* cap/lease release record */
>

2023-08-04 07:52:02

by Aleksandr Mikhalitsyn

Subject: Re: [PATCH v8 03/12] ceph: handle idmapped mounts in create_request_message()

On Fri, Aug 4, 2023 at 4:26 AM Xiubo Li <[email protected]> wrote:
>
>
> On 8/3/23 21:59, Alexander Mikhalitsyn wrote:
> > From: Christian Brauner <[email protected]>
> >
> > Inode operations that create a new filesystem object such as ->mknod,
> > ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> > Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> > filesystem object.
> >
> > In order to ensure that the correct {g,u}id is used map the caller's
> > fs{g,u}id for creation requests. This doesn't require complex changes.
> > It suffices to pass in the relevant idmapping recorded in the request
> > message. If this request message was triggered from an inode operation
> > that creates filesystem objects it will have passed down the relevant
> > idmapping. If this is a request message that was triggered from an inode
> > operation that doesn't need to take idmappings into account the initial
> > idmapping is passed down which is an identity mapping.
> >
> > This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> > which adds two new fields (owner_{u,g}id) to the request head structure.
> > So, we need to ensure that MDS supports it otherwise we need to fail
> > any IO that comes through an idmapped mount because we can't process it
> > in a proper way. MDS server without such an extension will use caller_{u,g}id
> > fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> > values are unmapped. At the same time we can't map these fields with an
> > idmapping as it can break UID/GID-based permission checks logic on the
> > MDS side. This problem was described with a lot of details at [1], [2].
> >
> > [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> > [2] https://lore.kernel.org/all/[email protected]/
> >
> > https://github.com/ceph/ceph/pull/52575
> > https://tracker.ceph.com/issues/62217
> >
> > Cc: Xiubo Li <[email protected]>
> > Cc: Jeff Layton <[email protected]>
> > Cc: Ilya Dryomov <[email protected]>
> > Cc: [email protected]
> > Co-Developed-by: Alexander Mikhalitsyn <[email protected]>
> > Signed-off-by: Christian Brauner <[email protected]>
> > Signed-off-by: Alexander Mikhalitsyn <[email protected]>
> > ---
> > v7:
> > - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> > v8:
> > - properly handled case when old MDS used with new kernel client
> > ---
> > fs/ceph/mds_client.c | 46 +++++++++++++++++++++++++++++++++---
> > fs/ceph/mds_client.h | 5 +++-
> > include/linux/ceph/ceph_fs.h | 4 +++-
> > 3 files changed, 50 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index 8829f55103da..7d3106d3b726 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
> > }
> > }
> >
> > +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
> > +{
> > + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
> > + return 1;
> > +
> > + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> > + return 2;
> > +
> > + return CEPH_MDS_REQUEST_HEAD_VERSION;
> > +}
> > +
> > static struct ceph_mds_request_head_legacy *
> > find_legacy_request_head(void *p, u64 features)
> > {
> > @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > {
> > int mds = session->s_mds;
> > struct ceph_mds_client *mdsc = session->s_mdsc;
> > + struct ceph_client *cl = mdsc->fsc->client;
> > struct ceph_msg *msg;
> > struct ceph_mds_request_head_legacy *lhead;
> > const char *path1 = NULL;
> > @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > void *p, *end;
> > int ret;
> > bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> > - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> > + u16 request_head_version = mds_supported_head_version(session);
> >
> > ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> > req->r_parent, req->r_path1, req->r_ino1.ino,
> > @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > */
> > if (legacy)
> > len = sizeof(struct ceph_mds_request_head_legacy);
> > - else if (old_version)
> > + else if (request_head_version == 1)
> > len = sizeof(struct ceph_mds_request_head_old);
> > + else if (request_head_version == 2)
> > + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> > else
> > len = sizeof(struct ceph_mds_request_head);
> >
>
> This is not what we are supposed to do. If we do this again and again when
> adding new members, it will make the code very complicated to maintain.
>
> Once CEPHFS_FEATURE_32BITS_RETRY_FWD is supported, ceph should decode the
> head correctly, and if CEPHFS_FEATURE_HAS_OWNER_UIDGID is not supported,
> the decoder should simply skip the extra fields.

I thought so too, but it doesn't work. Try it: take the kernel client
testing branch, add a new field to struct ceph_mds_request_head, then
compile and try to mount.
The mount stops working, and on the MDS side you will see something like:

2023-08-03T13:15:40.871+0200 7fe64ef5e640 10 mds.c ms_handle_accept
v1:192.168.2.136:0/49354629 con 0x563962206880 session 0x563967054000
2023-08-03T13:15:40.871+0200 7fe650f62640 -1 failed to decode message
of type 24 v6: End of buffer [buffer:2]
2023-08-03T13:15:40.871+0200 7fe650f62640 1 dump:
00000000 03 00 01 00 00 00 00 00 00 00 10 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 01 01 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000070 00 00 01 01 00 00 00 00 00 00 00 00 00 00 00 01 |................|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 5b 8c cb 64 |............[..d|
00000090 64 78 11 13 01 00 00 00 00 00 00 00 00 00 00 00 |dx..............|
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 |............|
000000ac

As I understand it, the MDS side is not prepared to receive a struct
ceph_mds_request_head that is larger than the version it supports.
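
To illustrate the failure mode, here is a minimal userspace sketch. It is
not Ceph code: the toy_* structures and functions are hypothetical and the
layout is much smaller than the real request head. It only shows how a
decoder that assumes the old head size ends up reading one of the appended
fields as the length of whatever follows the head, and then reports an
"end of buffer" style error.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature heads; NOT the real Ceph layout. */
struct toy_head_v2 {
	uint16_t version;
	uint32_t ext_num_fwd;
} __attribute__((packed));

struct toy_head_v3 {
	uint16_t version;
	uint32_t ext_num_fwd;
	uint32_t owner_uid, owner_gid;	/* unknown to the old decoder */
} __attribute__((packed));

/* Write the head followed by a length-prefixed path; return bytes written. */
static size_t toy_encode(uint8_t *buf, const void *head, size_t head_len,
			 const char *path)
{
	uint32_t plen;

	assert(path != NULL);
	plen = (uint32_t)strlen(path);
	memcpy(buf, head, head_len);
	memcpy(buf + head_len, &plen, sizeof(plen));
	memcpy(buf + head_len + sizeof(plen), path, plen);
	return head_len + sizeof(plen) + plen;
}

/*
 * Old decoder: always assumes the head is sizeof(struct toy_head_v2) bytes.
 * Returns 0 on success, -1 when a length runs past the end of the buffer.
 */
static int toy_decode_old(const uint8_t *buf, size_t len, uint32_t *path_len)
{
	size_t off = sizeof(struct toy_head_v2);

	if (len < off + sizeof(*path_len))
		return -1;
	memcpy(path_len, buf + off, sizeof(*path_len));
	off += sizeof(*path_len);
	if (*path_len > len - off)
		return -1;	/* "End of buffer" */
	return 0;
}
```

With a v2 head the old decoder finds the path length where it expects it.
With a v3 head it reads owner_uid as the path length instead, which is why
the patch sends only the bytes up to ext_num_fwd to an MDS that lacks the
new feature bit.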

>
> Is the MDS side buggy? Why didn't your last version work?
>
> Thanks
>
> - Xiubo
>
> > @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > lhead = find_legacy_request_head(msg->front.iov_base,
> > session->s_con.peer_features);
> >
> > + if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> > + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> > + pr_err_ratelimited_client(cl,
> > + "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> > + " is not supported by MDS. Fail request with -EIO.\n");
> > +
> > + ret = -EIO;
> > + goto out_err;
> > + }
> > +
> > /*
> > * The ceph_mds_request_head_legacy didn't contain a version field, and
> > * one was added when we moved the message version from 3->4.
> > @@ -3035,17 +3059,33 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > if (legacy) {
> > msg->hdr.version = cpu_to_le16(3);
> > p = msg->front.iov_base + sizeof(*lhead);
> > - } else if (old_version) {
> > + } else if (request_head_version == 1) {
> > struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> >
> > msg->hdr.version = cpu_to_le16(4);
> > ohead->version = cpu_to_le16(1);
> > p = msg->front.iov_base + sizeof(*ohead);
> > + } else if (request_head_version == 2) {
> > + struct ceph_mds_request_head *nhead = msg->front.iov_base;
> > +
> > + msg->hdr.version = cpu_to_le16(6);
> > + nhead->version = cpu_to_le16(2);
> > +
> > + p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> > } else {
> > struct ceph_mds_request_head *nhead = msg->front.iov_base;
> > + kuid_t owner_fsuid;
> > + kgid_t owner_fsgid;
> >
> > msg->hdr.version = cpu_to_le16(6);
> > nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> > +
> > + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> > + VFSUIDT_INIT(req->r_cred->fsuid));
> > + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> > + VFSGIDT_INIT(req->r_cred->fsgid));
> > + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> > + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
> > p = msg->front.iov_base + sizeof(*nhead);
> > }
> >
> > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > index e3bbf3ba8ee8..8f683e8203bd 100644
> > --- a/fs/ceph/mds_client.h
> > +++ b/fs/ceph/mds_client.h
> > @@ -33,8 +33,10 @@ enum ceph_feature_type {
> > CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> > CEPHFS_FEATURE_OP_GETVXATTR,
> > CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > + CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> > + CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >
> > - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> > };
> >
> > #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
> > @@ -49,6 +51,7 @@ enum ceph_feature_type {
> > CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
> > CEPHFS_FEATURE_OP_GETVXATTR, \
> > CEPHFS_FEATURE_32BITS_RETRY_FWD, \
> > + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
> > }
> >
> > /*
> > diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> > index 5f2301ee88bc..6eb83a51341c 100644
> > --- a/include/linux/ceph/ceph_fs.h
> > +++ b/include/linux/ceph/ceph_fs.h
> > @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> > union ceph_mds_request_args args;
> > } __attribute__ ((packed));
> >
> > -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
> > +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
> >
> > struct ceph_mds_request_head_old {
> > __le16 version; /* struct version */
> > @@ -530,6 +530,8 @@ struct ceph_mds_request_head {
> >
> > __le32 ext_num_retry; /* new count retry attempts */
> > __le32 ext_num_fwd; /* new count fwd attempts */
> > +
> > + __le32 owner_uid, owner_gid; /* used for OPs which create inodes */
> > } __attribute__ ((packed));
> >
> > /* cap/lease release record */
>

2023-08-04 10:22:51

by Aleksandr Mikhalitsyn

Subject: Re: [PATCH v8 03/12] ceph: handle idmapped mounts in create_request_message()

On Fri, Aug 4, 2023 at 8:43 AM Aleksandr Mikhalitsyn
<[email protected]> wrote:
>
> On Fri, Aug 4, 2023 at 8:35 AM Aleksandr Mikhalitsyn
> <[email protected]> wrote:
> >
> > On Fri, Aug 4, 2023 at 5:24 AM Xiubo Li <[email protected]> wrote:
> > >
> > >
> > > On 8/4/23 10:26, Xiubo Li wrote:
> > > >
> > > > On 8/3/23 21:59, Alexander Mikhalitsyn wrote:
> > > >> From: Christian Brauner <[email protected]>
> > > >>
> > > >> Inode operations that create a new filesystem object such as ->mknod,
> > > >> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> > > >> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> > > >> filesystem object.
> > > >>
> > > >> In order to ensure that the correct {g,u}id is used map the caller's
> > > >> fs{g,u}id for creation requests. This doesn't require complex changes.
> > > >> It suffices to pass in the relevant idmapping recorded in the request
> > > >> message. If this request message was triggered from an inode operation
> > > >> that creates filesystem objects it will have passed down the relevant
> > > > >> idmapping. If this is a request message that was triggered from an inode
> > > > >> operation that doesn't need to take idmappings into account the initial
> > > >> idmapping is passed down which is an identity mapping.
> > > >>
> > > >> This change uses a new cephfs protocol extension
> > > >> CEPHFS_FEATURE_HAS_OWNER_UIDGID
> > > >> which adds two new fields (owner_{u,g}id) to the request head structure.
> > > >> So, we need to ensure that MDS supports it otherwise we need to fail
> > > >> any IO that comes through an idmapped mount because we can't process it
> > > >> in a proper way. MDS server without such an extension will use
> > > >> caller_{u,g}id
> > > >> fields to set a new inode owner UID/GID which is incorrect because
> > > >> caller_{u,g}id
> > > >> values are unmapped. At the same time we can't map these fields with an
> > > >> idmapping as it can break UID/GID-based permission checks logic on the
> > > >> MDS side. This problem was described with a lot of details at [1], [2].
> > > >>
> > > >> [1]
> > > >> https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> > > >> [2]
> > > >> https://lore.kernel.org/all/[email protected]/
> > > >>
> > > >> https://github.com/ceph/ceph/pull/52575
> > > >> https://tracker.ceph.com/issues/62217
> > > >>
> > > >> Cc: Xiubo Li <[email protected]>
> > > >> Cc: Jeff Layton <[email protected]>
> > > >> Cc: Ilya Dryomov <[email protected]>
> > > >> Cc: [email protected]
> > > >> Co-Developed-by: Alexander Mikhalitsyn
> > > >> <[email protected]>
> > > >> Signed-off-by: Christian Brauner <[email protected]>
> > > >> Signed-off-by: Alexander Mikhalitsyn
> > > >> <[email protected]>
> > > >> ---
> > > >> v7:
> > > >> - reworked to use two new fields for owner UID/GID
> > > >> (https://github.com/ceph/ceph/pull/52575)
> > > >> v8:
> > > >> - properly handled case when old MDS used with new kernel client
> > > >> ---
> > > >> fs/ceph/mds_client.c | 46 +++++++++++++++++++++++++++++++++---
> > > >> fs/ceph/mds_client.h | 5 +++-
> > > >> include/linux/ceph/ceph_fs.h | 4 +++-
> > > >> 3 files changed, 50 insertions(+), 5 deletions(-)
> > > >>
> > > >> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > >> index 8829f55103da..7d3106d3b726 100644
> > > >> --- a/fs/ceph/mds_client.c
> > > >> +++ b/fs/ceph/mds_client.c
> > > >> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void
> > > >> **p, const struct ceph_mds_request *
> > > >> }
> > > >> }
> > > >> +static inline u16 mds_supported_head_version(struct
> > > >> ceph_mds_session *session)
> > > >> +{
> > > >> + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > > >> &session->s_features))
> > > >> + return 1;
> > > >> +
> > > >> + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> > > >> &session->s_features))
> > > >> + return 2;
> > > >> +
> > > >> + return CEPH_MDS_REQUEST_HEAD_VERSION;
> > > >> +}
> > > >> +
> > > >> static struct ceph_mds_request_head_legacy *
> > > >> find_legacy_request_head(void *p, u64 features)
> > > >> {
> > > >> @@ -2923,6 +2934,7 @@ static struct ceph_msg
> > > >> *create_request_message(struct ceph_mds_session *session,
> > > >> {
> > > >> int mds = session->s_mds;
> > > >> struct ceph_mds_client *mdsc = session->s_mdsc;
> > > >> + struct ceph_client *cl = mdsc->fsc->client;
> > > >> struct ceph_msg *msg;
> > > >> struct ceph_mds_request_head_legacy *lhead;
> > > >> const char *path1 = NULL;
> > > >> @@ -2936,7 +2948,7 @@ static struct ceph_msg
> > > >> *create_request_message(struct ceph_mds_session *session,
> > > >> void *p, *end;
> > > >> int ret;
> > > >> bool legacy = !(session->s_con.peer_features &
> > > >> CEPH_FEATURE_FS_BTIME);
> > > >> - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > > >> &session->s_features);
> > > >> + u16 request_head_version = mds_supported_head_version(session);
> > > >> ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> > > >> req->r_parent, req->r_path1, req->r_ino1.ino,
> > > >> @@ -2977,8 +2989,10 @@ static struct ceph_msg
> > > >> *create_request_message(struct ceph_mds_session *session,
> > > >> */
> > > >> if (legacy)
> > > >> len = sizeof(struct ceph_mds_request_head_legacy);
> > > >> - else if (old_version)
> > > >> + else if (request_head_version == 1)
> > > >> len = sizeof(struct ceph_mds_request_head_old);
> > > >> + else if (request_head_version == 2)
> > > >> + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> > > >> else
> > > >> len = sizeof(struct ceph_mds_request_head);
> > > >
> > > > > This is not what we are supposed to do. If we do this again and again when
> > > > > adding new members, it will make the code very complicated to maintain.
> > > > >
> > > > > Once CEPHFS_FEATURE_32BITS_RETRY_FWD is supported, ceph should decode
> > > > > the head correctly, and if CEPHFS_FEATURE_HAS_OWNER_UIDGID is not
> > > > > supported, the decoder should simply skip the extra fields.
> > > > >
> > > > > Is the MDS side buggy? Why didn't your last version work?
> > > >
> > >
> > > > I think the ceph side is buggy. Possibly we should add a new `length`
> > > > member to `struct ceph_mds_request_head` and just skip the extra
> > > > bytes when decoding it.
> > >
> > > Hm, I think I found something suspicious. The cephfs code has many places
> > > that call the DECODE_FINISH macro, but our decoder doesn't call it.
> > >
> > > According to the documentation, DECODE_FINISH exists precisely to handle
> > > this problem.
> > >
> > > What do you think?
>
> Update: using it also changes the on-wire format, because it adds a field
> to store the length. That would be a massive and incompatible protocol
> change, and I don't think we want to do it within the scope of this task.

https://github.com/ceph/ceph/pull/52575#issuecomment-1665141641
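
For reference, the length-field idea discussed above can be sketched in
plain C as follows. This is only an illustration under my own assumptions:
the env_* helpers and the toy envelope (one version byte plus a 32-bit
payload length) are hypothetical and do not reproduce the real Ceph
encoding. The point is that once the length travels with the payload, an
old decoder can read the fields it knows and skip the rest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy envelope: [u8 version][u32 payload_len][payload]. Returns bytes written. */
static size_t env_encode(uint8_t *buf, uint8_t version,
			 const void *payload, uint32_t payload_len)
{
	assert(payload != NULL || payload_len == 0);
	buf[0] = version;
	memcpy(buf + 1, &payload_len, sizeof(payload_len));
	memcpy(buf + 1 + sizeof(payload_len), payload, payload_len);
	return 1 + sizeof(payload_len) + payload_len;
}

/*
 * Copy at most known_len bytes of payload into out, then skip whatever the
 * sender appended beyond that. Returns the offset of the first byte after
 * the envelope, or 0 if the buffer is too short.
 */
static size_t env_decode(const uint8_t *buf, size_t len, void *out,
			 uint32_t known_len, uint8_t *version)
{
	uint32_t payload_len;
	size_t hdr = 1 + sizeof(payload_len);

	if (len < hdr)
		return 0;
	*version = buf[0];
	memcpy(&payload_len, buf + 1, sizeof(payload_len));
	if (payload_len > len - hdr)
		return 0;
	memcpy(out, buf + hdr,
	       payload_len < known_len ? payload_len : known_len);
	return hdr + payload_len;	/* unknown trailing fields are skipped */
}
```

A decoder that knows only the first 4 bytes of a 12-byte payload still
returns the correct offset for the data that follows. The price is the
extra length field on the wire, which is exactly the format change
mentioned above.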

>
> >
> > >
> > > Could you fix it together with your ceph PR ?
> > >
> > > Thanks
> > >
> > > - Xiubo
> > >
> > >
> > > > Thanks
> > > >
> > > > - Xiubo
> > > >
> > > >> @@ -3028,6 +3042,16 @@ static struct ceph_msg
> > > >> *create_request_message(struct ceph_mds_session *session,
> > > >> lhead = find_legacy_request_head(msg->front.iov_base,
> > > >> session->s_con.peer_features);
> > > >> + if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> > > >> + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> > > >> &session->s_features)) {
> > > >> + pr_err_ratelimited_client(cl,
> > > >> + "idmapped mount is used and
> > > >> CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> > > >> + " is not supported by MDS. Fail request with -EIO.\n");
> > > >> +
> > > >> + ret = -EIO;
> > > >> + goto out_err;
> > > >> + }
> > > >> +
> > > >> /*
> > > >> * The ceph_mds_request_head_legacy didn't contain a version
> > > >> field, and
> > > >> * one was added when we moved the message version from 3->4.
> > > >> @@ -3035,17 +3059,33 @@ static struct ceph_msg
> > > >> *create_request_message(struct ceph_mds_session *session,
> > > >> if (legacy) {
> > > >> msg->hdr.version = cpu_to_le16(3);
> > > >> p = msg->front.iov_base + sizeof(*lhead);
> > > >> - } else if (old_version) {
> > > >> + } else if (request_head_version == 1) {
> > > >> struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> > > >> msg->hdr.version = cpu_to_le16(4);
> > > >> ohead->version = cpu_to_le16(1);
> > > >> p = msg->front.iov_base + sizeof(*ohead);
> > > >> + } else if (request_head_version == 2) {
> > > >> + struct ceph_mds_request_head *nhead = msg->front.iov_base;
> > > >> +
> > > >> + msg->hdr.version = cpu_to_le16(6);
> > > >> + nhead->version = cpu_to_le16(2);
> > > >> +
> > > >> + p = msg->front.iov_base + offsetofend(struct
> > > >> ceph_mds_request_head, ext_num_fwd);
> > > >> } else {
> > > >> struct ceph_mds_request_head *nhead = msg->front.iov_base;
> > > >> + kuid_t owner_fsuid;
> > > >> + kgid_t owner_fsgid;
> > > >> msg->hdr.version = cpu_to_le16(6);
> > > >> nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> > > >> +
> > > >> + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> > > >> + VFSUIDT_INIT(req->r_cred->fsuid));
> > > >> + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> > > >> + VFSGIDT_INIT(req->r_cred->fsgid));
> > > >> + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns,
> > > >> owner_fsuid));
> > > >> + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns,
> > > >> owner_fsgid));
> > > >> p = msg->front.iov_base + sizeof(*nhead);
> > > >> }
> > > >> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > > >> index e3bbf3ba8ee8..8f683e8203bd 100644
> > > >> --- a/fs/ceph/mds_client.h
> > > >> +++ b/fs/ceph/mds_client.h
> > > >> @@ -33,8 +33,10 @@ enum ceph_feature_type {
> > > >> CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> > > >> CEPHFS_FEATURE_OP_GETVXATTR,
> > > >> CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > > >> + CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> > > >> + CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> > > >> - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > > >> + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> > > >> };
> > > >> #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
> > > >> @@ -49,6 +51,7 @@ enum ceph_feature_type {
> > > >> CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
> > > >> CEPHFS_FEATURE_OP_GETVXATTR, \
> > > >> CEPHFS_FEATURE_32BITS_RETRY_FWD, \
> > > >> + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
> > > >> }
> > > >> /*
> > > >> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> > > >> index 5f2301ee88bc..6eb83a51341c 100644
> > > >> --- a/include/linux/ceph/ceph_fs.h
> > > >> +++ b/include/linux/ceph/ceph_fs.h
> > > >> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> > > >> union ceph_mds_request_args args;
> > > >> } __attribute__ ((packed));
> > > >> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
> > > >> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
> > > >> struct ceph_mds_request_head_old {
> > > >> __le16 version; /* struct version */
> > > >> @@ -530,6 +530,8 @@ struct ceph_mds_request_head {
> > > >> __le32 ext_num_retry; /* new count retry attempts */
> > > >> __le32 ext_num_fwd; /* new count fwd attempts */
> > > >> +
> > > >> + __le32 owner_uid, owner_gid; /* used for OPs which create
> > > >> inodes */
> > > >> } __attribute__ ((packed));
> > > >> /* cap/lease release record */
> > >

2023-08-07 01:14:21

by Xiubo Li

Subject: Re: [PATCH v8 03/12] ceph: handle idmapped mounts in create_request_message()


On 8/4/23 14:35, Aleksandr Mikhalitsyn wrote:
> On Fri, Aug 4, 2023 at 5:24 AM Xiubo Li <[email protected]> wrote:
>>
>> On 8/4/23 10:26, Xiubo Li wrote:
>>> On 8/3/23 21:59, Alexander Mikhalitsyn wrote:
>>>> From: Christian Brauner <[email protected]>
>>>>
>>>> Inode operations that create a new filesystem object such as ->mknod,
>>>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
>>>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
>>>> filesystem object.
>>>>
>>>> In order to ensure that the correct {g,u}id is used map the caller's
>>>> fs{g,u}id for creation requests. This doesn't require complex changes.
>>>> It suffices to pass in the relevant idmapping recorded in the request
>>>> message. If this request message was triggered from an inode operation
>>>> that creates filesystem objects it will have passed down the relevant
>>>> idmapping. If this is a request message that was triggered from an inode
>>>> operation that doesn't need to take idmappings into account the initial
>>>> idmapping is passed down which is an identity mapping.
>>>>
>>>> This change uses a new cephfs protocol extension
>>>> CEPHFS_FEATURE_HAS_OWNER_UIDGID
>>>> which adds two new fields (owner_{u,g}id) to the request head structure.
>>>> So, we need to ensure that MDS supports it otherwise we need to fail
>>>> any IO that comes through an idmapped mount because we can't process it
>>>> in a proper way. MDS server without such an extension will use
>>>> caller_{u,g}id
>>>> fields to set a new inode owner UID/GID which is incorrect because
>>>> caller_{u,g}id
>>>> values are unmapped. At the same time we can't map these fields with an
>>>> idmapping as it can break UID/GID-based permission checks logic on the
>>>> MDS side. This problem was described with a lot of details at [1], [2].
>>>>
>>>> [1]
>>>> https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
>>>> [2]
>>>> https://lore.kernel.org/all/[email protected]/
>>>>
>>>> https://github.com/ceph/ceph/pull/52575
>>>> https://tracker.ceph.com/issues/62217
>>>>
>>>> Cc: Xiubo Li <[email protected]>
>>>> Cc: Jeff Layton <[email protected]>
>>>> Cc: Ilya Dryomov <[email protected]>
>>>> Cc: [email protected]
>>>> Co-Developed-by: Alexander Mikhalitsyn
>>>> <[email protected]>
>>>> Signed-off-by: Christian Brauner <[email protected]>
>>>> Signed-off-by: Alexander Mikhalitsyn
>>>> <[email protected]>
>>>> ---
>>>> v7:
>>>> - reworked to use two new fields for owner UID/GID
>>>> (https://github.com/ceph/ceph/pull/52575)
>>>> v8:
>>>> - properly handled case when old MDS used with new kernel client
>>>> ---
>>>> fs/ceph/mds_client.c | 46 +++++++++++++++++++++++++++++++++---
>>>> fs/ceph/mds_client.h | 5 +++-
>>>> include/linux/ceph/ceph_fs.h | 4 +++-
>>>> 3 files changed, 50 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>> index 8829f55103da..7d3106d3b726 100644
>>>> --- a/fs/ceph/mds_client.c
>>>> +++ b/fs/ceph/mds_client.c
>>>> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void
>>>> **p, const struct ceph_mds_request *
>>>> }
>>>> }
>>>> +static inline u16 mds_supported_head_version(struct
>>>> ceph_mds_session *session)
>>>> +{
>>>> + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>>> &session->s_features))
>>>> + return 1;
>>>> +
>>>> + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>> &session->s_features))
>>>> + return 2;
>>>> +
>>>> + return CEPH_MDS_REQUEST_HEAD_VERSION;
>>>> +}
>>>> +
>>>> static struct ceph_mds_request_head_legacy *
>>>> find_legacy_request_head(void *p, u64 features)
>>>> {
>>>> @@ -2923,6 +2934,7 @@ static struct ceph_msg
>>>> *create_request_message(struct ceph_mds_session *session,
>>>> {
>>>> int mds = session->s_mds;
>>>> struct ceph_mds_client *mdsc = session->s_mdsc;
>>>> + struct ceph_client *cl = mdsc->fsc->client;
>>>> struct ceph_msg *msg;
>>>> struct ceph_mds_request_head_legacy *lhead;
>>>> const char *path1 = NULL;
>>>> @@ -2936,7 +2948,7 @@ static struct ceph_msg
>>>> *create_request_message(struct ceph_mds_session *session,
>>>> void *p, *end;
>>>> int ret;
>>>> bool legacy = !(session->s_con.peer_features &
>>>> CEPH_FEATURE_FS_BTIME);
>>>> - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>>> &session->s_features);
>>>> + u16 request_head_version = mds_supported_head_version(session);
>>>> ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
>>>> req->r_parent, req->r_path1, req->r_ino1.ino,
>>>> @@ -2977,8 +2989,10 @@ static struct ceph_msg
>>>> *create_request_message(struct ceph_mds_session *session,
>>>> */
>>>> if (legacy)
>>>> len = sizeof(struct ceph_mds_request_head_legacy);
>>>> - else if (old_version)
>>>> + else if (request_head_version == 1)
>>>> len = sizeof(struct ceph_mds_request_head_old);
>>>> + else if (request_head_version == 2)
>>>> + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>>>> else
>>>> len = sizeof(struct ceph_mds_request_head);
>>> This is not what we are supposed to do. If we do this again and again when
>>> adding new members, it will make the code very complicated to maintain.
>>>
>>> Once CEPHFS_FEATURE_32BITS_RETRY_FWD is supported, ceph should decode
>>> the head correctly, and if CEPHFS_FEATURE_HAS_OWNER_UIDGID is not
>>> supported, the decoder should simply skip the extra fields.
>>>
>>> Is the MDS side buggy? Why didn't your last version work?
>>>
>> I think the ceph side is buggy. Possibly we should add a new `length`
>> member to `struct ceph_mds_request_head` and just skip the extra
>> bytes when decoding it.
> Hm, I think I found something suspicious. The cephfs code has many places
> that call the DECODE_FINISH macro, but our decoder doesn't call it.
>
> According to the documentation, DECODE_FINISH exists precisely to handle
> this problem.
>
> What do you think?

Yeah, correct.

We also need to do it like this.

Thanks

- Xiubo


>> Could you fix it together with your ceph PR ?
>>
>> Thanks
>>
>> - Xiubo
>>
>>
>>> Thanks
>>>
>>> - Xiubo
>>>
>>>> @@ -3028,6 +3042,16 @@ static struct ceph_msg
>>>> *create_request_message(struct ceph_mds_session *session,
>>>> lhead = find_legacy_request_head(msg->front.iov_base,
>>>> session->s_con.peer_features);
>>>> + if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
>>>> + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>> &session->s_features)) {
>>>> + pr_err_ratelimited_client(cl,
>>>> + "idmapped mount is used and
>>>> CEPHFS_FEATURE_HAS_OWNER_UIDGID"
>>>> + " is not supported by MDS. Fail request with -EIO.\n");
>>>> +
>>>> + ret = -EIO;
>>>> + goto out_err;
>>>> + }
>>>> +
>>>> /*
>>>> * The ceph_mds_request_head_legacy didn't contain a version
>>>> field, and
>>>> * one was added when we moved the message version from 3->4.
>>>> @@ -3035,17 +3059,33 @@ static struct ceph_msg
>>>> *create_request_message(struct ceph_mds_session *session,
>>>> if (legacy) {
>>>> msg->hdr.version = cpu_to_le16(3);
>>>> p = msg->front.iov_base + sizeof(*lhead);
>>>> - } else if (old_version) {
>>>> + } else if (request_head_version == 1) {
>>>> struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
>>>> msg->hdr.version = cpu_to_le16(4);
>>>> ohead->version = cpu_to_le16(1);
>>>> p = msg->front.iov_base + sizeof(*ohead);
>>>> + } else if (request_head_version == 2) {
>>>> + struct ceph_mds_request_head *nhead = msg->front.iov_base;
>>>> +
>>>> + msg->hdr.version = cpu_to_le16(6);
>>>> + nhead->version = cpu_to_le16(2);
>>>> +
>>>> + p = msg->front.iov_base + offsetofend(struct
>>>> ceph_mds_request_head, ext_num_fwd);
>>>> } else {
>>>> struct ceph_mds_request_head *nhead = msg->front.iov_base;
>>>> + kuid_t owner_fsuid;
>>>> + kgid_t owner_fsgid;
>>>> msg->hdr.version = cpu_to_le16(6);
>>>> nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
>>>> +
>>>> + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
>>>> + VFSUIDT_INIT(req->r_cred->fsuid));
>>>> + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
>>>> + VFSGIDT_INIT(req->r_cred->fsgid));
>>>> + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns,
>>>> owner_fsuid));
>>>> + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns,
>>>> owner_fsgid));
>>>> p = msg->front.iov_base + sizeof(*nhead);
>>>> }
>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>>> index e3bbf3ba8ee8..8f683e8203bd 100644
>>>> --- a/fs/ceph/mds_client.h
>>>> +++ b/fs/ceph/mds_client.h
>>>> @@ -33,8 +33,10 @@ enum ceph_feature_type {
>>>> CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
>>>> CEPHFS_FEATURE_OP_GETVXATTR,
>>>> CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>>> + CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
>>>> + CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>> - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>>> + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>> };
>>>> #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
>>>> @@ -49,6 +51,7 @@ enum ceph_feature_type {
>>>> CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
>>>> CEPHFS_FEATURE_OP_GETVXATTR, \
>>>> CEPHFS_FEATURE_32BITS_RETRY_FWD, \
>>>> + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
>>>> }
>>>> /*
>>>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
>>>> index 5f2301ee88bc..6eb83a51341c 100644
>>>> --- a/include/linux/ceph/ceph_fs.h
>>>> +++ b/include/linux/ceph/ceph_fs.h
>>>> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
>>>> union ceph_mds_request_args args;
>>>> } __attribute__ ((packed));
>>>> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
>>>> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
>>>> struct ceph_mds_request_head_old {
>>>> __le16 version; /* struct version */
>>>> @@ -530,6 +530,8 @@ struct ceph_mds_request_head {
>>>> __le32 ext_num_retry; /* new count retry attempts */
>>>> __le32 ext_num_fwd; /* new count fwd attempts */
>>>> +
>>>> + __le32 owner_uid, owner_gid; /* used for OPs which create
>>>> inodes */
>>>> } __attribute__ ((packed));
>>>> /* cap/lease release record */