2022-03-22 18:08:48

by Jeffrey Layton

Subject: [RFC PATCH v11 00/51] ceph+fscrypt : full support

This patchset represents a (mostly) working prototype of the
ceph+fscrypt work. With this, I'm able to run xfstests with
test_dummy_encryption, and most of the tests that pass on ceph without
fscrypt now pass on it.

When I made the last posting of this series [1], I mentioned that proper
support for sparse reads would be necessary to do this. Thus, the
biggest difference from the v10 set is that this is now based on top of
the patch series that I posted yesterday to implement sparse reads [2].

Aside from that, there are also numerous cleanups all over the tree, as
well as an overhaul of the readdir handling by Xiubo.

This series is not yet bug-free, but it's at a point where it is quite
usable, providing you're running against the Quincy release of ceph
(which should ship sometime in the next few months).

Next Steps:
===========
I'm not going to sugar-coat it. This is a huge, invasive patch series
that touches a lot of the most sensitive code in ceph.

Eric Biggers has acked the changes we need in fscrypt infrastructure. I
still need Al to ack exporting the new_inode_pseudo symbol. The rest is
pretty much all ceph and libceph code.

The main piece missing at this point is support for sparse reads with
ms_mode settings other than "crc". Once that's complete, I want to merge
that and this series into the ceph "testing" branch so we can start
running tests against it in teuthology with fscrypt enabled.

If that goes well, I think we could probably merge this into mainline
for v5.20 or v5.21. There is also some incoming support for netfs write
and DIO read helpers that we may want to convert to [3], which may
alter the timing.

Review, comments and questions are welcome...

[1]: https://lore.kernel.org/ceph-devel/[email protected]/

[2]: https://lore.kernel.org/ceph-devel/[email protected]/

[3]: https://lore.kernel.org/ceph-devel/[email protected]/T/#maec7e3579f13a45171ad23d7a49183d169fcfcca

Jeff Layton (41):
vfs: export new_inode_pseudo
fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode
fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
fscrypt: add fscrypt_context_for_new_inode
ceph: preallocate inode for ops that may create one
ceph: crypto context handling for ceph
ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
ceph: add support for fscrypt_auth/fscrypt_file to cap messages
ceph: add ability to set fscrypt_auth via setattr
ceph: implement -o test_dummy_encryption mount option
ceph: decode alternate_name in lease info
ceph: add fscrypt ioctls
ceph: make ceph_mdsc_build_path use ref-walk
ceph: add encrypted fname handling to ceph_mdsc_build_path
ceph: send altname in MClientRequest
ceph: encode encrypted name in dentry release
ceph: properly set DCACHE_NOKEY_NAME flag in lookup
ceph: make d_revalidate call fscrypt revalidator for encrypted
dentries
ceph: add helpers for converting names for userland presentation
ceph: add fscrypt support to ceph_fill_trace
ceph: create symlinks with encrypted and base64-encoded targets
ceph: make ceph_get_name decrypt filenames
ceph: add a new ceph.fscrypt.auth vxattr
ceph: add some fscrypt guardrails
libceph: add CEPH_OSD_OP_ASSERT_VER support
ceph: size handling for encrypted inodes in cap updates
ceph: fscrypt_file field handling in MClientRequest messages
ceph: get file size from fscrypt_file when present in inode traces
ceph: handle fscrypt fields in cap messages from MDS
ceph: add infrastructure for file encryption and decryption
libceph: allow ceph_osdc_new_request to accept a multi-op read
ceph: disable fallocate for encrypted inodes
ceph: disable copy offload on encrypted inodes
ceph: don't use special DIO path for encrypted inodes
ceph: align data in pages in ceph_sync_write
ceph: add read/modify/write to ceph_sync_write
ceph: plumb in decryption during sync reads
ceph: add fscrypt decryption support to ceph_netfs_issue_op
ceph: set i_blkbits to crypto block size for encrypted inodes
ceph: add encryption support to writepage
ceph: fscrypt support for writepages

Luis Henriques (1):
ceph: don't allow changing layout on encrypted files/directories

Xiubo Li (9):
ceph: make the ioctl cmd more readable in debug log
ceph: fix base64 encoded name's length check in ceph_fname_to_usr()
ceph: pass the request to parse_reply_info_readdir()
ceph: add ceph_encode_encrypted_dname() helper
ceph: add support to readdir for encrypted filenames
ceph: add __ceph_get_caps helper support
ceph: add __ceph_sync_read helper support
ceph: add object version support for sync read
ceph: add truncate size handling support for fscrypt

fs/ceph/Makefile | 1 +
fs/ceph/acl.c | 4 +-
fs/ceph/addr.c | 128 ++++++--
fs/ceph/caps.c | 212 +++++++++++--
fs/ceph/crypto.c | 432 +++++++++++++++++++++++++
fs/ceph/crypto.h | 256 +++++++++++++++
fs/ceph/dir.c | 182 ++++++++---
fs/ceph/export.c | 44 ++-
fs/ceph/file.c | 530 ++++++++++++++++++++++++++-----
fs/ceph/inode.c | 546 +++++++++++++++++++++++++++++---
fs/ceph/ioctl.c | 126 +++++++-
fs/ceph/mds_client.c | 455 ++++++++++++++++++++++----
fs/ceph/mds_client.h | 24 +-
fs/ceph/super.c | 91 +++++-
fs/ceph/super.h | 43 ++-
fs/ceph/xattr.c | 29 ++
fs/crypto/fname.c | 44 ++-
fs/crypto/fscrypt_private.h | 9 +-
fs/crypto/hooks.c | 6 +-
fs/crypto/policy.c | 35 +-
fs/inode.c | 1 +
include/linux/ceph/ceph_fs.h | 21 +-
include/linux/ceph/osd_client.h | 6 +-
include/linux/ceph/rados.h | 4 +
include/linux/fscrypt.h | 10 +
net/ceph/osd_client.c | 32 +-
26 files changed, 2907 insertions(+), 364 deletions(-)
create mode 100644 fs/ceph/crypto.c
create mode 100644 fs/ceph/crypto.h

--
2.35.1


2022-03-22 18:15:21

by Jeffrey Layton

Subject: [RFC PATCH v11 07/51] ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces

...and store them in the ceph_inode_info.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/file.c | 2 ++
fs/ceph/inode.c | 18 +++++++++++++-
fs/ceph/mds_client.c | 57 ++++++++++++++++++++++++++++++++++++++++++++
fs/ceph/mds_client.h | 4 ++++
fs/ceph/super.h | 6 +++++
5 files changed, 86 insertions(+), 1 deletion(-)
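
Reviewer note, not part of the commit: the decode added to
parse_reply_info_in() below follows a "u32 length, then payload" pattern
for both fscrypt_auth and fscrypt_file. A minimal userspace sketch of that
pattern, assuming the usual little-endian ceph wire encoding. The names
struct blob, get_le32() and decode_blob() are hypothetical stand-ins, not
kernel APIs:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for one decoded field (not a kernel type). */
struct blob {
	uint8_t *data;	/* NULL when len == 0 */
	uint32_t len;
};

/* Read a little-endian u32 at *p and advance *p; fail on a short buffer. */
static int get_le32(const uint8_t **p, const uint8_t *end, uint32_t *out)
{
	if ((size_t)(end - *p) < 4)
		return -1;
	*out = (uint32_t)(*p)[0] | (uint32_t)(*p)[1] << 8 |
	       (uint32_t)(*p)[2] << 16 | (uint32_t)(*p)[3] << 24;
	*p += 4;
	return 0;
}

/*
 * Decode one "u32 length + payload" field. Returns 0 on success and -1 on
 * a short buffer or allocation failure. A zero length leaves data NULL.
 * On success the caller owns b->data and must free() it.
 */
static int decode_blob(const uint8_t **p, const uint8_t *end, struct blob *b)
{
	uint32_t len;

	assert(*p <= end);
	b->data = NULL;
	b->len = 0;

	if (get_le32(p, end, &len))
		return -1;
	if (len == 0)
		return 0;
	if ((size_t)(end - *p) < len)
		return -1;

	b->data = malloc(len);
	if (!b->data)
		return -1;
	memcpy(b->data, *p, len);
	b->len = len;
	*p += len;
	return 0;
}
```

Calling decode_blob() twice in a row mirrors the order used in the patch:
fscrypt_auth first, then fscrypt_file.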

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index cccf729b55a8..5832dcea2d8c 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -629,6 +629,8 @@ static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
iinfo.xattr_data = xattr_buf;
memset(iinfo.xattr_data, 0, iinfo.xattr_len);

+ /* FIXME: set fscrypt_auth and fscrypt_file */
+
in.ino = cpu_to_le64(vino.ino);
in.snapid = cpu_to_le64(CEPH_NOSNAP);
in.version = cpu_to_le64(1); // ???
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 2e0e321a58cb..2c9a482444e0 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -615,7 +615,10 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
INIT_WORK(&ci->i_work, ceph_inode_work);
ci->i_work_mask = 0;
memset(&ci->i_btime, '\0', sizeof(ci->i_btime));
-
+#ifdef CONFIG_FS_ENCRYPTION
+ ci->fscrypt_auth = NULL;
+ ci->fscrypt_auth_len = 0;
+#endif
ceph_fscache_inode_init(ci);

return &ci->vfs_inode;
@@ -626,6 +629,9 @@ void ceph_free_inode(struct inode *inode)
struct ceph_inode_info *ci = ceph_inode(inode);

kfree(ci->i_symlink);
+#ifdef CONFIG_FS_ENCRYPTION
+ kfree(ci->fscrypt_auth);
+#endif
kmem_cache_free(ceph_inode_cachep, ci);
}

@@ -1026,6 +1032,16 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
xattr_blob = NULL;
}

+#ifdef CONFIG_FS_ENCRYPTION
+ if (iinfo->fscrypt_auth_len && !ci->fscrypt_auth) {
+ ci->fscrypt_auth_len = iinfo->fscrypt_auth_len;
+ ci->fscrypt_auth = iinfo->fscrypt_auth;
+ iinfo->fscrypt_auth = NULL;
+ iinfo->fscrypt_auth_len = 0;
+ inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
+ }
+#endif
+
/* finally update i_version */
if (le64_to_cpu(info->version) > ci->i_version)
ci->i_version = le64_to_cpu(info->version);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index e64a8cefdb7f..8e7ef76d80ea 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -184,8 +184,52 @@ static int parse_reply_info_in(void **p, void *end,
info->rsnaps = 0;
}

+ if (struct_v >= 5) {
+ u32 alen;
+
+ ceph_decode_32_safe(p, end, alen, bad);
+
+ while (alen--) {
+ u32 len;
+
+ /* key */
+ ceph_decode_32_safe(p, end, len, bad);
+ ceph_decode_skip_n(p, end, len, bad);
+ /* value */
+ ceph_decode_32_safe(p, end, len, bad);
+ ceph_decode_skip_n(p, end, len, bad);
+ }
+ }
+
+ /* fscrypt flag -- ignore */
+ if (struct_v >= 6)
+ ceph_decode_skip_8(p, end, bad);
+
+ info->fscrypt_auth = NULL;
+ info->fscrypt_auth_len = 0;
+ info->fscrypt_file = NULL;
+ info->fscrypt_file_len = 0;
+ if (struct_v >= 7) {
+ ceph_decode_32_safe(p, end, info->fscrypt_auth_len, bad);
+ if (info->fscrypt_auth_len) {
+ info->fscrypt_auth = kmalloc(info->fscrypt_auth_len, GFP_KERNEL);
+ if (!info->fscrypt_auth)
+ return -ENOMEM;
+ ceph_decode_copy_safe(p, end, info->fscrypt_auth,
+ info->fscrypt_auth_len, bad);
+ }
+ ceph_decode_32_safe(p, end, info->fscrypt_file_len, bad);
+ if (info->fscrypt_file_len) {
+ info->fscrypt_file = kmalloc(info->fscrypt_file_len, GFP_KERNEL);
+ if (!info->fscrypt_file)
+ return -ENOMEM;
+ ceph_decode_copy_safe(p, end, info->fscrypt_file,
+ info->fscrypt_file_len, bad);
+ }
+ }
*p = end;
} else {
+ /* legacy (unversioned) struct */
if (features & CEPH_FEATURE_MDS_INLINE_DATA) {
ceph_decode_64_safe(p, end, info->inline_version, bad);
ceph_decode_32_safe(p, end, info->inline_len, bad);
@@ -650,8 +694,21 @@ static int parse_reply_info(struct ceph_mds_session *s, struct ceph_msg *msg,

static void destroy_reply_info(struct ceph_mds_reply_info_parsed *info)
{
+ int i;
+
+ kfree(info->diri.fscrypt_auth);
+ kfree(info->diri.fscrypt_file);
+ kfree(info->targeti.fscrypt_auth);
+ kfree(info->targeti.fscrypt_file);
if (!info->dir_entries)
return;
+
+ for (i = 0; i < info->dir_nr; i++) {
+ struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
+
+ kfree(rde->inode.fscrypt_auth);
+ kfree(rde->inode.fscrypt_file);
+ }
free_pages((unsigned long)info->dir_entries, get_order(info->dir_buf_size));
}

diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 2e945979a2e0..96d726ee5250 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -88,6 +88,10 @@ struct ceph_mds_reply_info_in {
s32 dir_pin;
struct ceph_timespec btime;
struct ceph_timespec snap_btime;
+ u8 *fscrypt_auth;
+ u8 *fscrypt_file;
+ u32 fscrypt_auth_len;
+ u32 fscrypt_file_len;
u64 rsnaps;
u64 change_attr;
};
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index f23e49f46440..e12e5b484564 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -438,6 +438,12 @@ struct ceph_inode_info {
struct work_struct i_work;
unsigned long i_work_mask;

+#ifdef CONFIG_FS_ENCRYPTION
+ u32 fscrypt_auth_len;
+ u32 fscrypt_file_len;
+ u8 *fscrypt_auth;
+ u8 *fscrypt_file;
+#endif
#ifdef CONFIG_CEPH_FSCACHE
struct fscache_cookie *fscache;
#endif
--
2.35.1

2022-03-22 18:26:03

by Jeffrey Layton

Subject: [RFC PATCH v11 16/51] ceph: send altname in MClientRequest

In the event that we have a filename longer than CEPH_NOHASH_NAME_MAX,
we'll need to hash the tail of the filename. The client however will
still need to know the full name of the file if it has a key.

To support this, the MClientRequest message has grown a new alternate_name
field that we populate with the full (binary) crypttext of the filename.
This is then transmitted to the clients in readdir or traces as part of
the dentry lease.

Add support for populating this field when the filenames are very long.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/mds_client.c | 75 +++++++++++++++++++++++++++++++++++++++++---
fs/ceph/mds_client.h | 3 ++
2 files changed, 73 insertions(+), 5 deletions(-)
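
Reviewer note, not part of the commit: the rule described above can be
sketched in userspace as follows. The altname is only attached when the
parent is encrypted, the key is available, and the encrypted name is
longer than CEPH_NOHASH_NAME_MAX. The value 148 used here is an assumption
for illustration, as are the SKETCH_* macros and both function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; the real limits live in the kernel. */
#define SKETCH_NOHASH_NAME_MAX	148
#define SKETCH_NAME_MAX		255

/*
 * Number of altname bytes a request would carry for a name whose encrypted
 * (binary) form is ctext_len bytes long. Zero means "no altname needed".
 */
static size_t altname_len_for(bool dir_encrypted, bool have_key,
			      size_t ctext_len)
{
	assert(ctext_len <= SKETCH_NAME_MAX);

	if (!dir_encrypted || !have_key)
		return 0;
	if (ctext_len <= SKETCH_NOHASH_NAME_MAX)
		return 0;
	return ctext_len;
}

/* Bytes the altname field adds to the message: u32 length plus payload. */
static size_t altname_wire_size(size_t altname_len)
{
	return sizeof(uint32_t) + altname_len;
}
```

The second helper corresponds to the length accounting in
create_request_message(): the u32 length is always sent, even when the
altname itself is empty.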

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index ff80f09fbc12..e5f569f9d6a0 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -972,6 +972,7 @@ void ceph_mdsc_release_request(struct kref *kref)
if (req->r_pagelist)
ceph_pagelist_release(req->r_pagelist);
kfree(req->r_fscrypt_auth);
+ kfree(req->r_altname);
put_request_session(req);
ceph_unreserve_caps(req->r_mdsc, &req->r_caps_reservation);
WARN_ON_ONCE(!list_empty(&req->r_wait));
@@ -2386,6 +2387,63 @@ static inline u64 __get_oldest_tid(struct ceph_mds_client *mdsc)
return mdsc->oldest_tid;
}

+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
+{
+ struct inode *dir = req->r_parent;
+ struct dentry *dentry = req->r_dentry;
+ u8 *cryptbuf = NULL;
+ u32 len = 0;
+ int ret = 0;
+
+ /* only encode if we have parent and dentry */
+ if (!dir || !dentry)
+ goto success;
+
+ /* No-op unless this is encrypted */
+ if (!IS_ENCRYPTED(dir))
+ goto success;
+
+ ret = __fscrypt_prepare_readdir(dir);
+ if (ret)
+ return ERR_PTR(ret);
+
+ /* No key? Just ignore it. */
+ if (!fscrypt_has_encryption_key(dir))
+ goto success;
+
+ if (!fscrypt_fname_encrypted_size(dir, dentry->d_name.len, NAME_MAX, &len)) {
+ WARN_ON_ONCE(1);
+ return ERR_PTR(-ENAMETOOLONG);
+ }
+
+ /* No need to append altname if name is short enough */
+ if (len <= CEPH_NOHASH_NAME_MAX) {
+ len = 0;
+ goto success;
+ }
+
+ cryptbuf = kmalloc(len, GFP_KERNEL);
+ if (!cryptbuf)
+ return ERR_PTR(-ENOMEM);
+
+ ret = fscrypt_fname_encrypt(dir, &dentry->d_name, cryptbuf, len);
+ if (ret) {
+ kfree(cryptbuf);
+ return ERR_PTR(ret);
+ }
+success:
+ *plen = len;
+ return cryptbuf;
+}
+#else
+static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
+{
+ *plen = 0;
+ return NULL;
+}
+#endif
+
/**
* ceph_mdsc_build_path - build a path string to a given dentry
* @dentry: dentry to which path should be built
@@ -2606,14 +2664,15 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
ceph_encode_timespec64(&ts, &req->r_stamp);
ceph_encode_copy(p, &ts, sizeof(ts));

- /* gid_list */
+ /* v4: gid_list */
ceph_encode_32(p, req->r_cred->group_info->ngroups);
for (i = 0; i < req->r_cred->group_info->ngroups; i++)
ceph_encode_64(p, from_kgid(&init_user_ns,
req->r_cred->group_info->gid[i]));

- /* v5: altname (TODO: skip for now) */
- ceph_encode_32(p, 0);
+ /* v5: altname */
+ ceph_encode_32(p, req->r_altname_len);
+ ceph_encode_copy(p, req->r_altname, req->r_altname_len);

/* v6: fscrypt_auth and fscrypt_file */
if (req->r_fscrypt_auth) {
@@ -2669,7 +2728,13 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
goto out_free1;
}

- /* head */
+ req->r_altname = get_fscrypt_altname(req, &req->r_altname_len);
+ if (IS_ERR(req->r_altname)) {
+ msg = ERR_CAST(req->r_altname);
+ req->r_altname = NULL;
+ goto out_free2;
+ }
+
len = legacy ? sizeof(*head) : sizeof(struct ceph_mds_request_head);

/* filepaths */
@@ -2695,7 +2760,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups);

/* alternate name */
- len += sizeof(u32); // TODO
+ len += sizeof(u32) + req->r_altname_len;

/* fscrypt_auth */
len += sizeof(u32); // fscrypt_auth
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 2cc75f9ae7c7..cd719691a86d 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -290,6 +290,9 @@ struct ceph_mds_request {

struct ceph_fscrypt_auth *r_fscrypt_auth;

+ u8 *r_altname; /* fscrypt binary crypttext for long filenames */
+ u32 r_altname_len; /* length of r_altname */
+
int r_fmode; /* file mode, if expecting cap */
int r_request_release_offset;
const struct cred *r_cred;
--
2.35.1

2022-03-22 18:26:05

by Jeffrey Layton

Subject: [RFC PATCH v11 27/51] ceph: make ceph_get_name decrypt filenames

When we do a lookupino to the MDS, we get a filename in the trace.
ceph_get_name uses that name directly, so we must properly decrypt
it before copying it to the name buffer.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/export.c | 44 ++++++++++++++++++++++++++++++++------------
1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/fs/ceph/export.c b/fs/ceph/export.c
index e0fa66ac8b9f..0ebf2bd93055 100644
--- a/fs/ceph/export.c
+++ b/fs/ceph/export.c
@@ -7,6 +7,7 @@

#include "super.h"
#include "mds_client.h"
+#include "crypto.h"

/*
* Basic fh
@@ -534,7 +535,9 @@ static int ceph_get_name(struct dentry *parent, char *name,
{
struct ceph_mds_client *mdsc;
struct ceph_mds_request *req;
+ struct inode *dir = d_inode(parent);
struct inode *inode = d_inode(child);
+ struct ceph_mds_reply_info_parsed *rinfo;
int err;

if (ceph_snap(inode) != CEPH_NOSNAP)
@@ -546,30 +549,47 @@ static int ceph_get_name(struct dentry *parent, char *name,
if (IS_ERR(req))
return PTR_ERR(req);

- inode_lock(d_inode(parent));
-
+ inode_lock(dir);
req->r_inode = inode;
ihold(inode);
req->r_ino2 = ceph_vino(d_inode(parent));
- req->r_parent = d_inode(parent);
- ihold(req->r_parent);
+ req->r_parent = dir;
+ ihold(dir);
set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
req->r_num_caps = 2;
err = ceph_mdsc_do_request(mdsc, NULL, req);
+ inode_unlock(dir);

- inode_unlock(d_inode(parent));
+ if (err)
+ goto out;

- if (!err) {
- struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
+ rinfo = &req->r_reply_info;
+ if (!IS_ENCRYPTED(dir)) {
memcpy(name, rinfo->dname, rinfo->dname_len);
name[rinfo->dname_len] = 0;
- dout("get_name %p ino %llx.%llx name %s\n",
- child, ceph_vinop(inode), name);
} else {
- dout("get_name %p ino %llx.%llx err %d\n",
- child, ceph_vinop(inode), err);
- }
+ struct fscrypt_str oname = FSTR_INIT(NULL, 0);
+ struct ceph_fname fname = { .dir = dir,
+ .name = rinfo->dname,
+ .ctext = rinfo->altname,
+ .name_len = rinfo->dname_len,
+ .ctext_len = rinfo->altname_len };
+
+ err = ceph_fname_alloc_buffer(dir, &oname);
+ if (err < 0)
+ goto out;

+ err = ceph_fname_to_usr(&fname, NULL, &oname, NULL);
+ if (!err) {
+ memcpy(name, oname.name, oname.len);
+ name[oname.len] = 0;
+ }
+ ceph_fname_free_buffer(dir, &oname);
+ }
+out:
+ dout("get_name %p ino %llx.%llx err %d %s%s\n",
+ child, ceph_vinop(inode), err,
+ err ? "" : "name ", err ? "" : name);
ceph_mdsc_put_request(req);
return err;
}
--
2.35.1

2022-03-22 18:32:49

by Jeffrey Layton

Subject: [RFC PATCH v11 35/51] ceph: handle fscrypt fields in cap messages from MDS

Handle the new fscrypt_file and fscrypt_auth fields in cap messages. Use
them to populate new fields in cap_extra_info and update the inode with
those values.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/caps.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 72 insertions(+), 2 deletions(-)
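
Reviewer note, not part of the commit: two pieces of logic in this patch
are easy to state in isolation. The fscrypt_file payload is treated as a
little-endian u64 size followed by bytes that are skipped, and the size
from fscrypt_file is only trusted when the MDS-reported size is non-zero.
A userspace sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode the size carried in an fscrypt_file payload. Only the first eight
 * bytes are interpreted (little-endian u64); anything after them is ignored.
 * Returns true and sets *size if the payload was long enough.
 */
static bool decode_fscrypt_file_size(const uint8_t *payload, uint32_t len,
				     uint64_t *size)
{
	uint64_t v = 0;

	assert(payload != NULL || len == 0);
	if (len < sizeof(uint64_t))
		return false;
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | payload[i];
	*size = v;
	return true;
}

/*
 * Pick the size to use for an inode: for encrypted inodes with a non-zero
 * MDS size, trust the fscrypt_file size; otherwise keep the MDS size.
 */
static uint64_t effective_size(bool encrypted, uint64_t mds_size,
			       uint64_t fscrypt_file_size)
{
	if (encrypted && mds_size)
		return fscrypt_file_size;
	return mds_size;
}
```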

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 65af0dcf12ec..fbf120a6aa96 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -3364,6 +3364,9 @@ struct cap_extra_info {
/* currently issued */
int issued;
struct timespec64 btime;
+ u8 *fscrypt_auth;
+ u32 fscrypt_auth_len;
+ u64 fscrypt_file_size;
};

/*
@@ -3396,6 +3399,14 @@ static void handle_cap_grant(struct inode *inode,
bool deleted_inode = false;
bool fill_inline = false;

+ /*
+ * If there is at least one crypto block then we'll trust fscrypt_file_size.
+ * If the real length of the file is 0, then ignore it (it has probably been
+ * truncated down to 0 by the MDS).
+ */
+ if (IS_ENCRYPTED(inode) && size)
+ size = extra_info->fscrypt_file_size;
+
dout("handle_cap_grant inode %p cap %p mds%d seq %d %s\n",
inode, cap, session->s_mds, seq, ceph_cap_string(newcaps));
dout(" size %llu max_size %llu, i_size %llu\n", size, max_size,
@@ -3873,7 +3884,8 @@ static void handle_cap_flushsnap_ack(struct inode *inode, u64 flush_tid,
*/
static bool handle_cap_trunc(struct inode *inode,
struct ceph_mds_caps *trunc,
- struct ceph_mds_session *session)
+ struct ceph_mds_session *session,
+ struct cap_extra_info *extra_info)
{
struct ceph_inode_info *ci = ceph_inode(inode);
int mds = session->s_mds;
@@ -3890,6 +3902,14 @@ static bool handle_cap_trunc(struct inode *inode,

issued |= implemented | dirty;

+ /*
+ * If there is at least one crypto block then we'll trust fscrypt_file_size.
+ * If the real length of the file is 0, then ignore it (it has probably been
+ * truncated down to 0 by the MDS).
+ */
+ if (IS_ENCRYPTED(inode) && size)
+ size = extra_info->fscrypt_file_size;
+
dout("handle_cap_trunc inode %p mds%d seq %d to %lld seq %d\n",
inode, mds, seq, truncate_size, truncate_seq);
queue_trunc = ceph_fill_file_size(inode, issued,
@@ -4111,6 +4131,49 @@ static void handle_cap_import(struct ceph_mds_client *mdsc,
*target_cap = cap;
}

+#ifdef CONFIG_FS_ENCRYPTION
+static int parse_fscrypt_fields(void **p, void *end, struct cap_extra_info *extra)
+{
+ u32 len;
+
+ ceph_decode_32_safe(p, end, extra->fscrypt_auth_len, bad);
+ if (extra->fscrypt_auth_len) {
+ ceph_decode_need(p, end, extra->fscrypt_auth_len, bad);
+ extra->fscrypt_auth = kmalloc(extra->fscrypt_auth_len, GFP_KERNEL);
+ if (!extra->fscrypt_auth)
+ return -ENOMEM;
+ ceph_decode_copy_safe(p, end, extra->fscrypt_auth,
+ extra->fscrypt_auth_len, bad);
+ }
+
+ ceph_decode_32_safe(p, end, len, bad);
+ if (len >= sizeof(u64)) {
+ ceph_decode_64_safe(p, end, extra->fscrypt_file_size, bad);
+ len -= sizeof(u64);
+ }
+ ceph_decode_skip_n(p, end, len, bad);
+ return 0;
+bad:
+ return -EIO;
+}
+#else
+static int parse_fscrypt_fields(void **p, void *end, struct cap_extra_info *extra)
+{
+ u32 len;
+
+ /* Don't care about these fields unless we're encryption-capable */
+ ceph_decode_32_safe(p, end, len, bad);
+ if (len)
+ ceph_decode_skip_n(p, end, len, bad);
+ ceph_decode_32_safe(p, end, len, bad);
+ if (len)
+ ceph_decode_skip_n(p, end, len, bad);
+ return 0;
+bad:
+ return -EIO;
+}
+#endif
+
/*
* Handle a caps message from the MDS.
*
@@ -4229,6 +4292,11 @@ void ceph_handle_caps(struct ceph_mds_session *session,
ceph_decode_64_safe(&p, end, extra_info.nsubdirs, bad);
}

+ if (msg_version >= 12) {
+ if (parse_fscrypt_fields(&p, end, &extra_info))
+ goto bad;
+ }
+
/* lookup ino */
inode = ceph_find_inode(mdsc->fsc->sb, vino);
dout(" op %s ino %llx.%llx inode %p\n", ceph_cap_op_name(op), vino.ino,
@@ -4325,7 +4393,8 @@ void ceph_handle_caps(struct ceph_mds_session *session,
break;

case CEPH_CAP_OP_TRUNC:
- queue_trunc = handle_cap_trunc(inode, h, session);
+ queue_trunc = handle_cap_trunc(inode, h, session,
+ &extra_info);
spin_unlock(&ci->i_ceph_lock);
if (queue_trunc)
ceph_queue_vmtruncate(inode);
@@ -4343,6 +4412,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
iput(inode);
out:
ceph_put_string(extra_info.pool_ns);
+ kfree(extra_info.fscrypt_auth);
return;

flush_cap_releases:
--
2.35.1

2022-03-22 18:34:07

by Jeffrey Layton

Subject: [RFC PATCH v11 50/51] ceph: add encryption support to writepage

Allow writepage to issue encrypted writes. Extend out the requested size
and offset to cover complete blocks, and then encrypt and write them to
the OSDs.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/addr.c | 34 +++++++++++++++++++++++++++-------
1 file changed, 27 insertions(+), 7 deletions(-)
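
Reviewer note, not part of the commit: "extend out the requested size to
cover complete blocks" amounts to rounding the length up to the crypto
block size. A userspace sketch, assuming a 4 KiB block size and assuming
that unencrypted inodes simply write the logical length. The macro and
function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed block size for illustration; the series uses CEPH_FSCRYPT_BLOCK_SIZE. */
#define SKETCH_BLOCK_SIZE 4096u

/* Round x up to a multiple of align, which must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Length to send to the OSD: whole crypto blocks for encrypted inodes,
 * the logical length otherwise.
 */
static uint64_t write_len(bool encrypted, uint64_t len)
{
	return encrypted ? round_up_pow2(len, SKETCH_BLOCK_SIZE) : len;
}
```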

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 13a37a568a1d..403e7a960a4e 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -594,10 +594,12 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
loff_t page_off = page_offset(page);
int err;
loff_t len = thp_size(page);
+ loff_t wlen;
struct ceph_writeback_ctl ceph_wbc;
struct ceph_osd_client *osdc = &fsc->client->osdc;
struct ceph_osd_request *req;
bool caching = ceph_is_cache_enabled(inode);
+ struct page *bounce_page = NULL;

dout("writepage %p idx %lu\n", page, page->index);

@@ -628,6 +630,8 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)

if (ceph_wbc.i_size < page_off + len)
len = ceph_wbc.i_size - page_off;
+ wlen = IS_ENCRYPTED(inode) ?
+ round_up(len, CEPH_FSCRYPT_BLOCK_SIZE) : len;

dout("writepage %p page %p index %lu on %llu~%llu snapc %p seq %lld\n",
inode, page, page->index, page_off, len, snapc, snapc->seq);
@@ -636,22 +640,37 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);

- req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode), page_off, &len, 0, 1,
- CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE, snapc,
- ceph_wbc.truncate_seq, ceph_wbc.truncate_size,
- true);
+ req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode),
+ page_off, &wlen, 0, 1, CEPH_OSD_OP_WRITE,
+ CEPH_OSD_FLAG_WRITE, snapc,
+ ceph_wbc.truncate_seq,
+ ceph_wbc.truncate_size, true);
if (IS_ERR(req))
return PTR_ERR(req);

+ if (wlen < len)
+ len = wlen;
+
set_page_writeback(page);
if (caching)
ceph_set_page_fscache(page);
ceph_fscache_write_to_cache(inode, page_off, len, caching);

+ if (IS_ENCRYPTED(inode)) {
+ bounce_page = fscrypt_encrypt_pagecache_blocks(page, CEPH_FSCRYPT_BLOCK_SIZE,
+ 0, GFP_NOFS);
+ if (IS_ERR(bounce_page)) {
+ err = PTR_ERR(bounce_page);
+ goto out;
+ }
+ }
/* it may be a short write due to an object boundary */
WARN_ON_ONCE(len > thp_size(page));
- osd_req_op_extent_osd_data_pages(req, 0, &page, len, 0, false, false);
- dout("writepage %llu~%llu (%llu bytes)\n", page_off, len, len);
+ osd_req_op_extent_osd_data_pages(req, 0,
+ bounce_page ? &bounce_page : &page, wlen, 0,
+ false, false);
+ dout("writepage %llu~%llu (%llu bytes, %sencrypted)\n",
+ page_off, len, wlen, IS_ENCRYPTED(inode) ? "" : "not ");

req->r_mtime = inode->i_mtime;
err = ceph_osdc_start_request(osdc, req, true);
@@ -660,7 +679,8 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)

ceph_update_write_metrics(&fsc->mdsc->metric, req->r_start_latency,
req->r_end_latency, len, err);
-
+ fscrypt_free_bounce_page(bounce_page);
+out:
ceph_osdc_put_request(req);
if (err == 0)
err = len;
--
2.35.1

2022-03-22 18:34:10

by Jeffrey Layton

Subject: [RFC PATCH v11 12/51] ceph: add fscrypt ioctls

We gate most of the ioctls on MDS feature support. The exceptions are the
key removal and key status ioctls, which we still want to work even if the
MDSs were to (inexplicably) lose the feature.

For the set_policy ioctl, we take Fs caps to ensure that nothing can
create files in the directory while the ioctl is running. That should
be enough to ensure that the "empty_dir" check is reliable.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/ioctl.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 83 insertions(+)
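
Reviewer note, not part of the commit: the gating policy described above
can be summarised as a small table. This userspace sketch uses a
hypothetical enum in place of the real FS_IOC_* command numbers, and a
boolean in place of the MDS session feature check:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fscrypt ioctl commands handled below. */
enum sketch_ioctl {
	SK_SET_POLICY,
	SK_GET_POLICY,
	SK_GET_POLICY_EX,
	SK_ADD_KEY,
	SK_REMOVE_KEY,
	SK_REMOVE_KEY_ALL_USERS,
	SK_GET_KEY_STATUS,
	SK_GET_NONCE,
	SK_IOCTL_COUNT
};

/* Key removal and key status are allowed without the MDS feature. */
static bool needs_mds_feature(enum sketch_ioctl cmd)
{
	assert((int)cmd >= 0 && (int)cmd < SK_IOCTL_COUNT);

	switch (cmd) {
	case SK_REMOVE_KEY:
	case SK_REMOVE_KEY_ALL_USERS:
	case SK_GET_KEY_STATUS:
		return false;
	default:
		return true;
	}
}

/* Returns 0 if the ioctl may proceed, -EOPNOTSUPP if it is gated off. */
static int gate(enum sketch_ioctl cmd, bool mds_has_feature)
{
	if (needs_mds_feature(cmd) && !mds_has_feature)
		return -EOPNOTSUPP;
	return 0;
}
```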

diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
index 6e061bf62ad4..477ecc667aee 100644
--- a/fs/ceph/ioctl.c
+++ b/fs/ceph/ioctl.c
@@ -6,6 +6,7 @@
#include "mds_client.h"
#include "ioctl.h"
#include <linux/ceph/striper.h>
+#include <linux/fscrypt.h>

/*
* ioctls
@@ -268,8 +269,54 @@ static long ceph_ioctl_syncio(struct file *file)
return 0;
}

+static int vet_mds_for_fscrypt(struct file *file)
+{
+ int i, ret = -EOPNOTSUPP;
+ struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(file_inode(file)->i_sb);
+
+ mutex_lock(&mdsc->mutex);
+ for (i = 0; i < mdsc->max_sessions; i++) {
+ struct ceph_mds_session *s = mdsc->sessions[i];
+
+ if (!s)
+ continue;
+ if (test_bit(CEPHFS_FEATURE_ALTERNATE_NAME, &s->s_features))
+ ret = 0;
+ break;
+ }
+ mutex_unlock(&mdsc->mutex);
+ return ret;
+}
+
+static long ceph_set_encryption_policy(struct file *file, unsigned long arg)
+{
+ int ret, got = 0;
+ struct inode *inode = file_inode(file);
+ struct ceph_inode_info *ci = ceph_inode(inode);
+
+ ret = vet_mds_for_fscrypt(file);
+ if (ret)
+ return ret;
+
+ /*
+ * Ensure we hold these caps so that we _know_ that the rstats check
+ * in the empty_dir check is reliable.
+ */
+ ret = ceph_get_caps(file, CEPH_CAP_FILE_SHARED, 0, -1, &got);
+ if (ret)
+ return ret;
+
+ ret = fscrypt_ioctl_set_policy(file, (const void __user *)arg);
+ if (got)
+ ceph_put_cap_refs(ci, got);
+
+ return ret;
+}
+
long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
+ int ret;
+
dout("ioctl file %p cmd %u arg %lu\n", file, cmd, arg);
switch (cmd) {
case CEPH_IOC_GET_LAYOUT:
@@ -289,6 +336,42 @@ long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)

case CEPH_IOC_SYNCIO:
return ceph_ioctl_syncio(file);
+
+ case FS_IOC_SET_ENCRYPTION_POLICY:
+ return ceph_set_encryption_policy(file, arg);
+
+ case FS_IOC_GET_ENCRYPTION_POLICY:
+ ret = vet_mds_for_fscrypt(file);
+ if (ret)
+ return ret;
+ return fscrypt_ioctl_get_policy(file, (void __user *)arg);
+
+ case FS_IOC_GET_ENCRYPTION_POLICY_EX:
+ ret = vet_mds_for_fscrypt(file);
+ if (ret)
+ return ret;
+ return fscrypt_ioctl_get_policy_ex(file, (void __user *)arg);
+
+ case FS_IOC_ADD_ENCRYPTION_KEY:
+ ret = vet_mds_for_fscrypt(file);
+ if (ret)
+ return ret;
+ return fscrypt_ioctl_add_key(file, (void __user *)arg);
+
+ case FS_IOC_REMOVE_ENCRYPTION_KEY:
+ return fscrypt_ioctl_remove_key(file, (void __user *)arg);
+
+ case FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS:
+ return fscrypt_ioctl_remove_key_all_users(file, (void __user *)arg);
+
+ case FS_IOC_GET_ENCRYPTION_KEY_STATUS:
+ return fscrypt_ioctl_get_key_status(file, (void __user *)arg);
+
+ case FS_IOC_GET_ENCRYPTION_NONCE:
+ ret = vet_mds_for_fscrypt(file);
+ if (ret)
+ return ret;
+ return fscrypt_ioctl_get_nonce(file, (void __user *)arg);
}

return -ENOTTY;
--
2.35.1

2022-03-22 18:34:58

by Jeffrey Layton

Subject: Re: [RFC PATCH v11 00/51] ceph+fscrypt : full support

On Tue, 2022-03-22 at 10:12 -0400, Jeff Layton wrote:
> This patchset represents a (mostly) working prototype of the
> ceph+fscrypt work. With this, I'm able to run xfstests with
> test_dummy_encryption, and most of the tests that pass on ceph without
> fscrypt now pass on it.
>
> When I made the last posting of this series [1], I mentioned that proper
> support for sparse reads would be necessary to do this. Thus, the
> biggest difference from the v10 set is that this is now based on top of
> the patch series that I posted yesterday to implement sparse reads [2].
>
> Aside from that, there are also numerous cleanups all over the tree, as
> well as an overhaul of the readdir handling by Xiubo.
>
> This series is not yet bug-free, but it's at a point where it is quite
> usable, providing you're running against the Quincy release of ceph
> (which should ship sometime in the next few months).
>
> Next Steps:
> ===========
> I'm not going to sugar-coat it. This is a huge, invasive patch series
> that touches a lot of the most sensitive code in ceph.
>
> Eric Biggers has acked the changes we need in fscrypt infrastructure. I
> still need Al to ack exporting the new_inode_pseudo symbol. The rest is
> pretty much all ceph and libceph code.
>
> The main piece missing at this point is support for sparse reads with
> ms_mode settings other than "crc". Once that's complete, I want to merge
> that and this series into the ceph "testing" branch so we can start
> running tests against it in teuthology with fscrypt enabled.
>
> If that goes well, I think we could probably merge this into mainline
> for v5.20 or v5.21. There is also some incoming support for netfs write
> and DIO read helpers that we may want to convert to as well [3]. That
> may alter the timing as well.
>
> Review, comments and questions are welcome...
>
> [1]: https://lore.kernel.org/ceph-devel/[email protected]/
>
> [2]: https://lore.kernel.org/ceph-devel/[email protected]/
>
> [3]: https://lore.kernel.org/ceph-devel/[email protected]/T/#maec7e3579f13a45171ad23d7a49183d169fcfcca
>
> Jeff Layton (41):
> vfs: export new_inode_pseudo
> fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode
> fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
> fscrypt: add fscrypt_context_for_new_inode
> ceph: preallocate inode for ops that may create one
> ceph: crypto context handling for ceph
> ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
> ceph: add support for fscrypt_auth/fscrypt_file to cap messages
> ceph: add ability to set fscrypt_auth via setattr
> ceph: implement -o test_dummy_encryption mount option
> ceph: decode alternate_name in lease info
> ceph: add fscrypt ioctls
> ceph: make ceph_msdc_build_path use ref-walk
> ceph: add encrypted fname handling to ceph_mdsc_build_path
> ceph: send altname in MClientRequest
> ceph: encode encrypted name in dentry release
> ceph: properly set DCACHE_NOKEY_NAME flag in lookup
> ceph: make d_revalidate call fscrypt revalidator for encrypted
> dentries
> ceph: add helpers for converting names for userland presentation
> ceph: add fscrypt support to ceph_fill_trace
> ceph: create symlinks with encrypted and base64-encoded targets
> ceph: make ceph_get_name decrypt filenames
> ceph: add a new ceph.fscrypt.auth vxattr
> ceph: add some fscrypt guardrails
> libceph: add CEPH_OSD_OP_ASSERT_VER support
> ceph: size handling for encrypted inodes in cap updates
> ceph: fscrypt_file field handling in MClientRequest messages
> ceph: get file size from fscrypt_file when present in inode traces
> ceph: handle fscrypt fields in cap messages from MDS
> ceph: add infrastructure for file encryption and decryption
> libceph: allow ceph_osdc_new_request to accept a multi-op read
> ceph: disable fallocate for encrypted inodes
> ceph: disable copy offload on encrypted inodes
> ceph: don't use special DIO path for encrypted inodes
> ceph: align data in pages in ceph_sync_write
> ceph: add read/modify/write to ceph_sync_write
> ceph: plumb in decryption during sync reads
> ceph: add fscrypt decryption support to ceph_netfs_issue_op
> ceph: set i_blkbits to crypto block size for encrypted inodes
> ceph: add encryption support to writepage
> ceph: fscrypt support for writepages
>
> Luis Henriques (1):
> ceph: don't allow changing layout on encrypted files/directories
>
> Xiubo Li (9):
> ceph: make the ioctl cmd more readable in debug log
> ceph: fix base64 encoded name's length check in ceph_fname_to_usr()
> ceph: pass the request to parse_reply_info_readdir()
> ceph: add ceph_encode_encrypted_dname() helper
> ceph: add support to readdir for encrypted filenames
> ceph: add __ceph_get_caps helper support
> ceph: add __ceph_sync_read helper support
> ceph: add object version support for sync read
> ceph: add truncate size handling support for fscrypt
>
> fs/ceph/Makefile | 1 +
> fs/ceph/acl.c | 4 +-
> fs/ceph/addr.c | 128 ++++++--
> fs/ceph/caps.c | 212 +++++++++++--
> fs/ceph/crypto.c | 432 +++++++++++++++++++++++++
> fs/ceph/crypto.h | 256 +++++++++++++++
> fs/ceph/dir.c | 182 ++++++++---
> fs/ceph/export.c | 44 ++-
> fs/ceph/file.c | 530 ++++++++++++++++++++++++++-----
> fs/ceph/inode.c | 546 +++++++++++++++++++++++++++++---
> fs/ceph/ioctl.c | 126 +++++++-
> fs/ceph/mds_client.c | 455 ++++++++++++++++++++++----
> fs/ceph/mds_client.h | 24 +-
> fs/ceph/super.c | 91 +++++-
> fs/ceph/super.h | 43 ++-
> fs/ceph/xattr.c | 29 ++
> fs/crypto/fname.c | 44 ++-
> fs/crypto/fscrypt_private.h | 9 +-
> fs/crypto/hooks.c | 6 +-
> fs/crypto/policy.c | 35 +-
> fs/inode.c | 1 +
> include/linux/ceph/ceph_fs.h | 21 +-
> include/linux/ceph/osd_client.h | 6 +-
> include/linux/ceph/rados.h | 4 +
> include/linux/fscrypt.h | 10 +
> net/ceph/osd_client.c | 32 +-
> 26 files changed, 2907 insertions(+), 364 deletions(-)
> create mode 100644 fs/ceph/crypto.c
> create mode 100644 fs/ceph/crypto.h
>

I'm going to go ahead and update the wip-fscrypt branch in the ceph
kernel tree to use this series. Please note that for now, that branch
won't work correctly when the ms_mode=secure or ms_mode=legacy transport
modes are used.

Once the sparse read support is updated to include those, we should be
able to use other transports with it.

Cheers,
--
Jeff Layton <[email protected]>

2022-03-22 18:37:38

by Jeffrey Layton

Subject: [RFC PATCH v11 51/51] ceph: fscrypt support for writepages

Add the appropriate machinery to write back dirty data with encryption.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/addr.c | 62 ++++++++++++++++++++++++++++++++++++++----------
fs/ceph/crypto.h | 18 +++++++++++++-
2 files changed, 67 insertions(+), 13 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 403e7a960a4e..cc4f561bd03c 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -556,10 +556,12 @@ static u64 get_writepages_data_length(struct inode *inode,
struct page *page, u64 start)
{
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_snap_context *snapc = page_snap_context(page);
+ struct ceph_snap_context *snapc;
struct ceph_cap_snap *capsnap = NULL;
u64 end = i_size_read(inode);
+ u64 ret;

+ snapc = page_snap_context(ceph_fscrypt_pagecache_page(page));
if (snapc != ci->i_head_snapc) {
bool found = false;
spin_lock(&ci->i_ceph_lock);
@@ -574,9 +576,12 @@ static u64 get_writepages_data_length(struct inode *inode,
spin_unlock(&ci->i_ceph_lock);
WARN_ON(!found);
}
- if (end > page_offset(page) + thp_size(page))
- end = page_offset(page) + thp_size(page);
- return end > start ? end - start : 0;
+ if (end > ceph_fscrypt_page_offset(page) + thp_size(page))
+ end = ceph_fscrypt_page_offset(page) + thp_size(page);
+ ret = end > start ? end - start : 0;
+ if (ret && fscrypt_is_bounce_page(page))
+ ret = round_up(ret, CEPH_FSCRYPT_BLOCK_SIZE);
+ return ret;
}

/*
@@ -792,6 +797,11 @@ static void writepages_finish(struct ceph_osd_request *req)
total_pages += num_pages;
for (j = 0; j < num_pages; j++) {
page = osd_data->pages[j];
+ if (fscrypt_is_bounce_page(page)) {
+ page = fscrypt_pagecache_page(page);
+ fscrypt_free_bounce_page(osd_data->pages[j]);
+ osd_data->pages[j] = page;
+ }
BUG_ON(!page);
WARN_ON(!PageUptodate(page));

@@ -1050,10 +1060,28 @@ static int ceph_writepages_start(struct address_space *mapping,
BLK_RW_ASYNC);
}

+ if (IS_ENCRYPTED(inode)) {
+ pages[locked_pages] =
+ fscrypt_encrypt_pagecache_blocks(page,
+ PAGE_SIZE, 0,
+ locked_pages ? GFP_NOWAIT : GFP_NOFS);
+ if (IS_ERR(pages[locked_pages])) {
+ if (PTR_ERR(pages[locked_pages]) == -EINVAL)
+ pr_err("%s: inode->i_blkbits=%hhu\n",
+ __func__, inode->i_blkbits);
+ /* better not fail on first page! */
+ BUG_ON(locked_pages == 0);
+ pages[locked_pages] = NULL;
+ redirty_page_for_writepage(wbc, page);
+ unlock_page(page);
+ break;
+ }
+ ++locked_pages;
+ } else {
+ pages[locked_pages++] = page;
+ }

- pages[locked_pages++] = page;
pvec.pages[i] = NULL;
-
len += thp_size(page);
}

@@ -1081,7 +1109,7 @@ static int ceph_writepages_start(struct address_space *mapping,
}

new_request:
- offset = page_offset(pages[0]);
+ offset = ceph_fscrypt_page_offset(pages[0]);
len = wsize;

req = ceph_osdc_new_request(&fsc->client->osdc,
@@ -1102,8 +1130,8 @@ static int ceph_writepages_start(struct address_space *mapping,
ceph_wbc.truncate_size, true);
BUG_ON(IS_ERR(req));
}
- BUG_ON(len < page_offset(pages[locked_pages - 1]) +
- thp_size(page) - offset);
+ BUG_ON(len < ceph_fscrypt_page_offset(pages[locked_pages - 1]) +
+ thp_size(pages[locked_pages - 1]) - offset);

req->r_callback = writepages_finish;
req->r_inode = inode;
@@ -1113,7 +1141,9 @@ static int ceph_writepages_start(struct address_space *mapping,
data_pages = pages;
op_idx = 0;
for (i = 0; i < locked_pages; i++) {
- u64 cur_offset = page_offset(pages[i]);
+ struct page *page = ceph_fscrypt_pagecache_page(pages[i]);
+
+ u64 cur_offset = page_offset(page);
/*
* Discontinuity in page range? Ceph can handle that by just passing
* multiple extents in the write op.
@@ -1142,9 +1172,9 @@ static int ceph_writepages_start(struct address_space *mapping,
op_idx++;
}

- set_page_writeback(pages[i]);
+ set_page_writeback(page);
if (caching)
- ceph_set_page_fscache(pages[i]);
+ ceph_set_page_fscache(page);
len += thp_size(page);
}
ceph_fscache_write_to_cache(inode, offset, len, caching);
@@ -1160,8 +1190,16 @@ static int ceph_writepages_start(struct address_space *mapping,
offset);
len = max(len, min_len);
}
+ if (IS_ENCRYPTED(inode))
+ len = round_up(len, CEPH_FSCRYPT_BLOCK_SIZE);
+
dout("writepages got pages at %llu~%llu\n", offset, len);

+ if (IS_ENCRYPTED(inode) &&
+ ((offset | len) & ~CEPH_FSCRYPT_BLOCK_MASK))
+ pr_warn("%s: bad encrypted write offset=%lld len=%llu\n",
+ __func__, offset, len);
+
osd_req_op_extent_osd_data_pages(req, op_idx, data_pages, len,
0, from_pool, false);
osd_req_op_extent_update(req, op_idx, len);
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 92a7b221a975..0cf526f07567 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -146,6 +146,12 @@ int ceph_fscrypt_decrypt_extents(struct inode *inode, struct page **page, u64 of
struct ceph_sparse_extent *map, u32 ext_cnt);
int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **page, u64 off,
int len, gfp_t gfp);
+
+static inline struct page *ceph_fscrypt_pagecache_page(struct page *page)
+{
+ return fscrypt_is_bounce_page(page) ? fscrypt_pagecache_page(page) : page;
+}
+
#else /* CONFIG_FS_ENCRYPTION */

static inline void ceph_fscrypt_set_ops(struct super_block *sb)
@@ -235,6 +241,16 @@ static inline int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **
{
return 0;
}
+
+static inline struct page *ceph_fscrypt_pagecache_page(struct page *page)
+{
+ return page;
+}
#endif /* CONFIG_FS_ENCRYPTION */

-#endif
+static inline loff_t ceph_fscrypt_page_offset(struct page *page)
+{
+ return page_offset(ceph_fscrypt_pagecache_page(page));
+}
+
+#endif /* _CEPH_CRYPTO_H */
--
2.35.1
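
For anyone following the length handling in the writepages patch above, here is a minimal userspace sketch of the arithmetic. It is not the kernel code: the sketch_* names are hypothetical, and the 4 KiB block size is an assumption, since the patch only refers to CEPH_FSCRYPT_BLOCK_SIZE and CEPH_FSCRYPT_BLOCK_MASK symbolically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB crypto blocks. The patch only uses the symbolic names. */
#define SKETCH_BLOCK_SHIFT 12
#define SKETCH_BLOCK_SIZE  (UINT64_C(1) << SKETCH_BLOCK_SHIFT)
#define SKETCH_BLOCK_MASK  (~(SKETCH_BLOCK_SIZE - 1))

/* Round len up to the next multiple of the crypto block size. */
static inline uint64_t sketch_round_up_block(uint64_t len)
{
	return (len + SKETCH_BLOCK_SIZE - 1) & SKETCH_BLOCK_MASK;
}

/*
 * Length of valid data in one page, modelled on the logic in
 * get_writepages_data_length(): clamp to the end of the page, then
 * round up when the page is an encrypted bounce page.
 */
static inline uint64_t sketch_page_data_len(uint64_t i_size,
					    uint64_t page_start,
					    uint64_t page_size,
					    uint64_t start, bool bounce)
{
	uint64_t end = i_size;
	uint64_t ret;

	if (end > page_start + page_size)
		end = page_start + page_size;
	ret = end > start ? end - start : 0;
	if (ret && bounce)
		ret = sketch_round_up_block(ret);
	return ret;
}

/* True when both offset and len are block aligned (the pr_warn condition, inverted). */
static inline bool sketch_write_is_aligned(uint64_t offset, uint64_t len)
{
	return ((offset | len) & ~SKETCH_BLOCK_MASK) == 0;
}
```

With these assumptions, a 5000-byte encrypted file whose second page starts at offset 4096 writes back a full 4096-byte block for that page, not the 904 bytes of plaintext it holds.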

2022-03-22 19:36:12

by Jeffrey Layton

Subject: [RFC PATCH v11 47/51] ceph: plumb in decryption during sync reads

Switch to using sparse reads when the inode is encrypted.

Note that the crypto block may be smaller than a page, but the reverse
cannot be true.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/file.c | 89 ++++++++++++++++++++++++++++++++++++--------------
1 file changed, 65 insertions(+), 24 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 19d5c50f60df..eb04dc8f1f93 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -934,7 +934,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
u64 off = *ki_pos;
u64 len = iov_iter_count(to);
u64 i_size = i_size_read(inode);
- bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD);
+ bool sparse = IS_ENCRYPTED(inode) || ceph_test_mount_opt(fsc, SPARSEREAD);
u64 objver = 0;

dout("sync_read on inode %p %llx~%llx\n", inode, *ki_pos, len);
@@ -962,10 +962,19 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
int idx;
size_t left;
struct ceph_osd_req_op *op;
+ u64 read_off = off;
+ u64 read_len = len;
+
+ /* determine new offset/length if encrypted */
+ ceph_fscrypt_adjust_off_and_len(inode, &read_off, &read_len);
+
+ dout("sync_read orig %llu~%llu reading %llu~%llu\n",
+ off, len, read_off, read_len);

req = ceph_osdc_new_request(osdc, &ci->i_layout,
- ci->i_vino, off, &len, 0, 1,
- sparse ? CEPH_OSD_OP_SPARSE_READ : CEPH_OSD_OP_READ,
+ ci->i_vino, read_off, &read_len, 0, 1,
+ sparse ? CEPH_OSD_OP_SPARSE_READ :
+ CEPH_OSD_OP_READ,
CEPH_OSD_FLAG_READ,
NULL, ci->i_truncate_seq,
ci->i_truncate_size, false);
@@ -974,10 +983,13 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
break;
}

+ /* adjust len downward if the request truncated the len */
+ if (off + len > read_off + read_len)
+ len = read_off + read_len - off;
more = len < iov_iter_count(to);

- num_pages = calc_pages_for(off, len);
- page_off = off & ~PAGE_MASK;
+ num_pages = calc_pages_for(read_off, read_len);
+ page_off = offset_in_page(off);
pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
if (IS_ERR(pages)) {
ceph_osdc_put_request(req);
@@ -985,7 +997,8 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
break;
}

- osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_off,
+ osd_req_op_extent_osd_data_pages(req, 0, pages, read_len,
+ offset_in_page(read_off),
false, false);

op = &req->r_ops[0];
@@ -1004,7 +1017,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
ceph_update_read_metrics(&fsc->mdsc->metric,
req->r_start_latency,
req->r_end_latency,
- len, ret);
+ read_len, ret);

if (ret > 0)
objver = req->r_version;
@@ -1019,8 +1032,34 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
else if (ret == -ENOENT)
ret = 0;

+ if (ret > 0 && IS_ENCRYPTED(inode)) {
+ int fret;
+
+ fret = ceph_fscrypt_decrypt_extents(inode, pages, read_off,
+ op->extent.sparse_ext, op->extent.sparse_ext_cnt);
+ if (fret < 0) {
+ ret = fret;
+ ceph_osdc_put_request(req);
+ break;
+ }
+
+ /* account for any partial block at the beginning */
+ fret -= (off - read_off);
+
+ /*
+ * Short read after big offset adjustment?
+ * Nothing is usable, just call it a zero
+ * len read.
+ */
+ fret = max(fret, 0);
+
+ /* account for partial block at the end */
+ ret = min_t(ssize_t, fret, len);
+ }
+
ceph_osdc_put_request(req);

+ /* Short read but not EOF? Zero out the remainder. */
if (ret >= 0 && ret < len && (off + ret < i_size)) {
int zlen = min(len - ret, i_size - off - ret);
int zoff = page_off + ret;
@@ -1034,15 +1073,16 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
idx = 0;
left = ret > 0 ? ret : 0;
while (left > 0) {
- size_t len, copied;
- page_off = off & ~PAGE_MASK;
- len = min_t(size_t, left, PAGE_SIZE - page_off);
+ size_t plen, copied;
+
+ plen = min_t(size_t, left, PAGE_SIZE - page_off);
SetPageUptodate(pages[idx]);
copied = copy_page_to_iter(pages[idx++],
- page_off, len, to);
+ page_off, plen, to);
off += copied;
left -= copied;
- if (copied < len) {
+ page_off = 0;
+ if (copied < plen) {
ret = -EFAULT;
break;
}
@@ -1059,20 +1099,21 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
break;
}

- if (off > *ki_pos) {
- if (off >= i_size) {
- *retry_op = CHECK_EOF;
- ret = i_size - *ki_pos;
- *ki_pos = i_size;
- } else {
- ret = off - *ki_pos;
- *ki_pos = off;
+ if (ret > 0) {
+ if (off > *ki_pos) {
+ if (off >= i_size) {
+ *retry_op = CHECK_EOF;
+ ret = i_size - *ki_pos;
+ *ki_pos = i_size;
+ } else {
+ ret = off - *ki_pos;
+ *ki_pos = off;
+ }
}
- }
-
- if (last_objver && ret > 0)
- *last_objver = objver;

+ if (last_objver)
+ *last_objver = objver;
+ }
dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
return ret;
}
--
2.35.1
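
The offset bookkeeping in the sync read patch above is easy to get wrong, so here is a small userspace sketch of it. The sketch_* names are hypothetical and the 4 KiB block size is an assumption. The body of ceph_fscrypt_adjust_off_and_len() is not part of this patch, so the widening shown here is only my reading of what it is expected to do: expand the request to cover whole crypto blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB crypto blocks. */
#define SKETCH_BLOCK_SIZE UINT64_C(4096)
#define SKETCH_BLOCK_MASK (~(SKETCH_BLOCK_SIZE - 1))

/* Widen [off, off + len) so it starts and ends on block boundaries. */
static inline void sketch_adjust_off_and_len(uint64_t *off, uint64_t *len)
{
	uint64_t start = *off & SKETCH_BLOCK_MASK;
	uint64_t end = (*off + *len + SKETCH_BLOCK_SIZE - 1) & SKETCH_BLOCK_MASK;

	*off = start;
	*len = end - start;
}

/*
 * Given the number of decrypted bytes counted from read_off, work out how
 * many of them the caller actually asked for: drop the partial block in
 * front of off, never go negative, and never return more than len.
 */
static inline int64_t sketch_usable_bytes(int64_t decrypted, uint64_t off,
					  uint64_t read_off, uint64_t len)
{
	int64_t usable = decrypted - (int64_t)(off - read_off);

	if (usable < 0)
		usable = 0;
	return usable < (int64_t)len ? usable : (int64_t)len;
}
```

For example, a 100-byte read at offset 5000 becomes a 4096-byte read at offset 4096. If only 500 bytes come back, all of them precede offset 5000, so the caller sees a zero-length read.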

2022-03-22 19:37:26

by Jeffrey Layton

Subject: [RFC PATCH v11 26/51] ceph: create symlinks with encrypted and base64-encoded targets

When creating symlinks in encrypted directories, encrypt and
base64-encode the target with the new inode's key before sending to the
MDS.

When filling a symlinked inode, base64-decode it into a buffer that
we'll keep in ci->i_symlink. When get_link is called, decrypt the buffer
into a new one that will hang off i_link.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/dir.c | 51 ++++++++++++++++++++---
fs/ceph/inode.c | 107 ++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 141 insertions(+), 17 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 5ce2a6384e55..82a5f37e9d4a 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -942,6 +942,40 @@ static int ceph_create(struct user_namespace *mnt_userns, struct inode *dir,
return ceph_mknod(mnt_userns, dir, dentry, mode, 0);
}

+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static int prep_encrypted_symlink_target(struct ceph_mds_request *req, const char *dest)
+{
+ int err;
+ int len = strlen(dest);
+ struct fscrypt_str osd_link = FSTR_INIT(NULL, 0);
+
+ err = fscrypt_prepare_symlink(req->r_parent, dest, len, PATH_MAX, &osd_link);
+ if (err)
+ goto out;
+
+ err = fscrypt_encrypt_symlink(req->r_new_inode, dest, len, &osd_link);
+ if (err)
+ goto out;
+
+ req->r_path2 = kmalloc(FSCRYPT_BASE64URL_CHARS(osd_link.len) + 1, GFP_KERNEL);
+ if (!req->r_path2) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ len = fscrypt_base64url_encode(osd_link.name, osd_link.len, req->r_path2);
+ req->r_path2[len] = '\0';
+out:
+ fscrypt_fname_free_buffer(&osd_link);
+ return err;
+}
+#else
+static int prep_encrypted_symlink_target(struct ceph_mds_request *req, const char *dest)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
struct dentry *dentry, const char *dest)
{
@@ -973,14 +1007,21 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
goto out_req;
}

- req->r_path2 = kstrdup(dest, GFP_KERNEL);
- if (!req->r_path2) {
- err = -ENOMEM;
- goto out_req;
- }
req->r_parent = dir;
ihold(dir);

+ if (IS_ENCRYPTED(req->r_new_inode)) {
+ err = prep_encrypted_symlink_target(req, dest);
+ if (err)
+ goto out_req;
+ } else {
+ req->r_path2 = kstrdup(dest, GFP_KERNEL);
+ if (!req->r_path2) {
+ err = -ENOMEM;
+ goto out_req;
+ }
+ }
+
set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
req->r_dentry = dget(dentry);
req->r_num_caps = 2;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 8f0ba67ec78f..fe006f189c0f 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -35,6 +35,7 @@
*/

static const struct inode_operations ceph_symlink_iops;
+static const struct inode_operations ceph_encrypted_symlink_iops;

static void ceph_inode_work(struct work_struct *work);

@@ -638,6 +639,7 @@ void ceph_free_inode(struct inode *inode)
#ifdef CONFIG_FS_ENCRYPTION
kfree(ci->fscrypt_auth);
#endif
+ fscrypt_free_inode(inode);
kmem_cache_free(ceph_inode_cachep, ci);
}

@@ -835,6 +837,34 @@ void ceph_fill_file_time(struct inode *inode, int issued,
inode, time_warp_seq, ci->i_time_warp_seq);
}

+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static int decode_encrypted_symlink(const char *encsym, int enclen, u8 **decsym)
+{
+ int declen;
+ u8 *sym;
+
+ sym = kmalloc(enclen + 1, GFP_NOFS);
+ if (!sym)
+ return -ENOMEM;
+
+ declen = fscrypt_base64url_decode(encsym, enclen, sym);
+ if (declen < 0) {
+ pr_err("%s: can't decode symlink (%d). Content: %.*s\n",
+ __func__, declen, enclen, encsym);
+ kfree(sym);
+ return -EIO;
+ }
+ sym[declen] = '\0';
+ *decsym = sym;
+ return declen;
+}
+#else
+static int decode_encrypted_symlink(const char *encsym, int symlen, u8 **decsym)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
/*
* Populate an inode based on info from mds. May be called on new or
* existing inodes.
@@ -1068,26 +1098,39 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
inode->i_fop = &ceph_file_fops;
break;
case S_IFLNK:
- inode->i_op = &ceph_symlink_iops;
if (!ci->i_symlink) {
u32 symlen = iinfo->symlink_len;
char *sym;

spin_unlock(&ci->i_ceph_lock);

- if (symlen != i_size_read(inode)) {
- pr_err("%s %llx.%llx BAD symlink "
- "size %lld\n", __func__,
- ceph_vinop(inode),
- i_size_read(inode));
+ if (IS_ENCRYPTED(inode)) {
+ if (symlen != i_size_read(inode))
+ pr_err("%s %llx.%llx BAD symlink size %lld\n",
+ __func__, ceph_vinop(inode), i_size_read(inode));
+
+ err = decode_encrypted_symlink(iinfo->symlink, symlen, (u8 **)&sym);
+ if (err < 0) {
+ pr_err("%s decoding encrypted symlink failed: %d\n",
+ __func__, err);
+ goto out;
+ }
+ symlen = err;
i_size_write(inode, symlen);
inode->i_blocks = calc_inode_blocks(symlen);
- }
+ } else {
+ if (symlen != i_size_read(inode)) {
+ pr_err("%s %llx.%llx BAD symlink size %lld\n",
+ __func__, ceph_vinop(inode), i_size_read(inode));
+ i_size_write(inode, symlen);
+ inode->i_blocks = calc_inode_blocks(symlen);
+ }

- err = -ENOMEM;
- sym = kstrndup(iinfo->symlink, symlen, GFP_NOFS);
- if (!sym)
- goto out;
+ err = -ENOMEM;
+ sym = kstrndup(iinfo->symlink, symlen, GFP_NOFS);
+ if (!sym)
+ goto out;
+ }

spin_lock(&ci->i_ceph_lock);
if (!ci->i_symlink)
@@ -1095,7 +1138,17 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
else
kfree(sym); /* lost a race */
}
- inode->i_link = ci->i_symlink;
+
+ if (IS_ENCRYPTED(inode)) {
+ /*
+ * Encrypted symlinks need to be decrypted before we can
+ * cache their targets in i_link. Don't touch it here.
+ */
+ inode->i_op = &ceph_encrypted_symlink_iops;
+ } else {
+ inode->i_link = ci->i_symlink;
+ inode->i_op = &ceph_symlink_iops;
+ }
break;
case S_IFDIR:
inode->i_op = &ceph_dir_iops;
@@ -2122,6 +2175,29 @@ static void ceph_inode_work(struct work_struct *work)
iput(inode);
}

+static const char *ceph_encrypted_get_link(struct dentry *dentry, struct inode *inode,
+ struct delayed_call *done)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+
+ if (!dentry)
+ return ERR_PTR(-ECHILD);
+
+ return fscrypt_get_symlink(inode, ci->i_symlink, i_size_read(inode), done);
+}
+
+static int ceph_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path, struct kstat *stat,
+ u32 request_mask, unsigned int query_flags)
+{
+ int ret;
+
+ ret = ceph_getattr(mnt_userns, path, stat, request_mask, query_flags);
+ if (ret)
+ return ret;
+ return fscrypt_symlink_getattr(path, stat);
+}
+
/*
* symlinks
*/
@@ -2132,6 +2208,13 @@ static const struct inode_operations ceph_symlink_iops = {
.listxattr = ceph_listxattr,
};

+static const struct inode_operations ceph_encrypted_symlink_iops = {
+ .get_link = ceph_encrypted_get_link,
+ .setattr = ceph_setattr,
+ .getattr = ceph_encrypted_symlink_getattr,
+ .listxattr = ceph_listxattr,
+};
+
int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia)
{
struct ceph_inode_info *ci = ceph_inode(inode);
--
2.35.1
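
To illustrate the encoding step in the symlink patch above, here is a userspace sketch of unpadded URL-safe base64 (the RFC 4648 "base64url" alphabet). The patch itself calls fscrypt_base64url_encode(), fscrypt_base64url_decode() and FSCRYPT_BASE64URL_CHARS(). The sketch_* functions below are hypothetical stand-ins written for illustration, not copies of the kernel helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char sketch_b64url[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Characters needed to encode nbytes without padding. */
#define SKETCH_B64URL_CHARS(nbytes) (((nbytes) * 4 + 2) / 3)

/* Encode src into dst (unpadded, not NUL-terminated); returns chars written. */
static inline size_t sketch_b64url_encode(const unsigned char *src,
					  size_t srclen, char *dst)
{
	unsigned int ac = 0;
	int bits = 0;
	char *cp = dst;
	size_t i;

	for (i = 0; i < srclen; i++) {
		ac = (ac << 8) | src[i];
		bits += 8;
		do {
			bits -= 6;
			*cp++ = sketch_b64url[(ac >> bits) & 0x3f];
		} while (bits >= 6);
	}
	if (bits)
		*cp++ = sketch_b64url[(ac << (6 - bits)) & 0x3f];
	return (size_t)(cp - dst);
}

/* Decode srclen chars into dst; returns bytes written, or -1 on bad input. */
static inline int sketch_b64url_decode(const char *src, size_t srclen,
				       unsigned char *dst)
{
	unsigned int ac = 0;
	int bits = 0;
	unsigned char *bp = dst;
	size_t i;

	for (i = 0; i < srclen; i++) {
		const char *p = src[i] ? strchr(sketch_b64url, src[i]) : NULL;

		if (!p)
			return -1;
		ac = (ac << 6) | (unsigned int)(p - sketch_b64url);
		bits += 6;
		if (bits >= 8) {
			bits -= 8;
			*bp++ = (unsigned char)(ac >> bits);
		}
	}
	/* Leftover bits must be zero, otherwise the input was not canonical. */
	if (ac & ((1u << bits) - 1))
		return -1;
	return (int)(bp - dst);
}
```

A caller that wants a C string sizes the buffer as SKETCH_B64URL_CHARS(len) + 1 and writes the terminator at index n, where n is the value returned by the encoder.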

2022-03-22 19:59:51

by Jeffrey Layton

Subject: [RFC PATCH v11 33/51] ceph: fscrypt_file field handling in MClientRequest messages

For encrypted inodes, transmit a rounded-up size to the MDS as the
normal file size and send the real inode size in the fscrypt_file field.

Also, fix up creates and truncates to also transmit fscrypt_file.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/dir.c | 3 +++
fs/ceph/file.c | 2 ++
fs/ceph/inode.c | 18 ++++++++++++++++--
fs/ceph/mds_client.c | 9 ++++++++-
fs/ceph/mds_client.h | 2 ++
5 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 8a9f916bfc6c..5ccf6453f02f 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -910,6 +910,9 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
goto out_req;
}

+ if (S_ISREG(mode) && IS_ENCRYPTED(dir))
+ set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
+
req->r_dentry = dget(dentry);
req->r_num_caps = 2;
req->r_parent = dir;
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 8a222ce5f8ce..df790317bedb 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -765,6 +765,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
req->r_args.open.mask = cpu_to_le32(mask);
req->r_parent = dir;
ihold(dir);
+ if (IS_ENCRYPTED(dir))
+ set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);

if (flags & O_CREAT) {
struct ceph_file_layout lo;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 35e7ef462136..599e27dae8c8 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2374,11 +2374,25 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
}
} else if ((issued & CEPH_CAP_FILE_SHARED) == 0 ||
attr->ia_size != isize) {
- req->r_args.setattr.size = cpu_to_le64(attr->ia_size);
- req->r_args.setattr.old_size = cpu_to_le64(isize);
mask |= CEPH_SETATTR_SIZE;
release |= CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR;
+ if (IS_ENCRYPTED(inode) && attr->ia_size) {
+ set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
+ mask |= CEPH_SETATTR_FSCRYPT_FILE;
+ req->r_args.setattr.size =
+ cpu_to_le64(round_up(attr->ia_size,
+ CEPH_FSCRYPT_BLOCK_SIZE));
+ req->r_args.setattr.old_size =
+ cpu_to_le64(round_up(isize,
+ CEPH_FSCRYPT_BLOCK_SIZE));
+ req->r_fscrypt_file = attr->ia_size;
+ /* FIXME: client must zero out any partial blocks! */
+ } else {
+ req->r_args.setattr.size = cpu_to_le64(attr->ia_size);
+ req->r_args.setattr.old_size = cpu_to_le64(isize);
+ req->r_fscrypt_file = 0;
+ }
}
}
if (ia_valid & ATTR_MTIME) {
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index a747ea7b7647..cd0c780a6f84 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2752,7 +2752,12 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
} else {
ceph_encode_32(p, 0);
}
- ceph_encode_32(p, 0); // fscrypt_file for now
+ if (test_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags)) {
+ ceph_encode_32(p, sizeof(__le64));
+ ceph_encode_64(p, req->r_fscrypt_file);
+ } else {
+ ceph_encode_32(p, 0);
+ }
}

/*
@@ -2838,6 +2843,8 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,

/* fscrypt_file */
len += sizeof(u32);
+ if (test_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags))
+ len += sizeof(__le64);

msg = ceph_msg_new2(CEPH_MSG_CLIENT_REQUEST, len, 1, GFP_NOFS, false);
if (!msg) {
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 046a9368c4a9..e297bf98c39f 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -282,6 +282,7 @@ struct ceph_mds_request {
#define CEPH_MDS_R_DID_PREPOPULATE (6) /* prepopulated readdir */
#define CEPH_MDS_R_PARENT_LOCKED (7) /* is r_parent->i_rwsem wlocked? */
#define CEPH_MDS_R_ASYNC (8) /* async request */
+#define CEPH_MDS_R_FSCRYPT_FILE (9) /* must marshal fscrypt_file field */
unsigned long r_req_flags;

struct mutex r_fill_mutex;
@@ -289,6 +290,7 @@ struct ceph_mds_request {
union ceph_mds_request_args r_args;

struct ceph_fscrypt_auth *r_fscrypt_auth;
+ u64 r_fscrypt_file;

u8 *r_altname; /* fscrypt binary crypttext for long filenames */
u32 r_altname_len; /* length of r_altname */
--
2.35.1
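
Here is a rough userspace sketch of the size handling in the fscrypt_file patch above, for a truncate request. The sketch_* names and the struct are hypothetical, and the 4 KiB block size is an assumption. The byte layout follows what encode_mclientrequest_tail() does in the patch: a 32-bit length, followed by a 64-bit value when the field is present, both little-endian as with the other ceph_encode_* helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB crypto blocks. */
#define SKETCH_BLOCK_SIZE UINT64_C(4096)

struct sketch_setattr_sizes {
	uint64_t wire_size;      /* what goes in setattr.size */
	uint64_t fscrypt_file;   /* real size, sent separately */
	bool send_fscrypt_file;  /* stands in for CEPH_MDS_R_FSCRYPT_FILE */
};

/* Mirrors the "IS_ENCRYPTED(inode) && attr->ia_size" branch in __ceph_setattr(). */
static inline struct sketch_setattr_sizes
sketch_truncate_sizes(bool encrypted, uint64_t new_size)
{
	struct sketch_setattr_sizes s = { new_size, 0, false };

	if (encrypted && new_size) {
		s.wire_size = (new_size + SKETCH_BLOCK_SIZE - 1) &
			      ~(SKETCH_BLOCK_SIZE - 1);
		s.fscrypt_file = new_size;
		s.send_fscrypt_file = true;
	}
	return s;
}

/* Emit the fscrypt_file tail: le32 length, then le64 size if present. */
static inline size_t
sketch_encode_fscrypt_file(unsigned char *p,
			   const struct sketch_setattr_sizes *s)
{
	uint32_t flen = s->send_fscrypt_file ? 8 : 0;
	size_t n = 0;
	int i;

	for (i = 0; i < 4; i++)
		p[n++] = (unsigned char)(flen >> (8 * i));
	if (s->send_fscrypt_file)
		for (i = 0; i < 8; i++)
			p[n++] = (unsigned char)(s->fscrypt_file >> (8 * i));
	return n;
}
```

Truncating an encrypted file to 5000 bytes would therefore send 8192 as the ordinary size and 5000 in fscrypt_file, adding 8 bytes to the request tail.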

2022-03-22 20:50:50

by Jeffrey Layton

Subject: [RFC PATCH v11 49/51] ceph: set i_blkbits to crypto block size for encrypted inodes

Some of the underlying infrastructure for fscrypt relies on i_blkbits
being aligned to the crypto blocksize.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/inode.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 9f34e4993b61..b048d9da8310 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -975,13 +975,6 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
issued |= __ceph_caps_dirty(ci);
new_issued = ~issued & info_caps;

- /* directories have fl_stripe_unit set to zero */
- if (le32_to_cpu(info->layout.fl_stripe_unit))
- inode->i_blkbits =
- fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1;
- else
- inode->i_blkbits = CEPH_BLOCK_SHIFT;
-
__ceph_update_quota(ci, iinfo->max_bytes, iinfo->max_files);

if ((new_version || (new_issued & CEPH_CAP_AUTH_SHARED)) &&
@@ -1006,6 +999,15 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
#endif
}

+ /* directories have fl_stripe_unit set to zero */
+ if (IS_ENCRYPTED(inode))
+ inode->i_blkbits = CEPH_FSCRYPT_BLOCK_SHIFT;
+ else if (le32_to_cpu(info->layout.fl_stripe_unit))
+ inode->i_blkbits =
+ fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1;
+ else
+ inode->i_blkbits = CEPH_BLOCK_SHIFT;
+
if ((new_version || (new_issued & CEPH_CAP_LINK_SHARED)) &&
(issued & CEPH_CAP_LINK_EXCL) == 0)
set_nlink(inode, le32_to_cpu(info->nlink));
--
2.35.1
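
The selection order in the i_blkbits patch above can be sketched in userspace as follows. The sketch_* names are hypothetical. The two shift values are assumptions chosen for illustration (12 for the crypto block, 22 for the default), since this patch only references CEPH_FSCRYPT_BLOCK_SHIFT and CEPH_BLOCK_SHIFT by name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FSCRYPT_BLOCK_SHIFT 12 /* assumption: 4 KiB crypto block */
#define SKETCH_DEFAULT_BLOCK_SHIFT 22 /* assumption: 4 MiB default block */

/* 1-based index of the highest set bit, 0 when x == 0 (same contract as fls()). */
static inline int sketch_fls(uint32_t x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/*
 * Encrypted inodes always get the crypto block shift. Otherwise use the
 * stripe unit if there is one, and fall back to the default for
 * directories, whose stripe unit is zero.
 */
static inline unsigned int sketch_blkbits(bool encrypted, uint32_t stripe_unit)
{
	if (encrypted)
		return SKETCH_FSCRYPT_BLOCK_SHIFT;
	if (stripe_unit)
		return (unsigned int)(sketch_fls(stripe_unit) - 1);
	return SKETCH_DEFAULT_BLOCK_SHIFT;
}
```

The point of the reordering in the patch is that the encryption check has to run after the fscrypt context has been set up on the inode, which is why the block moves below the auth handling.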

2022-03-22 21:38:55

by Jeffrey Layton

Subject: [RFC PATCH v11 11/51] ceph: decode alternate_name in lease info

Ceph is a bit different from local filesystems, in that we don't want
to store filenames as raw binary data, since we may also be dealing
with clients that don't support fscrypt.

We could just base64-encode the encrypted filenames, but that could
leave us with filenames longer than NAME_MAX. It turns out that the
MDS doesn't care much about filename length, but the clients do.

To manage this, we've added a new "alternate name" field that can
optionally be attached to any dentry. We use it to store the binary
crypttext of the filename when its base64-encoded form would be longer
than NAME_MAX. When a dentry has one of these names attached, the MDS
will send it along in the lease info, which we can then store for
later use.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Xiubo Li <[email protected]>
---
fs/ceph/mds_client.c | 43 +++++++++++++++++++++++++++++++++----------
fs/ceph/mds_client.h | 11 +++++++----
2 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 648816eb4228..49ba47baac8e 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -308,27 +308,47 @@ static int parse_reply_info_dir(void **p, void *end,

static int parse_reply_info_lease(void **p, void *end,
struct ceph_mds_reply_lease **lease,
- u64 features)
+ u64 features, u32 *altname_len, u8 **altname)
{
+ u8 struct_v;
+ u32 struct_len;
+ void *lend;
+
if (features == (u64)-1) {
- u8 struct_v, struct_compat;
- u32 struct_len;
+ u8 struct_compat;
+
ceph_decode_8_safe(p, end, struct_v, bad);
ceph_decode_8_safe(p, end, struct_compat, bad);
+
/* struct_v is expected to be >= 1. we only understand
* encoding whose struct_compat == 1. */
if (!struct_v || struct_compat != 1)
goto bad;
+
ceph_decode_32_safe(p, end, struct_len, bad);
- ceph_decode_need(p, end, struct_len, bad);
- end = *p + struct_len;
+ } else {
+ struct_len = sizeof(**lease);
+ *altname_len = 0;
+ *altname = NULL;
}

- ceph_decode_need(p, end, sizeof(**lease), bad);
+ lend = *p + struct_len;
+ ceph_decode_need(p, end, struct_len, bad);
*lease = *p;
*p += sizeof(**lease);
- if (features == (u64)-1)
- *p = end;
+
+ if (features == (u64)-1) {
+ if (struct_v >= 2) {
+ ceph_decode_32_safe(p, end, *altname_len, bad);
+ ceph_decode_need(p, end, *altname_len, bad);
+ *altname = *p;
+ *p += *altname_len;
+ } else {
+ *altname = NULL;
+ *altname_len = 0;
+ }
+ }
+ *p = lend;
return 0;
bad:
return -EIO;
@@ -358,7 +378,8 @@ static int parse_reply_info_trace(void **p, void *end,
info->dname = *p;
*p += info->dname_len;

- err = parse_reply_info_lease(p, end, &info->dlease, features);
+ err = parse_reply_info_lease(p, end, &info->dlease, features,
+ &info->altname_len, &info->altname);
if (err < 0)
goto out_bad;
}
@@ -425,9 +446,11 @@ static int parse_reply_info_readdir(void **p, void *end,
dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);

/* dentry lease */
- err = parse_reply_info_lease(p, end, &rde->lease, features);
+ err = parse_reply_info_lease(p, end, &rde->lease, features,
+ &rde->altname_len, &rde->altname);
if (err)
goto out_bad;
+
/* inode */
err = parse_reply_info_in(p, end, &rde->inode, features);
if (err < 0)
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index aab3ab284fce..2cc75f9ae7c7 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -29,8 +29,8 @@ enum ceph_feature_type {
CEPHFS_FEATURE_MULTI_RECONNECT,
CEPHFS_FEATURE_DELEG_INO,
CEPHFS_FEATURE_METRIC_COLLECT,
-
- CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
+ CEPHFS_FEATURE_ALTERNATE_NAME,
+ CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_ALTERNATE_NAME,
};

/*
@@ -45,8 +45,7 @@ enum ceph_feature_type {
CEPHFS_FEATURE_MULTI_RECONNECT, \
CEPHFS_FEATURE_DELEG_INO, \
CEPHFS_FEATURE_METRIC_COLLECT, \
- \
- CEPHFS_FEATURE_MAX, \
+ CEPHFS_FEATURE_ALTERNATE_NAME, \
}
#define CEPHFS_FEATURES_CLIENT_REQUIRED {}

@@ -98,7 +97,9 @@ struct ceph_mds_reply_info_in {

struct ceph_mds_reply_dir_entry {
char *name;
+ u8 *altname;
u32 name_len;
+ u32 altname_len;
struct ceph_mds_reply_lease *lease;
struct ceph_mds_reply_info_in inode;
loff_t offset;
@@ -122,7 +123,9 @@ struct ceph_mds_reply_info_parsed {
struct ceph_mds_reply_info_in diri, targeti;
struct ceph_mds_reply_dirfrag *dirfrag;
char *dname;
+ u8 *altname;
u32 dname_len;
+ u32 altname_len;
struct ceph_mds_reply_lease *dlease;
struct ceph_mds_reply_xattr xattr_info;

--
2.35.1
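
As a companion to the lease parsing change above, here is a userspace sketch of the versioned decode. The sketch_* names are hypothetical, and the 10-byte size of the fixed lease portion is an assumption standing in for sizeof(struct ceph_mds_reply_lease). Unlike the patch, the sketch bounds the altname against the end of the lease struct, not the end of the whole message. That is a stricter choice of mine, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LEASE_SIZE 10 /* assumption: size of the fixed lease fields */

struct sketch_lease_info {
	const unsigned char *lease;   /* fixed-size lease fields */
	const unsigned char *altname; /* NULL when absent */
	uint32_t altname_len;
};

static inline uint32_t sketch_get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Parse: u8 struct_v, u8 struct_compat, le32 struct_len, then the payload.
 * The altname is only present when struct_v >= 2. On success *p is moved
 * to the end of the struct so that unknown trailing fields are skipped.
 * Returns 0 on success, -1 on malformed input.
 */
static inline int sketch_parse_lease(const unsigned char **p,
				     const unsigned char *end,
				     struct sketch_lease_info *out)
{
	const unsigned char *cur = *p, *lend;
	uint8_t struct_v, struct_compat;
	uint32_t struct_len;

	if (end - cur < 6)
		return -1;
	struct_v = cur[0];
	struct_compat = cur[1];
	if (!struct_v || struct_compat != 1)
		return -1;
	struct_len = sketch_get_le32(cur + 2);
	cur += 6;
	if ((size_t)(end - cur) < struct_len || struct_len < SKETCH_LEASE_SIZE)
		return -1;
	lend = cur + struct_len;

	out->lease = cur;
	out->altname = NULL;
	out->altname_len = 0;
	cur += SKETCH_LEASE_SIZE;

	if (struct_v >= 2) {
		if (lend - cur < 4)
			return -1;
		out->altname_len = sketch_get_le32(cur);
		cur += 4;
		if ((size_t)(lend - cur) < out->altname_len)
			return -1;
		out->altname = cur;
	}
	*p = lend;
	return 0;
}
```

Setting the cursor to the end of the struct means a decoder that only understands version 2 can still step over fields added by a later version.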

2022-03-22 22:30:13

by Jeffrey Layton

Subject: [RFC PATCH v11 25/51] ceph: add support to readdir for encrypted filenames

From: Xiubo Li <[email protected]>

Once we've decrypted the names in a readdir reply, we no longer need the
crypttext, so overwrite them in ceph_mds_reply_dir_entry with the
unencrypted names. Then in both ceph_readdir_prepopulate() and
ceph_readdir() we will use the decrypted name directly.

[ jlayton: convert some BUG_ONs into error returns ]

Signed-off-by: Xiubo Li <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/crypto.c | 12 +++++--
fs/ceph/crypto.h | 1 +
fs/ceph/dir.c | 35 +++++++++++++++----
fs/ceph/inode.c | 12 ++++---
fs/ceph/mds_client.c | 81 ++++++++++++++++++++++++++++++++++++++++----
fs/ceph/mds_client.h | 4 +--
6 files changed, 124 insertions(+), 21 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 420a2cc1a8e5..c331b895c430 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -135,7 +135,10 @@ int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name,
int ret;
u8 *cryptbuf;

- WARN_ON_ONCE(!fscrypt_has_encryption_key(parent));
+ if (!fscrypt_has_encryption_key(parent)) {
+ memcpy(buf, d_name->name, d_name->len);
+ return d_name->len;
+ }

/*
* Convert cleartext d_name to ciphertext. If result is longer than
@@ -177,6 +180,8 @@ int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name,

int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf)
{
+ WARN_ON_ONCE(!fscrypt_has_encryption_key(parent));
+
return ceph_encode_encrypted_dname(parent, &dentry->d_name, buf);
}

@@ -221,7 +226,10 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
* generating a nokey name via fscrypt.
*/
if (!fscrypt_has_encryption_key(fname->dir)) {
- memcpy(oname->name, fname->name, fname->name_len);
+ if (fname->no_copy)
+ oname->name = fname->name;
+ else
+ memcpy(oname->name, fname->name, fname->name_len);
oname->len = fname->name_len;
if (is_nokey)
*is_nokey = true;
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index e54150260eba..080905b0c73c 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -19,6 +19,7 @@ struct ceph_fname {
unsigned char *ctext; // binary crypttext (if any)
u32 name_len; // length of name buffer
u32 ctext_len; // length of crypttext
+ bool no_copy;
};

struct ceph_fscrypt_auth {
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index caf2547c3fe1..5ce2a6384e55 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -9,6 +9,7 @@

#include "super.h"
#include "mds_client.h"
+#include "crypto.h"

/*
* Directory operations: readdir, lookup, create, link, unlink,
@@ -241,7 +242,9 @@ static int __dcache_readdir(struct file *file, struct dir_context *ctx,
di = ceph_dentry(dentry);
if (d_unhashed(dentry) ||
d_really_is_negative(dentry) ||
- di->lease_shared_gen != shared_gen) {
+ di->lease_shared_gen != shared_gen ||
+ ((dentry->d_flags & DCACHE_NOKEY_NAME) &&
+ fscrypt_has_encryption_key(dir))) {
spin_unlock(&dentry->d_lock);
dput(dentry);
err = -EAGAIN;
@@ -340,6 +343,10 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
ctx->pos = 2;
}

+ err = fscrypt_prepare_readdir(inode);
+ if (err)
+ return err;
+
spin_lock(&ci->i_ceph_lock);
/* request Fx cap. if have Fx, we don't need to release Fs cap
* for later create/unlink. */
@@ -389,6 +396,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
if (IS_ERR(req))
return PTR_ERR(req);
+
err = ceph_alloc_readdir_reply_buffer(req, inode);
if (err) {
ceph_mdsc_put_request(req);
@@ -402,11 +410,20 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
req->r_inode_drop = CEPH_CAP_FILE_EXCL;
}
if (dfi->last_name) {
- req->r_path2 = kstrdup(dfi->last_name, GFP_KERNEL);
+ struct qstr d_name = { .name = dfi->last_name,
+ .len = strlen(dfi->last_name) };
+
+ req->r_path2 = kzalloc(NAME_MAX + 1, GFP_KERNEL);
if (!req->r_path2) {
ceph_mdsc_put_request(req);
return -ENOMEM;
}
+
+ err = ceph_encode_encrypted_dname(inode, &d_name, req->r_path2);
+ if (err < 0) {
+ ceph_mdsc_put_request(req);
+ return err;
+ }
} else if (is_hash_order(ctx->pos)) {
req->r_args.readdir.offset_hash =
cpu_to_le32(fpos_hash(ctx->pos));
@@ -511,15 +528,20 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
for (; i < rinfo->dir_nr; i++) {
struct ceph_mds_reply_dir_entry *rde = rinfo->dir_entries + i;

- BUG_ON(rde->offset < ctx->pos);
+ if (rde->offset < ctx->pos) {
+ pr_warn("%s: rde->offset 0x%llx ctx->pos 0x%llx\n",
+ __func__, rde->offset, ctx->pos);
+ return -EIO;
+ }
+
+ if (WARN_ON_ONCE(!rde->inode.in))
+ return -EIO;

ctx->pos = rde->offset;
dout("readdir (%d/%d) -> %llx '%.*s' %p\n",
i, rinfo->dir_nr, ctx->pos,
rde->name_len, rde->name, &rde->inode.in);

- BUG_ON(!rde->inode.in);
-
if (!dir_emit(ctx, rde->name, rde->name_len,
ceph_present_ino(inode->i_sb, le64_to_cpu(rde->inode.in->ino)),
le32_to_cpu(rde->inode.in->mode) >> 12)) {
@@ -532,6 +554,8 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
dout("filldir stopping us...\n");
return 0;
}
+
+ /* Reset the lengths to their original allocated vals */
ctx->pos++;
}

@@ -586,7 +610,6 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
dfi->dir_ordered_count);
spin_unlock(&ci->i_ceph_lock);
}
-
dout("readdir %p file %p done.\n", inode, file);
return 0;
}
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 9585162200f6..8f0ba67ec78f 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1747,7 +1747,8 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
struct ceph_mds_session *session)
{
struct dentry *parent = req->r_dentry;
- struct ceph_inode_info *ci = ceph_inode(d_inode(parent));
+ struct inode *inode = d_inode(parent);
+ struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
struct qstr dname;
struct dentry *dn;
@@ -1821,9 +1822,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
tvino.snap = le64_to_cpu(rde->inode.in->snapid);

if (rinfo->hash_order) {
- u32 hash = ceph_str_hash(ci->i_dir_layout.dl_dir_hash,
- rde->name, rde->name_len);
- hash = ceph_frag_value(hash);
+ u32 hash = ceph_frag_value(rde->raw_hash);
if (hash != last_hash)
fpos_offset = 2;
last_hash = hash;
@@ -1846,6 +1845,11 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
err = -ENOMEM;
goto out;
}
+ if (rde->is_nokey) {
+ spin_lock(&dn->d_lock);
+ dn->d_flags |= DCACHE_NOKEY_NAME;
+ spin_unlock(&dn->d_lock);
+ }
} else if (d_really_is_positive(dn) &&
(ceph_ino(d_inode(dn)) != tvino.ino ||
ceph_snap(d_inode(dn)) != tvino.snap)) {
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index d11599bb85f6..a747ea7b7647 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -439,20 +439,87 @@ static int parse_reply_info_readdir(void **p, void *end,

info->dir_nr = num;
while (num) {
+ struct inode *inode = d_inode(req->r_dentry);
+ struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
+ struct fscrypt_str tname = FSTR_INIT(NULL, 0);
+ struct fscrypt_str oname = FSTR_INIT(NULL, 0);
+ struct ceph_fname fname;
+ u32 altname_len, _name_len;
+ u8 *altname, *_name;
+
/* dentry */
- ceph_decode_32_safe(p, end, rde->name_len, bad);
- ceph_decode_need(p, end, rde->name_len, bad);
- rde->name = *p;
- *p += rde->name_len;
- dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);
+ ceph_decode_32_safe(p, end, _name_len, bad);
+ ceph_decode_need(p, end, _name_len, bad);
+ _name = *p;
+ *p += _name_len;
+ dout("parsed dir dname '%.*s'\n", _name_len, _name);
+
+ if (info->hash_order)
+ rde->raw_hash = ceph_str_hash(ci->i_dir_layout.dl_dir_hash,
+ _name, _name_len);

/* dentry lease */
err = parse_reply_info_lease(p, end, &rde->lease, features,
- &rde->altname_len, &rde->altname);
+ &altname_len, &altname);
if (err)
goto out_bad;

+ /*
+ * Try to decrypt the dentry names and update them
+ * in the ceph_mds_reply_dir_entry struct.
+ */
+ fname.dir = inode;
+ fname.name = _name;
+ fname.name_len = _name_len;
+ fname.ctext = altname;
+ fname.ctext_len = altname_len;
+ /*
+ * _name_len may be larger than altname_len, for example
+ * when the human-readable name length is in the range
+ * (CEPH_NOHASH_NAME_MAX, CEPH_NOHASH_NAME_MAX + SHA256_DIGEST_SIZE).
+ * In that case the copy in ceph_fname_to_usr() would corrupt
+ * the data if there is no encryption key.
+ *
+ * So set the no_copy flag; then, if there is no encryption
+ * key, oname.name is simply pointed at _name.
+ */
+ fname.no_copy = true;
+ if (altname_len == 0) {
+ /*
+ * Set tname to _name; it will be used to do the
+ * base64 decode in place. This is safe because
+ * the decoded string is always shorter, about
+ * 3/4 the length of the original string.
+ */
+ tname.name = _name;
+
+ /*
+ * Set oname to _name too; it will be used to do
+ * the decryption in place.
+ */
+ oname.name = _name;
+ oname.len = _name_len;
+ } else {
+ /*
+ * This does the decryption in place, directly
+ * on the altname crypttext.
+ */
+ oname.name = altname;
+ oname.len = altname_len;
+ }
+ rde->is_nokey = false;
+ err = ceph_fname_to_usr(&fname, &tname, &oname, &rde->is_nokey);
+ if (err) {
+ pr_err("%s unable to decode %.*s, got %d\n", __func__,
+ _name_len, _name, err);
+ goto out_bad;
+ }
+ rde->name = oname.name;
+ rde->name_len = oname.len;
+
/* inode */
err = parse_reply_info_in(p, end, &rde->inode, features);
if (err < 0)
@@ -3472,7 +3539,7 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
if (err == 0) {
if (result == 0 && (req->r_op == CEPH_MDS_OP_READDIR ||
req->r_op == CEPH_MDS_OP_LSSNAP))
- ceph_readdir_prepopulate(req, req->r_session);
+ err = ceph_readdir_prepopulate(req, req->r_session);
}
current->journal_info = NULL;
mutex_unlock(&req->r_fill_mutex);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index cd719691a86d..046a9368c4a9 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -96,10 +96,10 @@ struct ceph_mds_reply_info_in {
};

struct ceph_mds_reply_dir_entry {
+ bool is_nokey;
char *name;
- u8 *altname;
u32 name_len;
- u32 altname_len;
+ u32 raw_hash;
struct ceph_mds_reply_lease *lease;
struct ceph_mds_reply_info_in inode;
loff_t offset;
--
2.35.1
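
The comment in parse_reply_info_readdir() above relies on base64 decoding being safe to do in place, because the output is never longer than the input. The following is a minimal userspace sketch of that idea, not the kernel routine: the names demo_b64url_val() and demo_b64url_decode() are hypothetical, and the alphabet assumed here is unpadded base64url (A-Z, a-z, 0-9, '-', '_'), which may differ from what the ceph code actually uses.

```c
#include <assert.h>

/* Map one base64url character to its 6-bit value, or -1 if invalid. */
static int demo_b64url_val(char c)
{
	if (c >= 'A' && c <= 'Z')
		return c - 'A';
	if (c >= 'a' && c <= 'z')
		return c - 'a' + 26;
	if (c >= '0' && c <= '9')
		return c - '0' + 52;
	if (c == '-')
		return 62;
	if (c == '_')
		return 63;
	return -1;
}

/*
 * Decode unpadded base64url from src[0..srclen) into dst.
 * dst may alias src: every output byte is written at an index that
 * is no greater than the index of the last input character consumed.
 * Returns the decoded length, or -1 on malformed input.
 */
static int demo_b64url_decode(const char *src, int srclen, unsigned char *dst)
{
	unsigned int acc = 0;	/* bit accumulator */
	int bits = 0;		/* number of valid bits in acc */
	int out = 0;

	for (int i = 0; i < srclen; i++) {
		int v = demo_b64url_val(src[i]);

		if (v < 0)
			return -1;
		acc = (acc << 6) | (unsigned int)v;
		bits += 6;
		if (bits >= 8) {
			bits -= 8;
			dst[out++] = (unsigned char)(acc >> bits);
			acc &= (1u << bits) - 1;
		}
	}
	/* Any leftover padding bits must be zero. */
	if (acc != 0)
		return -1;
	return out;
}
```

A caller can pass the same buffer as both source and destination, for example decoding the 7-character string "aGVsbG8" into the 5 bytes "hello" without a second allocation. This mirrors how tname.name is pointed at _name in the patch.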

2022-03-22 23:27:34

by Jeffrey Layton

Subject: [RFC PATCH v11 36/51] ceph: add __ceph_get_caps helper support

From: Xiubo Li <[email protected]>

Break out the guts of ceph_get_caps into a helper that takes an inode
and ceph_file_info instead of a file pointer.

Signed-off-by: Xiubo Li <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/caps.c | 19 +++++++++++++------
fs/ceph/super.h | 2 ++
2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index fbf120a6aa96..2aa338219d27 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2946,10 +2946,9 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
* due to a small max_size, make sure we check_max_size (and possibly
* ask the mds) so we don't get hung up indefinitely.
*/
-int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got)
+int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, int need,
+ int want, loff_t endoff, int *got)
{
- struct ceph_file_info *fi = filp->private_data;
- struct inode *inode = file_inode(filp);
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
int ret, _got, flags;
@@ -2958,7 +2957,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
if (ret < 0)
return ret;

- if ((fi->fmode & CEPH_FILE_MODE_WR) &&
+ if (fi && (fi->fmode & CEPH_FILE_MODE_WR) &&
fi->filp_gen != READ_ONCE(fsc->filp_gen))
return -EBADF;

@@ -2966,7 +2965,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got

while (true) {
flags &= CEPH_FILE_MODE_MASK;
- if (atomic_read(&fi->num_locks))
+ if (fi && atomic_read(&fi->num_locks))
flags |= CHECK_FILELOCK;
_got = 0;
ret = try_get_cap_refs(inode, need, want, endoff,
@@ -3011,7 +3010,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
continue;
}

- if ((fi->fmode & CEPH_FILE_MODE_WR) &&
+ if (fi && (fi->fmode & CEPH_FILE_MODE_WR) &&
fi->filp_gen != READ_ONCE(fsc->filp_gen)) {
if (ret >= 0 && _got)
ceph_put_cap_refs(ci, _got);
@@ -3074,6 +3073,14 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
return 0;
}

+int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got)
+{
+ struct ceph_file_info *fi = filp->private_data;
+ struct inode *inode = file_inode(filp);
+
+ return __ceph_get_caps(inode, fi, need, want, endoff, got);
+}
+
/*
* Take cap refs. Caller must already know we hold at least one ref
* on the caps in question or we don't know this is safe.
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 8f5fdb59344c..fef4cda44861 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1238,6 +1238,8 @@ extern int ceph_encode_dentry_release(void **p, struct dentry *dn,
struct inode *dir,
int mds, int drop, int unless);

+extern int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi,
+ int need, int want, loff_t endoff, int *got);
extern int ceph_get_caps(struct file *filp, int need, int want,
loff_t endoff, int *got);
extern int ceph_try_get_caps(struct inode *inode,
--
2.35.1
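
The refactoring in this patch follows a common shape: the body moves into a helper that accepts an optional per-open-file pointer, and the old entry point becomes a thin wrapper. Below is a simplified userspace sketch of that shape. All types and names (demo_file, demo_file_info, demo_get_caps_core) are hypothetical stand-ins, not the ceph structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ceph_file_info. */
struct demo_file_info {
	int writable;	/* file was opened for write */
	int gen;	/* generation recorded at open time */
};

/* Hypothetical stand-in for struct file. */
struct demo_file {
	struct demo_file_info *private_data;
};

/*
 * Core helper: fi may be NULL when no open file backs the request,
 * so every dereference is guarded, as in the patch.
 */
static int demo_get_caps_core(int current_gen, const struct demo_file_info *fi)
{
	if (fi && fi->writable && fi->gen != current_gen)
		return -EBADF;
	return 0;
}

/* Thin wrapper that keeps the original file-based signature. */
static int demo_get_caps(int current_gen, const struct demo_file *filp)
{
	return demo_get_caps_core(current_gen, filp->private_data);
}
```

Existing callers keep using the wrapper unchanged, while callers that only have an inode can call the core helper with a NULL file info and skip the per-file checks.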

2022-03-22 23:29:47

by Jeffrey Layton

Subject: [RFC PATCH v11 21/51] ceph: fix base64 encoded name's length check in ceph_fname_to_usr()

From: Xiubo Li <[email protected]>

fname->name holds a base64-encoded name, and its maximum length
shouldn't exceed NAME_MAX.

FSCRYPT_BASE64URL_CHARS(NAME_MAX) evaluates to 255 * 4 / 3 = 340, which
is larger than NAME_MAX, so the old check let overlong names through.

Signed-off-by: Xiubo Li <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/crypto.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 86de8483032f..e56017d66354 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -204,7 +204,7 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
}

/* Sanity check that the resulting name will fit in the buffer */
- if (fname->name_len > FSCRYPT_BASE64URL_CHARS(NAME_MAX))
+ if (fname->name_len > NAME_MAX || fname->ctext_len > NAME_MAX)
return -EIO;

ret = __fscrypt_prepare_readdir(fname->dir);
--
2.35.1
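
The arithmetic behind this fix can be checked with a small userspace sketch. The macros below are local re-creations for illustration only (DEMO_ prefixed, with NAME_MAX assumed to be 255 as stated in the commit message, and the rounding-up division assumed to match the kernel macro); demo_old_check() and demo_new_check() are hypothetical names for the before and after conditions.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_NAME_MAX 255u
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
/* Number of base64 characters needed to encode nbytes of binary data. */
#define DEMO_BASE64URL_CHARS(nbytes) DEMO_DIV_ROUND_UP((nbytes) * 4, 3)

/* Old check: compares the encoded name against the 340-character bound. */
static int demo_old_check(unsigned int name_len)
{
	if (name_len > DEMO_BASE64URL_CHARS(DEMO_NAME_MAX))
		return -EIO;
	return 0;
}

/* New check: both the name and the crypttext must fit in NAME_MAX. */
static int demo_new_check(unsigned int name_len, unsigned int ctext_len)
{
	if (name_len > DEMO_NAME_MAX || ctext_len > DEMO_NAME_MAX)
		return -EIO;
	return 0;
}
```

With these definitions, a 300-character encoded name passes the old check but is rejected by the new one.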

2022-03-23 00:32:31

by Jeffrey Layton

Subject: [RFC PATCH v11 17/51] ceph: encode encrypted name in dentry release

Encode encrypted dentry names when sending a dentry release request.
Also add a more helpful comment over ceph_encode_dentry_release.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/caps.c | 32 ++++++++++++++++++++++++++++----
fs/ceph/mds_client.c | 20 ++++++++++++++++----
2 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index b0b7688331b4..55f6ca00aff7 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -4619,6 +4619,18 @@ int ceph_encode_inode_release(void **p, struct inode *inode,
return ret;
}

+/**
+ * ceph_encode_dentry_release - encode a dentry release into an outgoing request
+ * @p: outgoing request buffer
+ * @dentry: dentry to release
+ * @dir: dir to release it from
+ * @mds: mds that we're speaking to
+ * @drop: caps being dropped
+ * @unless: unless we have these caps
+ *
+ * Encode a dentry release into an outgoing request buffer. Returns 1 if the
+ * thing was released, or a negative error code otherwise.
+ */
int ceph_encode_dentry_release(void **p, struct dentry *dentry,
struct inode *dir,
int mds, int drop, int unless)
@@ -4651,13 +4663,25 @@ int ceph_encode_dentry_release(void **p, struct dentry *dentry,
if (ret && di->lease_session && di->lease_session->s_mds == mds) {
dout("encode_dentry_release %p mds%d seq %d\n",
dentry, mds, (int)di->lease_seq);
- rel->dname_len = cpu_to_le32(dentry->d_name.len);
- memcpy(*p, dentry->d_name.name, dentry->d_name.len);
- *p += dentry->d_name.len;
rel->dname_seq = cpu_to_le32(di->lease_seq);
__ceph_mdsc_drop_dentry_lease(dentry);
+ spin_unlock(&dentry->d_lock);
+ if (IS_ENCRYPTED(dir) && fscrypt_has_encryption_key(dir)) {
+ int ret2 = ceph_encode_encrypted_fname(dir, dentry, *p);
+
+ if (ret2 < 0)
+ return ret2;
+
+ rel->dname_len = cpu_to_le32(ret2);
+ *p += ret2;
+ } else {
+ rel->dname_len = cpu_to_le32(dentry->d_name.len);
+ memcpy(*p, dentry->d_name.name, dentry->d_name.len);
+ *p += dentry->d_name.len;
+ }
+ } else {
+ spin_unlock(&dentry->d_lock);
}
- spin_unlock(&dentry->d_lock);
return ret;
}

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index e5f569f9d6a0..a76166d93575 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2819,15 +2819,23 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
req->r_inode ? req->r_inode : d_inode(req->r_dentry),
mds, req->r_inode_drop, req->r_inode_unless,
req->r_op == CEPH_MDS_OP_READDIR);
- if (req->r_dentry_drop)
- releases += ceph_encode_dentry_release(&p, req->r_dentry,
+ if (req->r_dentry_drop) {
+ ret = ceph_encode_dentry_release(&p, req->r_dentry,
req->r_parent, mds, req->r_dentry_drop,
req->r_dentry_unless);
- if (req->r_old_dentry_drop)
- releases += ceph_encode_dentry_release(&p, req->r_old_dentry,
+ if (ret < 0)
+ goto out_err;
+ releases += ret;
+ }
+ if (req->r_old_dentry_drop) {
+ ret = ceph_encode_dentry_release(&p, req->r_old_dentry,
req->r_old_dentry_dir, mds,
req->r_old_dentry_drop,
req->r_old_dentry_unless);
+ if (ret < 0)
+ goto out_err;
+ releases += ret;
+ }
if (req->r_old_inode_drop)
releases += ceph_encode_inode_release(&p,
d_inode(req->r_old_dentry),
@@ -2869,6 +2877,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
ceph_mdsc_free_path((char *)path1, pathlen1);
out:
return msg;
+out_err:
+ ceph_msg_put(msg);
+ msg = ERR_PTR(ret);
+ goto out_free2;
}

/*
--
2.35.1
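
The create_request_message() change above exists because the encoder can now fail, so its return value can no longer be added blindly to the release count. A minimal userspace sketch of that pattern follows. The function names are hypothetical, and the stub encoder simply returns a caller-supplied outcome (1 for released, 0 for nothing released, negative errno for failure).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical encoder stub: returns the outcome it is handed. */
static int demo_encode_release(int outcome)
{
	return outcome;
}

/*
 * Count successful releases, but stop and propagate the first error
 * instead of folding a negative errno into the count.
 */
static int demo_count_releases(const int *outcomes, int n)
{
	int releases = 0;

	for (int i = 0; i < n; i++) {
		int ret = demo_encode_release(outcomes[i]);

		if (ret < 0)
			return ret;
		releases += ret;
	}
	return releases;
}
```

Had the code kept the plain "releases += ..." form, a failure such as -ENOMEM would have silently corrupted the counter rather than aborting the request.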

2022-03-23 00:44:50

by Jeffrey Layton

Subject: [RFC PATCH v11 05/51] ceph: preallocate inode for ops that may create one

When creating a new inode, we need to determine the crypto context
before we can transmit the RPC. The fscrypt API has a routine for getting
a crypto context before a create occurs, but it requires an inode.

Change the ceph code to preallocate an inode in advance of a create of
any sort (open(), mknod(), symlink(), etc). Move the existing code that
generates the ACL and SELinux blobs into this routine since that's
mostly common across all the different codepaths.

In most cases, we just want to allow ceph_fill_trace to use that inode
after the reply comes in, so add a new field to the MDS request for it
(r_new_inode).

The async create codepath is a bit different though. In that case, we
want to hash the inode in advance of the RPC so that it can be used
before the reply comes in. If the call subsequently fails with
-EJUKEBOX, then just put the references and clean up the as_ctx. Note
that with this change, we now need to regenerate the as_ctx when this
occurs, but it's quite rare for it to happen.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/dir.c | 70 ++++++++++++++++++++-----------------
fs/ceph/file.c | 62 ++++++++++++++++++++-------------
fs/ceph/inode.c | 82 ++++++++++++++++++++++++++++++++++++++++----
fs/ceph/mds_client.c | 3 +-
fs/ceph/mds_client.h | 1 +
fs/ceph/super.h | 7 +++-
6 files changed, 160 insertions(+), 65 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index eae417d71136..8cc7a49ee508 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -861,13 +861,6 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
goto out;
}

- err = ceph_pre_init_acls(dir, &mode, &as_ctx);
- if (err < 0)
- goto out;
- err = ceph_security_init_secctx(dentry, mode, &as_ctx);
- if (err < 0)
- goto out;
-
dout("mknod in dir %p dentry %p mode 0%ho rdev %d\n",
dir, dentry, mode, rdev);
req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_MKNOD, USE_AUTH_MDS);
@@ -875,6 +868,14 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
err = PTR_ERR(req);
goto out;
}
+
+ req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+ if (IS_ERR(req->r_new_inode)) {
+ err = PTR_ERR(req->r_new_inode);
+ req->r_new_inode = NULL;
+ goto out_req;
+ }
+
req->r_dentry = dget(dentry);
req->r_num_caps = 2;
req->r_parent = dir;
@@ -884,13 +885,13 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
req->r_args.mknod.rdev = cpu_to_le32(rdev);
req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL;
req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
- if (as_ctx.pagelist) {
- req->r_pagelist = as_ctx.pagelist;
- as_ctx.pagelist = NULL;
- }
+
+ ceph_as_ctx_to_req(req, &as_ctx);
+
err = ceph_mdsc_do_request(mdsc, dir, req);
if (!err && !req->r_reply_info.head->is_dentry)
err = ceph_handle_notrace_create(dir, dentry);
+out_req:
ceph_mdsc_put_request(req);
out:
if (!err)
@@ -913,6 +914,7 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb);
struct ceph_mds_request *req;
struct ceph_acl_sec_ctx as_ctx = {};
+ umode_t mode = S_IFLNK | 0777;
int err;

if (ceph_snap(dir) != CEPH_NOSNAP)
@@ -923,21 +925,24 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
goto out;
}

- err = ceph_security_init_secctx(dentry, S_IFLNK | 0777, &as_ctx);
- if (err < 0)
- goto out;
-
dout("symlink in dir %p dentry %p to '%s'\n", dir, dentry, dest);
req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_SYMLINK, USE_AUTH_MDS);
if (IS_ERR(req)) {
err = PTR_ERR(req);
goto out;
}
+
+ req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+ if (IS_ERR(req->r_new_inode)) {
+ err = PTR_ERR(req->r_new_inode);
+ req->r_new_inode = NULL;
+ goto out_req;
+ }
+
req->r_path2 = kstrdup(dest, GFP_KERNEL);
if (!req->r_path2) {
err = -ENOMEM;
- ceph_mdsc_put_request(req);
- goto out;
+ goto out_req;
}
req->r_parent = dir;
ihold(dir);
@@ -947,13 +952,13 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
req->r_num_caps = 2;
req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL;
req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
- if (as_ctx.pagelist) {
- req->r_pagelist = as_ctx.pagelist;
- as_ctx.pagelist = NULL;
- }
+
+ ceph_as_ctx_to_req(req, &as_ctx);
+
err = ceph_mdsc_do_request(mdsc, dir, req);
if (!err && !req->r_reply_info.head->is_dentry)
err = ceph_handle_notrace_create(dir, dentry);
+out_req:
ceph_mdsc_put_request(req);
out:
if (err)
@@ -989,13 +994,6 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
goto out;
}

- mode |= S_IFDIR;
- err = ceph_pre_init_acls(dir, &mode, &as_ctx);
- if (err < 0)
- goto out;
- err = ceph_security_init_secctx(dentry, mode, &as_ctx);
- if (err < 0)
- goto out;

req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
if (IS_ERR(req)) {
@@ -1003,6 +1001,14 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
goto out;
}

+ mode |= S_IFDIR;
+ req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+ if (IS_ERR(req->r_new_inode)) {
+ err = PTR_ERR(req->r_new_inode);
+ req->r_new_inode = NULL;
+ goto out_req;
+ }
+
req->r_dentry = dget(dentry);
req->r_num_caps = 2;
req->r_parent = dir;
@@ -1011,15 +1017,15 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
req->r_args.mkdir.mode = cpu_to_le32(mode);
req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL;
req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
- if (as_ctx.pagelist) {
- req->r_pagelist = as_ctx.pagelist;
- as_ctx.pagelist = NULL;
- }
+
+ ceph_as_ctx_to_req(req, &as_ctx);
+
err = ceph_mdsc_do_request(mdsc, dir, req);
if (!err &&
!req->r_reply_info.head->is_target &&
!req->r_reply_info.head->is_dentry)
err = ceph_handle_notrace_create(dir, dentry);
+out_req:
ceph_mdsc_put_request(req);
out:
if (!err)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 3444a3b748e8..cccf729b55a8 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -601,7 +601,8 @@ static void ceph_async_create_cb(struct ceph_mds_client *mdsc,
ceph_mdsc_release_dir_caps(req);
}

-static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
+static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
+ struct dentry *dentry,
struct file *file, umode_t mode,
struct ceph_mds_request *req,
struct ceph_acl_sec_ctx *as_ctx,
@@ -612,7 +613,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
struct ceph_mds_reply_inode in = { };
struct ceph_mds_reply_info_in iinfo = { .in = &in };
struct ceph_inode_info *ci = ceph_inode(dir);
- struct inode *inode;
struct timespec64 now;
struct ceph_string *pool_ns;
struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb);
@@ -621,10 +621,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,

ktime_get_real_ts64(&now);

- inode = ceph_get_inode(dentry->d_sb, vino);
- if (IS_ERR(inode))
- return PTR_ERR(inode);
-
iinfo.inline_version = CEPH_INLINE_NONE;
iinfo.change_attr = 1;
ceph_encode_timespec64(&iinfo.btime, &now);
@@ -680,8 +676,7 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
ceph_dir_clear_complete(dir);
if (!d_unhashed(dentry))
d_drop(dentry);
- if (inode->i_state & I_NEW)
- discard_new_inode(inode);
+ discard_new_inode(inode);
} else {
struct dentry *dn;

@@ -721,6 +716,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req;
+ struct inode *new_inode = NULL;
struct dentry *dn;
struct ceph_acl_sec_ctx as_ctx = {};
bool try_async = ceph_test_mount_opt(fsc, ASYNC_DIROPS);
@@ -733,21 +729,21 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,

if (dentry->d_name.len > NAME_MAX)
return -ENAMETOOLONG;
-
+retry:
if (flags & O_CREAT) {
if (ceph_quota_is_max_files_exceeded(dir))
return -EDQUOT;
- err = ceph_pre_init_acls(dir, &mode, &as_ctx);
- if (err < 0)
- return err;
- err = ceph_security_init_secctx(dentry, mode, &as_ctx);
- if (err < 0)
+
+ new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+ if (IS_ERR(new_inode)) {
+ err = PTR_ERR(new_inode);
goto out_ctx;
+ }
} else if (!d_in_lookup(dentry)) {
/* If it's not being looked up, it's negative */
return -ENOENT;
}
-retry:
+
/* do the open */
req = prepare_open_request(dir->i_sb, flags, mode);
if (IS_ERR(req)) {
@@ -768,25 +764,40 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,

req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL;
req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
- if (as_ctx.pagelist) {
- req->r_pagelist = as_ctx.pagelist;
- as_ctx.pagelist = NULL;
- }
- if (try_async &&
- (req->r_dir_caps =
- try_prep_async_create(dir, dentry, &lo,
- &req->r_deleg_ino))) {
+
+ ceph_as_ctx_to_req(req, &as_ctx);
+
+ if (try_async && (req->r_dir_caps =
+ try_prep_async_create(dir, dentry, &lo, &req->r_deleg_ino))) {
+ struct ceph_vino vino = { .ino = req->r_deleg_ino,
+ .snap = CEPH_NOSNAP };
+
set_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags);
req->r_args.open.flags |= cpu_to_le32(CEPH_O_EXCL);
req->r_callback = ceph_async_create_cb;
+
+ /* Hash inode before RPC */
+ new_inode = ceph_get_inode(dir->i_sb, vino, new_inode);
+ if (IS_ERR(new_inode)) {
+ err = PTR_ERR(new_inode);
+ new_inode = NULL;
+ goto out_req;
+ }
+ WARN_ON_ONCE(!(new_inode->i_state & I_NEW));
+
err = ceph_mdsc_submit_request(mdsc, dir, req);
if (!err) {
- err = ceph_finish_async_create(dir, dentry,
+ err = ceph_finish_async_create(dir, new_inode, dentry,
file, mode, req,
&as_ctx, &lo);
+ new_inode = NULL;
} else if (err == -EJUKEBOX) {
restore_deleg_ino(dir, req->r_deleg_ino);
ceph_mdsc_put_request(req);
+ discard_new_inode(new_inode);
+ ceph_release_acl_sec_ctx(&as_ctx);
+ memset(&as_ctx, 0, sizeof(as_ctx));
+ new_inode = NULL;
try_async = false;
ceph_put_string(rcu_dereference_raw(lo.pool_ns));
goto retry;
@@ -797,6 +808,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
}

set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
+ req->r_new_inode = new_inode;
+ new_inode = NULL;
err = ceph_mdsc_do_request(mdsc,
(flags & (O_CREAT|O_TRUNC)) ? dir : NULL,
req);
@@ -839,6 +852,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
}
out_req:
ceph_mdsc_put_request(req);
+ iput(new_inode);
out_ctx:
ceph_release_acl_sec_ctx(&as_ctx);
dout("atomic_open result=%d\n", err);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 64b341f5e7bc..7547b7de170f 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -52,17 +52,85 @@ static int ceph_set_ino_cb(struct inode *inode, void *data)
return 0;
}

-struct inode *ceph_get_inode(struct super_block *sb, struct ceph_vino vino)
+/**
+ * ceph_new_inode - allocate a new inode in advance of an expected create
+ * @dir: parent directory for new inode
+ * @dentry: dentry that may eventually point to new inode
+ * @mode: mode of new inode
+ * @as_ctx: pointer to inherited security context
+ *
+ * Allocate a new inode in advance of an operation to create a new inode.
+ * This allocates the inode and sets up the acl_sec_ctx with appropriate
+ * info for the new inode.
+ *
+ * Returns a pointer to the new inode or an ERR_PTR.
+ */
+struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
+ umode_t *mode, struct ceph_acl_sec_ctx *as_ctx)
+{
+ int err;
+ struct inode *inode;
+
+ inode = new_inode_pseudo(dir->i_sb);
+ if (!inode)
+ return ERR_PTR(-ENOMEM);
+
+ if (!S_ISLNK(*mode)) {
+ err = ceph_pre_init_acls(dir, mode, as_ctx);
+ if (err < 0)
+ goto out_err;
+ }
+
+ err = ceph_security_init_secctx(dentry, *mode, as_ctx);
+ if (err < 0)
+ goto out_err;
+
+ inode->i_state = 0;
+ inode->i_mode = *mode;
+ return inode;
+out_err:
+ iput(inode);
+ return ERR_PTR(err);
+}
+
+void ceph_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as_ctx)
+{
+ if (as_ctx->pagelist) {
+ req->r_pagelist = as_ctx->pagelist;
+ as_ctx->pagelist = NULL;
+ }
+}
+
+/**
+ * ceph_get_inode - find or create/hash a new inode
+ * @sb: superblock to search and allocate in
+ * @vino: vino to search for
+ * @newino: optional new inode to insert if one isn't found (may be NULL)
+ *
+ * Search for or insert a new inode into the hash for the given vino, and return a
+ * reference to it. If @newino is non-NULL, its reference is consumed.
+ */
+struct inode *ceph_get_inode(struct super_block *sb, struct ceph_vino vino, struct inode *newino)
{
struct inode *inode;

if (ceph_vino_is_reserved(vino))
return ERR_PTR(-EREMOTEIO);

- inode = iget5_locked(sb, (unsigned long)vino.ino, ceph_ino_compare,
- ceph_set_ino_cb, &vino);
- if (!inode)
+ if (newino) {
+ inode = inode_insert5(newino, (unsigned long)vino.ino, ceph_ino_compare,
+ ceph_set_ino_cb, &vino);
+ if (inode != newino)
+ iput(newino);
+ } else {
+ inode = iget5_locked(sb, (unsigned long)vino.ino, ceph_ino_compare,
+ ceph_set_ino_cb, &vino);
+ }
+
+ if (!inode) {
+ dout("No inode found for %llx.%llx\n", vino.ino, vino.snap);
return ERR_PTR(-ENOMEM);
+ }

dout("get_inode on %llu=%llx.%llx got %p new %d\n", ceph_present_inode(inode),
ceph_vinop(inode), inode, !!(inode->i_state & I_NEW));
@@ -78,7 +146,7 @@ struct inode *ceph_get_snapdir(struct inode *parent)
.ino = ceph_ino(parent),
.snap = CEPH_SNAPDIR,
};
- struct inode *inode = ceph_get_inode(parent->i_sb, vino);
+ struct inode *inode = ceph_get_inode(parent->i_sb, vino, NULL);
struct ceph_inode_info *ci = ceph_inode(inode);

if (IS_ERR(inode))
@@ -1550,7 +1618,7 @@ static int readdir_prepopulate_inodes_only(struct ceph_mds_request *req,
vino.ino = le64_to_cpu(rde->inode.in->ino);
vino.snap = le64_to_cpu(rde->inode.in->snapid);

- in = ceph_get_inode(req->r_dentry->d_sb, vino);
+ in = ceph_get_inode(req->r_dentry->d_sb, vino, NULL);
if (IS_ERR(in)) {
err = PTR_ERR(in);
dout("new_inode badness got %d\n", err);
@@ -1752,7 +1820,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
if (d_really_is_positive(dn)) {
in = d_inode(dn);
} else {
- in = ceph_get_inode(parent->d_sb, tvino);
+ in = ceph_get_inode(parent->d_sb, tvino, NULL);
if (IS_ERR(in)) {
dout("new_inode badness\n");
d_drop(dn);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index b3b8f6299176..e64a8cefdb7f 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -868,6 +868,7 @@ void ceph_mdsc_release_request(struct kref *kref)
iput(req->r_parent);
}
iput(req->r_target_inode);
+ iput(req->r_new_inode);
if (req->r_dentry)
dput(req->r_dentry);
if (req->r_old_dentry)
@@ -3193,7 +3194,7 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
.snap = le64_to_cpu(rinfo->targeti.in->snapid)
};

- in = ceph_get_inode(mdsc->fsc->sb, tvino);
+ in = ceph_get_inode(mdsc->fsc->sb, tvino, xchg(&req->r_new_inode, NULL));
if (IS_ERR(in)) {
err = PTR_ERR(in);
mutex_lock(&session->s_mutex);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 33497846e47e..2e945979a2e0 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -265,6 +265,7 @@ struct ceph_mds_request {

struct inode *r_parent; /* parent dir inode */
struct inode *r_target_inode; /* resulting inode */
+ struct inode *r_new_inode; /* new inode (for creates) */

#define CEPH_MDS_R_DIRECT_IS_HASH (1) /* r_direct_hash is valid */
#define CEPH_MDS_R_ABORTED (2) /* call was aborted */
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 4999207a5466..f23e49f46440 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -975,6 +975,7 @@ static inline bool __ceph_have_pending_cap_snap(struct ceph_inode_info *ci)
/* inode.c */
struct ceph_mds_reply_info_in;
struct ceph_mds_reply_dirfrag;
+struct ceph_acl_sec_ctx;

extern const struct inode_operations ceph_file_iops;

@@ -982,8 +983,12 @@ extern struct inode *ceph_alloc_inode(struct super_block *sb);
extern void ceph_evict_inode(struct inode *inode);
extern void ceph_free_inode(struct inode *inode);

+struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
+ umode_t *mode, struct ceph_acl_sec_ctx *as_ctx);
+void ceph_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as_ctx);
+
extern struct inode *ceph_get_inode(struct super_block *sb,
- struct ceph_vino vino);
+ struct ceph_vino vino, struct inode *newino);
extern struct inode *ceph_get_snapdir(struct inode *parent);
extern int ceph_fill_file_size(struct inode *inode, int issued,
u32 truncate_seq, u64 truncate_size, u64 size);
--
2.35.1
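
The reference handling in the newino branch of ceph_get_inode() can be hard to follow from the diff alone. Below is a minimal userspace sketch of the same ownership rule, assuming a toy hash table with no collision handling. All toy_* names are hypothetical stand-ins (toy_insert for inode_insert5(), toy_put for iput()) and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these names exist in the kernel. */
struct toy_inode {
	unsigned long ino;
	int refcount;
};

#define TOY_SLOTS 8
static struct toy_inode *toy_table[TOY_SLOTS];

/* Drop one reference (stand-in for iput()). */
static void toy_put(struct toy_inode *inode)
{
	assert(inode->refcount > 0);
	inode->refcount--;
}

/*
 * Stand-in for inode_insert5(): return the already-hashed inode with an
 * extra reference if one matches, otherwise hash the caller's inode.
 * The toy table has no collision handling.
 */
static struct toy_inode *toy_insert(struct toy_inode *newino, unsigned long ino)
{
	struct toy_inode **slot = &toy_table[ino % TOY_SLOTS];

	if (*slot && (*slot)->ino == ino) {
		(*slot)->refcount++;
		return *slot;
	}
	newino->ino = ino;
	*slot = newino;
	return newino;
}

/* Mirrors the newino branch: the caller's reference is always consumed. */
struct toy_inode *toy_get_inode(unsigned long ino, struct toy_inode *newino)
{
	struct toy_inode *inode = toy_insert(newino, ino);

	if (inode != newino)
		toy_put(newino);	/* an inode was already hashed; release the spare */
	return inode;
}
```

Either way the caller ends up holding exactly one reference, on whichever inode is returned, which is why handle_reply() can hand over req->r_new_inode via xchg() and stop tracking it.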

2022-03-23 01:10:54

by Jeffrey Layton

Subject: [RFC PATCH v11 30/51] ceph: don't allow changing layout on encrypted files/directories

From: Luis Henriques <[email protected]>

Encryption is currently only supported on files/directories with layouts
where stripe_count=1. Forbid changing layouts when encryption is involved.

Signed-off-by: Luis Henriques <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/ioctl.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
index b9f0f4e460ab..9675ef3a6c47 100644
--- a/fs/ceph/ioctl.c
+++ b/fs/ceph/ioctl.c
@@ -294,6 +294,10 @@ static long ceph_set_encryption_policy(struct file *file, unsigned long arg)
struct inode *inode = file_inode(file);
struct ceph_inode_info *ci = ceph_inode(inode);

+ /* encrypted directories can't have striped layout */
+ if (ci->i_layout.stripe_count > 1)
+ return -EINVAL;
+
ret = vet_mds_for_fscrypt(file);
if (ret)
return ret;
--
2.35.1

2022-03-23 01:26:11

by Jeffrey Layton

Subject: [RFC PATCH v11 43/51] ceph: disable copy offload on encrypted inodes

If we have an encrypted inode, then the client will need to re-encrypt
the contents of the new object. Disable copy offload to or from
encrypted inodes.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/file.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 00e6a5bc37c8..ba17288b1db3 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -2522,6 +2522,10 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
return -EOPNOTSUPP;
}

+ /* Every encrypted inode gets its own key, so we can't offload them */
+ if (IS_ENCRYPTED(src_inode) || IS_ENCRYPTED(dst_inode))
+ return -EOPNOTSUPP;
+
if (len < src_ci->i_layout.object_size)
return -EOPNOTSUPP; /* no remote copy will be done */

--
2.35.1

2022-03-23 02:27:12

by Jeffrey Layton

Subject: [RFC PATCH v11 22/51] ceph: add fscrypt support to ceph_fill_trace

When we get a dentry in a trace, decrypt the name so we can properly
instantiate the dentry.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/inode.c | 30 ++++++++++++++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 9043761bd9c8..9585162200f6 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1403,8 +1403,15 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
if (dir && req->r_op == CEPH_MDS_OP_LOOKUPNAME &&
test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags) &&
!test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags)) {
+ bool is_nokey = false;
struct qstr dname;
struct dentry *dn, *parent;
+ struct fscrypt_str oname = FSTR_INIT(NULL, 0);
+ struct ceph_fname fname = { .dir = dir,
+ .name = rinfo->dname,
+ .ctext = rinfo->altname,
+ .name_len = rinfo->dname_len,
+ .ctext_len = rinfo->altname_len };

BUG_ON(!rinfo->head->is_target);
BUG_ON(req->r_dentry);
@@ -1412,8 +1419,20 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
parent = d_find_any_alias(dir);
BUG_ON(!parent);

- dname.name = rinfo->dname;
- dname.len = rinfo->dname_len;
+ err = ceph_fname_alloc_buffer(dir, &oname);
+ if (err < 0) {
+ dput(parent);
+ goto done;
+ }
+
+ err = ceph_fname_to_usr(&fname, NULL, &oname, &is_nokey);
+ if (err < 0) {
+ dput(parent);
+ ceph_fname_free_buffer(dir, &oname);
+ goto done;
+ }
+ dname.name = oname.name;
+ dname.len = oname.len;
dname.hash = full_name_hash(parent, dname.name, dname.len);
tvino.ino = le64_to_cpu(rinfo->targeti.in->ino);
tvino.snap = le64_to_cpu(rinfo->targeti.in->snapid);
@@ -1428,9 +1447,15 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
dname.len, dname.name, dn);
if (!dn) {
dput(parent);
+ ceph_fname_free_buffer(dir, &oname);
err = -ENOMEM;
goto done;
}
+ if (is_nokey) {
+ spin_lock(&dn->d_lock);
+ dn->d_flags |= DCACHE_NOKEY_NAME;
+ spin_unlock(&dn->d_lock);
+ }
err = 0;
} else if (d_really_is_positive(dn) &&
(ceph_ino(d_inode(dn)) != tvino.ino ||
@@ -1442,6 +1467,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
dput(dn);
goto retry_lookup;
}
+ ceph_fname_free_buffer(dir, &oname);

req->r_dentry = dn;
dput(parent);
--
2.35.1

2022-03-23 02:27:50

by Jeffrey Layton

Subject: [RFC PATCH v11 01/51] vfs: export new_inode_pseudo

Ceph needs to be able to allocate inodes ahead of a create that might
involve a fscrypt-encrypted inode. new_inode() almost fits the bill,
but it puts the inode on the sb->s_inodes list and when we go to hash
it, that might be done again.

We could work around that by setting I_CREATING on the new inode, but
that causes ilookup5 to return -ESTALE if something tries to find it
before I_NEW is cleared. This is desirable behavior for most
filesystems, but doesn't work for ceph.

To work around all of this, just use new_inode_pseudo which doesn't add
it to the sb->s_inodes list.

Cc: Al Viro <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/inode.c | 1 +
1 file changed, 1 insertion(+)

Al, can I get your Acked-by on this if you're OK with it? Alternately if
you just want to take it in via your tree, then that would be fine too.

diff --git a/fs/inode.c b/fs/inode.c
index 63324df6fa27..9ddf7d1a7359 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1025,6 +1025,7 @@ struct inode *new_inode_pseudo(struct super_block *sb)
}
return inode;
}
+EXPORT_SYMBOL(new_inode_pseudo);

/**
* new_inode - obtain an inode
--
2.35.1

2022-03-23 02:35:49

by Jeffrey Layton

Subject: [RFC PATCH v11 19/51] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries

If we have a dentry which represents a no-key name, then we need to test
whether the parent directory's encryption key has since been added. Do
that before we test anything else about the dentry.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/dir.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 897f8618151b..caf2547c3fe1 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1709,6 +1709,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
struct inode *dir, *inode;
struct ceph_mds_client *mdsc;

+ valid = fscrypt_d_revalidate(dentry, flags);
+ if (valid <= 0)
+ return valid;
+
if (flags & LOOKUP_RCU) {
parent = READ_ONCE(dentry->d_parent);
dir = d_inode_rcu(parent);
@@ -1721,8 +1725,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
inode = d_inode(dentry);
}

- dout("d_revalidate %p '%pd' inode %p offset 0x%llx\n", dentry,
- dentry, inode, ceph_dentry(dentry)->offset);
+ dout("d_revalidate %p '%pd' inode %p offset 0x%llx nokey %d\n", dentry,
+ dentry, inode, ceph_dentry(dentry)->offset, !!(dentry->d_flags & DCACHE_NOKEY_NAME));

mdsc = ceph_sb_to_client(dir->i_sb)->mdsc;

--
2.35.1

2022-03-23 07:49:55

by Jeffrey Layton

Subject: [RFC PATCH v11 23/51] ceph: pass the request to parse_reply_info_readdir()

From: Xiubo Li <[email protected]>

Instead of passing just the r_reply_info to the readdir reply parser,
pass the request pointer directly. This will facilitate
implementing readdir on fscrypted directories.

Signed-off-by: Xiubo Li <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/mds_client.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index a76166d93575..d11599bb85f6 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -406,9 +406,10 @@ static int parse_reply_info_trace(void **p, void *end,
* parse readdir results
*/
static int parse_reply_info_readdir(void **p, void *end,
- struct ceph_mds_reply_info_parsed *info,
- u64 features)
+ struct ceph_mds_request *req,
+ u64 features)
{
+ struct ceph_mds_reply_info_parsed *info = &req->r_reply_info;
u32 num, i = 0;
int err;

@@ -650,15 +651,16 @@ static int parse_reply_info_getvxattr(void **p, void *end,
* parse extra results
*/
static int parse_reply_info_extra(void **p, void *end,
- struct ceph_mds_reply_info_parsed *info,
+ struct ceph_mds_request *req,
u64 features, struct ceph_mds_session *s)
{
+ struct ceph_mds_reply_info_parsed *info = &req->r_reply_info;
u32 op = le32_to_cpu(info->head->op);

if (op == CEPH_MDS_OP_GETFILELOCK)
return parse_reply_info_filelock(p, end, info, features);
else if (op == CEPH_MDS_OP_READDIR || op == CEPH_MDS_OP_LSSNAP)
- return parse_reply_info_readdir(p, end, info, features);
+ return parse_reply_info_readdir(p, end, req, features);
else if (op == CEPH_MDS_OP_CREATE)
return parse_reply_info_create(p, end, info, features, s);
else if (op == CEPH_MDS_OP_GETVXATTR)
@@ -671,9 +673,9 @@ static int parse_reply_info_extra(void **p, void *end,
* parse entire mds reply
*/
static int parse_reply_info(struct ceph_mds_session *s, struct ceph_msg *msg,
- struct ceph_mds_reply_info_parsed *info,
- u64 features)
+ struct ceph_mds_request *req, u64 features)
{
+ struct ceph_mds_reply_info_parsed *info = &req->r_reply_info;
void *p, *end;
u32 len;
int err;
@@ -695,7 +697,7 @@ static int parse_reply_info(struct ceph_mds_session *s, struct ceph_msg *msg,
ceph_decode_32_safe(&p, end, len, bad);
if (len > 0) {
ceph_decode_need(&p, end, len, bad);
- err = parse_reply_info_extra(&p, p+len, info, features, s);
+ err = parse_reply_info_extra(&p, p+len, req, features, s);
if (err < 0)
goto out_bad;
}
@@ -3419,14 +3421,14 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
}

dout("handle_reply tid %lld result %d\n", tid, result);
- rinfo = &req->r_reply_info;
if (test_bit(CEPHFS_FEATURE_REPLY_ENCODING, &session->s_features))
- err = parse_reply_info(session, msg, rinfo, (u64)-1);
+ err = parse_reply_info(session, msg, req, (u64)-1);
else
- err = parse_reply_info(session, msg, rinfo, session->s_con.peer_features);
+ err = parse_reply_info(session, msg, req, session->s_con.peer_features);
mutex_unlock(&mdsc->mutex);

/* Must find target inode outside of mutexes to avoid deadlocks */
+ rinfo = &req->r_reply_info;
if ((err >= 0) && rinfo->head->is_target) {
struct inode *in;
struct ceph_vino tvino = {
--
2.35.1

2022-03-23 08:52:53

by Jeffrey Layton

Subject: [RFC PATCH v11 34/51] ceph: get file size from fscrypt_file when present in inode traces

When we get an inode trace from the MDS, grab the fscrypt_file field if
the inode is encrypted, and use it to populate the i_size field instead
of the regular inode size field.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/inode.c | 38 +++++++++++++++++++++++++-------------
1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 599e27dae8c8..b905c49fc7a9 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -989,6 +989,16 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
from_kgid(&init_user_ns, inode->i_gid));
ceph_decode_timespec64(&ci->i_btime, &iinfo->btime);
ceph_decode_timespec64(&ci->i_snap_btime, &iinfo->snap_btime);
+
+#ifdef CONFIG_FS_ENCRYPTION
+ if (iinfo->fscrypt_auth_len && !ci->fscrypt_auth) {
+ ci->fscrypt_auth_len = iinfo->fscrypt_auth_len;
+ ci->fscrypt_auth = iinfo->fscrypt_auth;
+ iinfo->fscrypt_auth = NULL;
+ iinfo->fscrypt_auth_len = 0;
+ inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
+ }
+#endif
}

if ((new_version || (new_issued & CEPH_CAP_LINK_SHARED)) &&
@@ -1012,6 +1022,7 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,

if (new_version ||
(new_issued & (CEPH_CAP_ANY_FILE_RD | CEPH_CAP_ANY_FILE_WR))) {
+ u64 size = le64_to_cpu(info->size);
s64 old_pool = ci->i_layout.pool_id;
struct ceph_string *old_ns;

@@ -1025,10 +1036,21 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,

pool_ns = old_ns;

+ if (IS_ENCRYPTED(inode) && size && (iinfo->fscrypt_file_len == sizeof(__le64))) {
+ u64 fsize = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file);
+
+ if (size == round_up(fsize, CEPH_FSCRYPT_BLOCK_SIZE)) {
+ size = fsize;
+ } else {
+ pr_warn("fscrypt size mismatch: size=%llu fscrypt_file=%llu, discarding fscrypt_file size.\n",
+ size, fsize);
+ }
+ }
+
queue_trunc = ceph_fill_file_size(inode, issued,
- le32_to_cpu(info->truncate_seq),
- le64_to_cpu(info->truncate_size),
- le64_to_cpu(info->size));
+ le32_to_cpu(info->truncate_seq),
+ le64_to_cpu(info->truncate_size),
+ size);
/* only update max_size on auth cap */
if ((info->cap.flags & CEPH_CAP_FLAG_AUTH) &&
ci->i_max_size != le64_to_cpu(info->max_size)) {
@@ -1068,16 +1090,6 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
xattr_blob = NULL;
}

-#ifdef CONFIG_FS_ENCRYPTION
- if (iinfo->fscrypt_auth_len && !ci->fscrypt_auth) {
- ci->fscrypt_auth_len = iinfo->fscrypt_auth_len;
- ci->fscrypt_auth = iinfo->fscrypt_auth;
- iinfo->fscrypt_auth = NULL;
- iinfo->fscrypt_auth_len = 0;
- inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
- }
-#endif
-
/* finally update i_version */
if (le64_to_cpu(info->version) > ci->i_version)
ci->i_version = le64_to_cpu(info->version);
--
2.35.1
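
The size reconciliation added to ceph_fill_inode() above can be sketched in userspace as follows. This is an illustration only: sketch_pick_size() and SKETCH_BLOCK_SIZE are hypothetical names, and the 4096-byte block size is an assumption standing in for CEPH_FSCRYPT_BLOCK_SIZE, which is defined elsewhere in the series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration; stands in for CEPH_FSCRYPT_BLOCK_SIZE. */
#define SKETCH_BLOCK_SIZE 4096ULL

/* Round v up to the next multiple of align (align must be non-zero). */
static uint64_t sketch_round_up(uint64_t v, uint64_t align)
{
	assert(align != 0);
	return (v + align - 1) / align * align;
}

/*
 * Pick the size to expose as i_size. mds_size is the regular inode size
 * from the trace, fscrypt_size is the value carried in fscrypt_file, and
 * have_fscrypt_file says whether a well-formed fscrypt_file was present.
 */
uint64_t sketch_pick_size(uint64_t mds_size, uint64_t fscrypt_size,
			  int have_fscrypt_file)
{
	if (!have_fscrypt_file || mds_size == 0)
		return mds_size;

	/* Trust fscrypt_file only if it rounds up to the MDS-reported size. */
	if (mds_size == sketch_round_up(fscrypt_size, SKETCH_BLOCK_SIZE))
		return fscrypt_size;

	/* Mismatch: discard the fscrypt_file value. */
	return mds_size;
}
```

With these assumptions, a 100-byte encrypted file whose on-MDS size is 4096 reports 100, while an inconsistent pair (for example 8192 and 100) falls back to the MDS size.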

2022-03-23 09:03:03

by Jeffrey Layton

Subject: [RFC PATCH v11 04/51] fscrypt: add fscrypt_context_for_new_inode

Most filesystems just call fscrypt_set_context on new inodes, which
usually causes a setxattr. That's a bit late for ceph, which can send
along a full set of attributes with the create request.

Doing so allows it to avoid race windows where the new inode could
be seen by other clients without the crypto context attached. It also
avoids the separate round trip to the server.

Refactor the fscrypt code a bit to allow us to create a new crypto
context, attach it to the inode, and write it to the buffer, but without
calling set_context on it. ceph can later use this to marshal the
context into the attributes we send along with the create request.

Acked-by: Eric Biggers <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/crypto/policy.c | 35 +++++++++++++++++++++++++++++------
include/linux/fscrypt.h | 1 +
2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c
index ed3d623724cd..ec861af96252 100644
--- a/fs/crypto/policy.c
+++ b/fs/crypto/policy.c
@@ -664,6 +664,32 @@ const union fscrypt_policy *fscrypt_policy_to_inherit(struct inode *dir)
return fscrypt_get_dummy_policy(dir->i_sb);
}

+/**
+ * fscrypt_context_for_new_inode() - create an encryption context for a new inode
+ * @ctx: where context should be written
+ * @inode: inode from which to fetch policy and nonce
+ *
+ * Given an in-core "prepared" (via fscrypt_prepare_new_inode) inode,
+ * generate a new context and write it to ctx. ctx _must_ be at least
+ * FSCRYPT_SET_CONTEXT_MAX_SIZE bytes.
+ *
+ * Return: size of the resulting context or a negative error code.
+ */
+int fscrypt_context_for_new_inode(void *ctx, struct inode *inode)
+{
+ struct fscrypt_info *ci = inode->i_crypt_info;
+
+ BUILD_BUG_ON(sizeof(union fscrypt_context) !=
+ FSCRYPT_SET_CONTEXT_MAX_SIZE);
+
+ /* fscrypt_prepare_new_inode() should have set up the key already. */
+ if (WARN_ON_ONCE(!ci))
+ return -ENOKEY;
+
+ return fscrypt_new_context(ctx, &ci->ci_policy, ci->ci_nonce);
+}
+EXPORT_SYMBOL_GPL(fscrypt_context_for_new_inode);
+
/**
* fscrypt_set_context() - Set the fscrypt context of a new inode
* @inode: a new inode
@@ -680,12 +706,9 @@ int fscrypt_set_context(struct inode *inode, void *fs_data)
union fscrypt_context ctx;
int ctxsize;

- /* fscrypt_prepare_new_inode() should have set up the key already. */
- if (WARN_ON_ONCE(!ci))
- return -ENOKEY;
-
- BUILD_BUG_ON(sizeof(ctx) != FSCRYPT_SET_CONTEXT_MAX_SIZE);
- ctxsize = fscrypt_new_context(&ctx, &ci->ci_policy, ci->ci_nonce);
+ ctxsize = fscrypt_context_for_new_inode(&ctx, inode);
+ if (ctxsize < 0)
+ return ctxsize;

/*
* This may be the first time the inode number is available, so do any
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index c90e176b5843..530433098f82 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -276,6 +276,7 @@ int fscrypt_ioctl_get_policy(struct file *filp, void __user *arg);
int fscrypt_ioctl_get_policy_ex(struct file *filp, void __user *arg);
int fscrypt_ioctl_get_nonce(struct file *filp, void __user *arg);
int fscrypt_has_permitted_context(struct inode *parent, struct inode *child);
+int fscrypt_context_for_new_inode(void *ctx, struct inode *inode);
int fscrypt_set_context(struct inode *inode, void *fs_data);

struct fscrypt_dummy_policy {
--
2.35.1

2022-03-23 09:14:51

by Jeffrey Layton

Subject: [RFC PATCH v11 18/51] ceph: properly set DCACHE_NOKEY_NAME flag in lookup

This is required so that we know to invalidate these dentries when the
directory is unlocked.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/dir.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 8cc7a49ee508..897f8618151b 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -760,6 +760,17 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
if (dentry->d_name.len > NAME_MAX)
return ERR_PTR(-ENAMETOOLONG);

+ if (IS_ENCRYPTED(dir)) {
+ err = __fscrypt_prepare_readdir(dir);
+ if (err)
+ return ERR_PTR(err);
+ if (!fscrypt_has_encryption_key(dir)) {
+ spin_lock(&dentry->d_lock);
+ dentry->d_flags |= DCACHE_NOKEY_NAME;
+ spin_unlock(&dentry->d_lock);
+ }
+ }
+
/* can we conclude ENOENT locally? */
if (d_really_is_negative(dentry)) {
struct ceph_inode_info *ci = ceph_inode(dir);
--
2.35.1

2022-03-23 09:20:16

by Jeffrey Layton

Subject: [RFC PATCH v11 42/51] ceph: disable fallocate for encrypted inodes

...hopefully, just for now.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/file.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 1985e3102533..00e6a5bc37c8 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -2203,6 +2203,9 @@ static long ceph_fallocate(struct file *file, int mode,
if (!S_ISREG(inode->i_mode))
return -EOPNOTSUPP;

+ if (IS_ENCRYPTED(inode))
+ return -EOPNOTSUPP;
+
prealloc_cf = ceph_alloc_cap_flush();
if (!prealloc_cf)
return -ENOMEM;
--
2.35.1

2022-03-23 12:18:02

by Jeffrey Layton

Subject: [RFC PATCH v11 41/51] libceph: allow ceph_osdc_new_request to accept a multi-op read

Currently we have some special-casing for multi-op writes, but we can't
really handle a multi-op read. All of the current multi-op callers pass
CEPH_OSD_FLAG_WRITE.

Have ceph_osdc_new_request check for CEPH_OSD_FLAG_READ and if it's set,
allocate multiple reply ops instead of multiple request ops. If neither
flag is set, return -EINVAL.

Signed-off-by: Jeff Layton <[email protected]>
---
net/ceph/osd_client.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index b67c56ddade7..5df1450cd30f 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1130,15 +1130,30 @@ struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc,
if (flags & CEPH_OSD_FLAG_WRITE)
req->r_data_offset = off;

- if (num_ops > 1)
+ if (num_ops > 1) {
+ int num_req_ops, num_rep_ops;
+
/*
- * This is a special case for ceph_writepages_start(), but it
- * also covers ceph_uninline_data(). If more multi-op request
- * use cases emerge, we will need a separate helper.
+ * If this is a multi-op write request, assume that we'll need
+ * request ops. If it's a multi-op read then assume we'll need
+ * reply ops. Anything else and call it -EINVAL.
*/
- r = __ceph_osdc_alloc_messages(req, GFP_NOFS, num_ops, 0);
- else
+ if (flags & CEPH_OSD_FLAG_WRITE) {
+ num_req_ops = num_ops;
+ num_rep_ops = 0;
+ } else if (flags & CEPH_OSD_FLAG_READ) {
+ num_req_ops = 0;
+ num_rep_ops = num_ops;
+ } else {
+ r = -EINVAL;
+ goto fail;
+ }
+
+ r = __ceph_osdc_alloc_messages(req, GFP_NOFS, num_req_ops,
+ num_rep_ops);
+ } else {
r = ceph_osdc_alloc_messages(req, GFP_NOFS);
+ }
if (r)
goto fail;

--
2.35.1
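
The flag-based split between request ops and reply ops in the patch above can be sketched as a small standalone helper. sketch_split_ops() and the SKETCH_FLAG_* values are hypothetical, chosen only for illustration. The real code uses CEPH_OSD_FLAG_READ and CEPH_OSD_FLAG_WRITE and passes the two counts to __ceph_osdc_alloc_messages().

```c
#include <assert.h>
#include <errno.h>

/* Stand-in flag values for illustration only. */
#define SKETCH_FLAG_READ	0x0010
#define SKETCH_FLAG_WRITE	0x0020

/*
 * Decide how many request ops and reply ops a multi-op request needs.
 * Returns 0 on success, or -EINVAL if neither READ nor WRITE is set.
 */
int sketch_split_ops(unsigned int flags, unsigned int num_ops,
		     unsigned int *num_req_ops, unsigned int *num_rep_ops)
{
	assert(num_ops > 1);	/* only the multi-op case is modelled */

	if (flags & SKETCH_FLAG_WRITE) {
		*num_req_ops = num_ops;
		*num_rep_ops = 0;
	} else if (flags & SKETCH_FLAG_READ) {
		*num_req_ops = 0;
		*num_rep_ops = num_ops;
	} else {
		return -EINVAL;
	}
	return 0;
}
```

As in the patch, WRITE is tested first, so a request carrying both flags is treated as a write.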

2022-03-23 14:19:51

by Jeffrey Layton

Subject: [RFC PATCH v11 02/51] fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode

Ceph is going to add fscrypt support, but we still want encrypted
filenames to be composed of printable characters, so we can maintain
compatibility with clients that don't support fscrypt.

We could just adopt fscrypt's current nokey name format, but that is
subject to change in the future, and it also contains dirhash fields
that we don't need for cephfs. Because of this, we're going to concoct
our own scheme for encoding encrypted filenames. It's very similar to
fscrypt's current scheme, but doesn't bother with the dirhash fields.

The ceph encoding scheme will use base64 encoding as well, and we also
want it to avoid characters that are illegal in filenames. Export the
fscrypt base64 encoding/decoding routines so we can use them in ceph's
fscrypt implementation.

Acked-by: Eric Biggers <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/crypto/fname.c | 8 ++++----
include/linux/fscrypt.h | 5 +++++
2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index a9be4bc74a94..1e4233c95005 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -182,8 +182,6 @@ static int fname_decrypt(const struct inode *inode,
static const char base64url_table[65] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

-#define FSCRYPT_BASE64URL_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3)
-
/**
* fscrypt_base64url_encode() - base64url-encode some binary data
* @src: the binary data to encode
@@ -198,7 +196,7 @@ static const char base64url_table[65] =
* Return: the length of the resulting base64url-encoded string in bytes.
* This will be equal to FSCRYPT_BASE64URL_CHARS(srclen).
*/
-static int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
+int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
{
u32 ac = 0;
int bits = 0;
@@ -217,6 +215,7 @@ static int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
*cp++ = base64url_table[(ac << (6 - bits)) & 0x3f];
return cp - dst;
}
+EXPORT_SYMBOL_GPL(fscrypt_base64url_encode);

/**
* fscrypt_base64url_decode() - base64url-decode a string
@@ -233,7 +232,7 @@ static int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
* Return: the length of the resulting decoded binary data in bytes,
* or -1 if the string isn't a valid base64url string.
*/
-static int fscrypt_base64url_decode(const char *src, int srclen, u8 *dst)
+int fscrypt_base64url_decode(const char *src, int srclen, u8 *dst)
{
u32 ac = 0;
int bits = 0;
@@ -256,6 +255,7 @@ static int fscrypt_base64url_decode(const char *src, int srclen, u8 *dst)
return -1;
return bp - dst;
}
+EXPORT_SYMBOL_GPL(fscrypt_base64url_decode);

bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
u32 orig_len, u32 max_len,
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 91ea9477e9bd..671181d196a8 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -46,6 +46,9 @@ struct fscrypt_name {
/* Maximum value for the third parameter of fscrypt_operations.set_context(). */
#define FSCRYPT_SET_CONTEXT_MAX_SIZE 40

+/* len of resulting string (sans NUL terminator) after base64 encoding nbytes */
+#define FSCRYPT_BASE64URL_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3)
+
#ifdef CONFIG_FS_ENCRYPTION

/*
@@ -305,6 +308,8 @@ void fscrypt_free_inode(struct inode *inode);
int fscrypt_drop_inode(struct inode *inode);

/* fname.c */
+int fscrypt_base64url_encode(const u8 *src, int len, char *dst);
+int fscrypt_base64url_decode(const char *src, int len, u8 *dst);
int fscrypt_setup_filename(struct inode *inode, const struct qstr *iname,
int lookup, struct fscrypt_name *fname);

--
2.35.1
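
To make the relationship between FSCRYPT_BASE64URL_CHARS() and the encoder concrete, here is a userspace sketch of an unpadded base64url encoder. The sketch_* names are hypothetical. The table and the final two statements match what is visible in the diff context above, while the main loop is an assumed bit-accumulator implementation and is not quoted from fs/crypto/fname.c.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as DIV_ROUND_UP((nbytes) * 4, 3). */
#define SKETCH_BASE64URL_CHARS(nbytes) (((nbytes) * 4 + 2) / 3)

static const char sketch_table[65] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/*
 * Encode srclen bytes from src into dst without padding and without a
 * NUL terminator. dst must hold SKETCH_BASE64URL_CHARS(srclen) bytes.
 * Returns the number of characters written.
 */
int sketch_base64url_encode(const uint8_t *src, int srclen, char *dst)
{
	uint32_t ac = 0;	/* bit accumulator */
	int bits = 0;		/* number of pending bits in ac */
	char *cp = dst;
	int i;

	assert(srclen >= 0);
	for (i = 0; i < srclen; i++) {
		ac = (ac << 8) | src[i];
		bits += 8;
		do {
			bits -= 6;
			*cp++ = sketch_table[(ac >> bits) & 0x3f];
		} while (bits >= 6);
	}
	/* Flush leftover bits, left-aligned in a final 6-bit group. */
	if (bits)
		*cp++ = sketch_table[(ac << (6 - bits)) & 0x3f];
	return (int)(cp - dst);
}
```

For example, the 3-byte input "foo" encodes to the 4 characters "Zm9v", and the 2-byte input "fo" encodes to "Zm8", matching SKETCH_BASE64URL_CHARS(3) == 4 and SKETCH_BASE64URL_CHARS(2) == 3. Neither '-' nor '_' is illegal in a filename, which is the property the commit message relies on.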

2022-03-23 17:05:39

by Jeffrey Layton

Subject: [RFC PATCH v11 24/51] ceph: add ceph_encode_encrypted_dname() helper

From: Xiubo Li <[email protected]>

Add a new helper that basically calls ceph_encode_encrypted_fname, but
with a qstr pointer instead of a dentry pointer. This will make it
simpler to decrypt names in a readdir reply, before we have a dentry.

Signed-off-by: Xiubo Li <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/crypto.c | 11 ++++++++---
fs/ceph/crypto.h | 8 ++++++++
2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index e56017d66354..420a2cc1a8e5 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -128,7 +128,7 @@ void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_se
swap(req->r_fscrypt_auth, as->fscrypt_auth);
}

-int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf)
+int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, char *buf)
{
u32 len;
int elen;
@@ -143,7 +143,7 @@ int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentr
*
* See: fscrypt_setup_filename
*/
- if (!fscrypt_fname_encrypted_size(parent, dentry->d_name.len, NAME_MAX, &len))
+ if (!fscrypt_fname_encrypted_size(parent, d_name->len, NAME_MAX, &len))
return -ENAMETOOLONG;

/* Allocate a buffer appropriate to hold the result */
@@ -151,7 +151,7 @@ int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentr
if (!cryptbuf)
return -ENOMEM;

- ret = fscrypt_fname_encrypt(parent, &dentry->d_name, cryptbuf, len);
+ ret = fscrypt_fname_encrypt(parent, d_name, cryptbuf, len);
if (ret) {
kfree(cryptbuf);
return ret;
@@ -175,6 +175,11 @@ int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentr
return elen;
}

+int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf)
+{
+ return ceph_encode_encrypted_dname(parent, &dentry->d_name, buf);
+}
+
/**
* ceph_fname_to_usr - convert a filename for userland presentation
* @fname: ceph_fname to be converted
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 7e56aded5124..e54150260eba 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -67,6 +67,7 @@ void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc);
int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
struct ceph_acl_sec_ctx *as);
void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as);
+int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, char *buf);
int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf);

static inline int ceph_fname_alloc_buffer(struct inode *parent, struct fscrypt_str *fname)
@@ -108,6 +109,13 @@ static inline void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req,
{
}

+static inline int ceph_encode_encrypted_dname(const struct inode *parent,
+ struct qstr *d_name, char *buf)
+{
+ memcpy(buf, d_name->name, d_name->len);
+ return d_name->len;
+}
+
static inline int ceph_encode_encrypted_fname(const struct inode *parent,
struct dentry *dentry, char *buf)
{
--
2.35.1
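
The shape of this refactoring (a name-based helper plus a thin dentry-based wrapper) can be sketched in userspace. The sketch_* types and functions are hypothetical stand-ins for struct qstr, struct dentry and the two ceph helpers. The body mirrors only the !CONFIG_FS_ENCRYPTION stub from the patch, which copies the name unchanged.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for struct qstr and struct dentry. */
struct sketch_qstr {
	const char *name;
	unsigned int len;
};

struct sketch_dentry {
	struct sketch_qstr d_name;
};

/* Name-based helper: usable before any dentry exists (e.g. readdir). */
int sketch_encode_dname(const struct sketch_qstr *d_name, char *buf)
{
	assert(d_name && buf);
	memcpy(buf, d_name->name, d_name->len);
	return (int)d_name->len;
}

/* Dentry-based wrapper: existing callers keep their interface. */
int sketch_encode_fname(const struct sketch_dentry *dentry, char *buf)
{
	return sketch_encode_dname(&dentry->d_name, buf);
}
```

Keeping the wrapper means existing dentry-based callers do not have to change, while code that only has a name and length can call the name-based helper directly.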

2022-03-24 01:09:22

by Jeffrey Layton

Subject: [RFC PATCH v11 08/51] ceph: add support for fscrypt_auth/fscrypt_file to cap messages

Add support for new version 12 cap messages that carry the new
fscrypt_auth and fscrypt_file fields from the inode.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/caps.c | 76 +++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 63 insertions(+), 13 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 7d8ef67a1032..b0b7688331b4 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -13,6 +13,7 @@
#include "super.h"
#include "mds_client.h"
#include "cache.h"
+#include "crypto.h"
#include <linux/ceph/decode.h>
#include <linux/ceph/messenger.h>

@@ -1214,15 +1215,12 @@ struct cap_msg_args {
umode_t mode;
bool inline_data;
bool wake;
+ u32 fscrypt_auth_len;
+ u32 fscrypt_file_len;
+ u8 fscrypt_auth[sizeof(struct ceph_fscrypt_auth)]; // for context
+ u8 fscrypt_file[sizeof(u64)]; // for size
};

-/*
- * cap struct size + flock buffer size + inline version + inline data size +
- * osd_epoch_barrier + oldest_flush_tid
- */
-#define CAP_MSG_SIZE (sizeof(struct ceph_mds_caps) + \
- 4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4)
-
/* Marshal up the cap msg to the MDS */
static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
{
@@ -1238,7 +1236,7 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
arg->size, arg->max_size, arg->xattr_version,
arg->xattr_buf ? (int)arg->xattr_buf->vec.iov_len : 0);

- msg->hdr.version = cpu_to_le16(10);
+ msg->hdr.version = cpu_to_le16(12);
msg->hdr.tid = cpu_to_le64(arg->flush_tid);

fc = msg->front.iov_base;
@@ -1309,6 +1307,21 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)

/* Advisory flags (version 10) */
ceph_encode_32(&p, arg->flags);
+
+ /* dirstats (version 11) - these are r/o on the client */
+ ceph_encode_64(&p, 0);
+ ceph_encode_64(&p, 0);
+
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+ /* fscrypt_auth and fscrypt_file (version 12) */
+ ceph_encode_32(&p, arg->fscrypt_auth_len);
+ ceph_encode_copy(&p, arg->fscrypt_auth, arg->fscrypt_auth_len);
+ ceph_encode_32(&p, arg->fscrypt_file_len);
+ ceph_encode_copy(&p, arg->fscrypt_file, arg->fscrypt_file_len);
+#else /* CONFIG_FS_ENCRYPTION */
+ ceph_encode_32(&p, 0);
+ ceph_encode_32(&p, 0);
+#endif /* CONFIG_FS_ENCRYPTION */
}

/*
@@ -1430,8 +1443,37 @@ static void __prep_cap(struct cap_msg_args *arg, struct ceph_cap *cap,
}
}
arg->flags = flags;
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+ if (ci->fscrypt_auth_len &&
+ WARN_ON_ONCE(ci->fscrypt_auth_len != sizeof(struct ceph_fscrypt_auth))) {
+ /* Don't set this if it isn't the right size */
+ arg->fscrypt_auth_len = 0;
+ } else {
+ arg->fscrypt_auth_len = ci->fscrypt_auth_len;
+ memcpy(arg->fscrypt_auth, ci->fscrypt_auth,
+ min_t(size_t, ci->fscrypt_auth_len, sizeof(arg->fscrypt_auth)));
+ }
+ /* FIXME: use this to track "real" size */
+ arg->fscrypt_file_len = 0;
+#endif /* CONFIG_FS_ENCRYPTION */
}

+#define CAP_MSG_FIXED_FIELDS (sizeof(struct ceph_mds_caps) + \
+ 4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4 + 8 + 8 + 4 + 4)
+
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static inline int cap_msg_size(struct cap_msg_args *arg)
+{
+ return CAP_MSG_FIXED_FIELDS + arg->fscrypt_auth_len +
+ arg->fscrypt_file_len;
+}
+#else
+static inline int cap_msg_size(struct cap_msg_args *arg)
+{
+ return CAP_MSG_FIXED_FIELDS;
+}
+#endif /* CONFIG_FS_ENCRYPTION */
+
/*
* Send a cap msg on the given inode.
*
@@ -1442,7 +1484,7 @@ static void __send_cap(struct cap_msg_args *arg, struct ceph_inode_info *ci)
struct ceph_msg *msg;
struct inode *inode = &ci->vfs_inode;

- msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, CAP_MSG_SIZE, GFP_NOFS, false);
+ msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, cap_msg_size(arg), GFP_NOFS, false);
if (!msg) {
pr_err("error allocating cap msg: ino (%llx.%llx) flushing %s tid %llu, requeuing cap.\n",
ceph_vinop(inode), ceph_cap_string(arg->dirty),
@@ -1468,10 +1510,6 @@ static inline int __send_flush_snap(struct inode *inode,
struct cap_msg_args arg;
struct ceph_msg *msg;

- msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, CAP_MSG_SIZE, GFP_NOFS, false);
- if (!msg)
- return -ENOMEM;
-
arg.session = session;
arg.ino = ceph_vino(inode).ino;
arg.cid = 0;
@@ -1509,6 +1547,18 @@ static inline int __send_flush_snap(struct inode *inode,
arg.flags = 0;
arg.wake = false;

+ /*
+ * No fscrypt_auth changes from a capsnap. It will need
+ * to update fscrypt_file on size changes (TODO).
+ */
+ arg.fscrypt_auth_len = 0;
+ arg.fscrypt_file_len = 0;
+
+ msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, cap_msg_size(&arg),
+ GFP_NOFS, false);
+ if (!msg)
+ return -ENOMEM;
+
encode_cap_msg(msg, &arg);
ceph_con_send(&arg.session->s_con, msg);
return 0;
--
2.35.1
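
Editor's sketch (not part of the patch): the patch above replaces the fixed CAP_MSG_SIZE with a size that depends on the two fscrypt blobs. The user-space fragment below illustrates that relationship under simplified assumptions: the two u32 length prefixes are always encoded, and the payload bytes are added on top. All sketch_* names and the helper functions are hypothetical and exist only for illustration; the 48-byte buffer mirrors sizeof(struct ceph_fscrypt_auth) as defined in the series (4 + 4 + 40, packed).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the fscrypt fields of struct cap_msg_args. */
struct sketch_cap_args {
	uint32_t auth_len;
	uint32_t file_len;
	uint8_t auth[48];	/* 4 + 4 + 40, like struct ceph_fscrypt_auth */
	uint8_t file[8];	/* room for a u64 size */
};

/* Two u32 length prefixes are always sent, even when both blobs are empty. */
#define SKETCH_TAIL_FIXED (4 + 4)

/* Size of the version 12 tail: fixed prefixes plus variable payloads. */
static size_t sketch_tail_size(const struct sketch_cap_args *arg)
{
	return SKETCH_TAIL_FIXED + arg->auth_len + arg->file_len;
}

/* Write a little-endian u32 and advance the cursor. */
static void sketch_put_le32(uint8_t **p, uint32_t v)
{
	(*p)[0] = (uint8_t)(v & 0xff);
	(*p)[1] = (uint8_t)((v >> 8) & 0xff);
	(*p)[2] = (uint8_t)((v >> 16) & 0xff);
	(*p)[3] = (uint8_t)((v >> 24) & 0xff);
	*p += 4;
}

/* Write a length-prefixed blob and advance the cursor. */
static void sketch_put_blob(uint8_t **p, const uint8_t *src, uint32_t len)
{
	sketch_put_le32(p, len);
	memcpy(*p, src, len);
	*p += len;
}

/* Returns the bytes written; this should equal sketch_tail_size(arg). */
static size_t sketch_encode_tail(uint8_t *buf, const struct sketch_cap_args *arg)
{
	uint8_t *p = buf;

	sketch_put_blob(&p, arg->auth, arg->auth_len);
	sketch_put_blob(&p, arg->file, arg->file_len);
	return (size_t)(p - buf);
}
```

The point of the sketch is the invariant that the allocation size and the encoder must agree: if one of them accounts for a field the other omits, the encoder either overruns the buffer or leaves it short.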

2022-03-24 02:03:02

by Jeffrey Layton

Subject: [RFC PATCH v11 20/51] ceph: add helpers for converting names for userland presentation

Define a new ceph_fname struct that we can use to carry information
about encrypted dentry names. Add helpers for working with these
objects, including ceph_fname_to_usr which formats an encrypted filename
for userland presentation.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/crypto.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/ceph/crypto.h | 41 ++++++++++++++++++++++++++
2 files changed, 117 insertions(+)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 3e3b12cd3413..86de8483032f 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -174,3 +174,79 @@ int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentr
dout("base64-encoded ciphertext name = %.*s\n", elen, buf);
return elen;
}
+
+/**
+ * ceph_fname_to_usr - convert a filename for userland presentation
+ * @fname: ceph_fname to be converted
+ * @tname: temporary name buffer to use for conversion (may be NULL)
+ * @oname: where converted name should be placed
+ * @is_nokey: set to true if key wasn't available during conversion (may be NULL)
+ *
+ * Given a filename (usually from the MDS), format it for presentation to
+ * userland. If the parent directory (@fname->dir) is not encrypted, just pass it back as-is.
+ *
+ * Otherwise, base64 decode the string, and then ask fscrypt to format it
+ * for userland presentation.
+ *
+ * Returns 0 on success or negative error code on error.
+ */
+int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+ struct fscrypt_str *oname, bool *is_nokey)
+{
+ int ret;
+ struct fscrypt_str _tname = FSTR_INIT(NULL, 0);
+ struct fscrypt_str iname;
+
+ if (!IS_ENCRYPTED(fname->dir)) {
+ oname->name = fname->name;
+ oname->len = fname->name_len;
+ return 0;
+ }
+
+ /* Sanity check that the resulting name will fit in the buffer */
+ if (fname->name_len > FSCRYPT_BASE64URL_CHARS(NAME_MAX))
+ return -EIO;
+
+ ret = __fscrypt_prepare_readdir(fname->dir);
+ if (ret)
+ return ret;
+
+ /*
+ * Use the raw dentry name as sent by the MDS instead of
+ * generating a nokey name via fscrypt.
+ */
+ if (!fscrypt_has_encryption_key(fname->dir)) {
+ memcpy(oname->name, fname->name, fname->name_len);
+ oname->len = fname->name_len;
+ if (is_nokey)
+ *is_nokey = true;
+ return 0;
+ }
+
+ if (fname->ctext_len == 0) {
+ int declen;
+
+ if (!tname) {
+ ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
+ if (ret)
+ return ret;
+ tname = &_tname;
+ }
+
+ declen = fscrypt_base64url_decode(fname->name, fname->name_len, tname->name);
+ if (declen <= 0) {
+ ret = -EIO;
+ goto out;
+ }
+ iname.name = tname->name;
+ iname.len = declen;
+ } else {
+ iname.name = fname->ctext;
+ iname.len = fname->ctext_len;
+ }
+
+ ret = fscrypt_fname_disk_to_usr(fname->dir, 0, 0, &iname, oname);
+out:
+ fscrypt_fname_free_buffer(&_tname);
+ return ret;
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 9a66a29d5c8b..7e56aded5124 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -13,6 +13,14 @@ struct ceph_fs_client;
struct ceph_acl_sec_ctx;
struct ceph_mds_request;

+struct ceph_fname {
+ struct inode *dir;
+ char *name; // b64 encoded, possibly hashed
+ unsigned char *ctext; // binary crypttext (if any)
+ u32 name_len; // length of name buffer
+ u32 ctext_len; // length of crypttext
+};
+
struct ceph_fscrypt_auth {
__le32 cfa_version;
__le32 cfa_blob_len;
@@ -61,6 +69,22 @@ int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as);
int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf);

+static inline int ceph_fname_alloc_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+ if (!IS_ENCRYPTED(parent))
+ return 0;
+ return fscrypt_fname_alloc_buffer(NAME_MAX, fname);
+}
+
+static inline void ceph_fname_free_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+ if (IS_ENCRYPTED(parent))
+ fscrypt_fname_free_buffer(fname);
+}
+
+int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+ struct fscrypt_str *oname, bool *is_nokey);
+
#else /* CONFIG_FS_ENCRYPTION */

static inline void ceph_fscrypt_set_ops(struct super_block *sb)
@@ -89,6 +113,23 @@ static inline int ceph_encode_encrypted_fname(const struct inode *parent,
{
return -EOPNOTSUPP;
}
+
+static inline int ceph_fname_alloc_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+ return 0;
+}
+
+static inline void ceph_fname_free_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+}
+
+static inline int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+ struct fscrypt_str *oname, bool *is_nokey)
+{
+ oname->name = fname->name;
+ oname->len = fname->name_len;
+ return 0;
+}
#endif /* CONFIG_FS_ENCRYPTION */

#endif
--
2.35.1
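
Editor's sketch (not part of the patch): the order of checks in ceph_fname_to_usr can be summarized as a small decision function, together with the arithmetic behind the length sanity check. The macros below restate DIV_ROUND_UP and FSCRYPT_BASE64URL_CHARS for user space; the enum and the sketch_* names are hypothetical, and the possible error from __fscrypt_prepare_readdir is deliberately left out.

```c
#include <assert.h>

#define SKETCH_NAME_MAX 255	/* value of NAME_MAX on Linux */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
/* Same formula as FSCRYPT_BASE64URL_CHARS in the series. */
#define SKETCH_BASE64URL_CHARS(nbytes) SKETCH_DIV_ROUND_UP((nbytes) * 4, 3)

/* Hypothetical labels for the branches taken by ceph_fname_to_usr. */
enum sketch_action {
	SKETCH_PASS_THROUGH,		/* parent not encrypted: name as-is */
	SKETCH_REJECT_EIO,		/* encoded name too long to be valid */
	SKETCH_RAW_NOKEY,		/* no key: present the MDS name as-is */
	SKETCH_DECODE_AND_DECRYPT,	/* base64url-decode, then decrypt */
	SKETCH_DECRYPT_CTEXT		/* binary ciphertext already supplied */
};

static enum sketch_action sketch_fname_action(int dir_encrypted, int has_key,
					       unsigned int name_len,
					       unsigned int ctext_len)
{
	if (!dir_encrypted)
		return SKETCH_PASS_THROUGH;
	if (name_len > SKETCH_BASE64URL_CHARS(SKETCH_NAME_MAX))
		return SKETCH_REJECT_EIO;
	if (!has_key)
		return SKETCH_RAW_NOKEY;
	return ctext_len == 0 ? SKETCH_DECODE_AND_DECRYPT : SKETCH_DECRYPT_CTEXT;
}
```

With NAME_MAX at 255, the limit works out to (255 * 4 + 2) / 3 = 340 characters, so any encoded name longer than that cannot decode to something that fits in a NAME_MAX buffer.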

2022-03-24 14:29:09

by Jeffrey Layton

Subject: [RFC PATCH v11 06/51] ceph: crypto context handling for ceph

Have set_context do a setattr that sets the fscrypt_auth value, and
get_context just return the contents of that field.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/Makefile | 1 +
fs/ceph/crypto.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/ceph/crypto.h | 29 ++++++++++++++++++
fs/ceph/inode.c | 3 ++
fs/ceph/super.c | 3 ++
5 files changed, 112 insertions(+)
create mode 100644 fs/ceph/crypto.c
create mode 100644 fs/ceph/crypto.h

diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
index 50c635dc7f71..1f77ca04c426 100644
--- a/fs/ceph/Makefile
+++ b/fs/ceph/Makefile
@@ -12,3 +12,4 @@ ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \

ceph-$(CONFIG_CEPH_FSCACHE) += cache.o
ceph-$(CONFIG_CEPH_FS_POSIX_ACL) += acl.o
+ceph-$(CONFIG_FS_ENCRYPTION) += crypto.o
diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
new file mode 100644
index 000000000000..a513ff373b13
--- /dev/null
+++ b/fs/ceph/crypto.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ceph/ceph_debug.h>
+#include <linux/xattr.h>
+#include <linux/fscrypt.h>
+
+#include "super.h"
+#include "crypto.h"
+
+static int ceph_crypt_get_context(struct inode *inode, void *ctx, size_t len)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ struct ceph_fscrypt_auth *cfa = (struct ceph_fscrypt_auth *)ci->fscrypt_auth;
+ u32 ctxlen;
+
+ /* Nonexistent or too short? */
+ if (!cfa || (ci->fscrypt_auth_len < (offsetof(struct ceph_fscrypt_auth, cfa_blob) + 1)))
+ return -ENOBUFS;
+
+ /* Some format we don't recognize? */
+ if (le32_to_cpu(cfa->cfa_version) != CEPH_FSCRYPT_AUTH_VERSION)
+ return -ENOBUFS;
+
+ ctxlen = le32_to_cpu(cfa->cfa_blob_len);
+ if (len < ctxlen)
+ return -ERANGE;
+
+ memcpy(ctx, cfa->cfa_blob, ctxlen);
+ return ctxlen;
+}
+
+static int ceph_crypt_set_context(struct inode *inode, const void *ctx, size_t len, void *fs_data)
+{
+ int ret;
+ struct iattr attr = { };
+ struct ceph_iattr cia = { };
+ struct ceph_fscrypt_auth *cfa;
+
+ WARN_ON_ONCE(fs_data);
+
+ if (len > FSCRYPT_SET_CONTEXT_MAX_SIZE)
+ return -EINVAL;
+
+ cfa = kzalloc(sizeof(*cfa), GFP_KERNEL);
+ if (!cfa)
+ return -ENOMEM;
+
+ cfa->cfa_version = cpu_to_le32(CEPH_FSCRYPT_AUTH_VERSION);
+ cfa->cfa_blob_len = cpu_to_le32(len);
+ memcpy(cfa->cfa_blob, ctx, len);
+
+ cia.fscrypt_auth = cfa;
+
+ ret = __ceph_setattr(inode, &attr, &cia);
+ if (ret == 0)
+ inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
+ kfree(cia.fscrypt_auth);
+ return ret;
+}
+
+static bool ceph_crypt_empty_dir(struct inode *inode)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+
+ return ci->i_rsubdirs + ci->i_rfiles == 1;
+}
+
+static struct fscrypt_operations ceph_fscrypt_ops = {
+ .get_context = ceph_crypt_get_context,
+ .set_context = ceph_crypt_set_context,
+ .empty_dir = ceph_crypt_empty_dir,
+};
+
+void ceph_fscrypt_set_ops(struct super_block *sb)
+{
+ fscrypt_set_ops(sb, &ceph_fscrypt_ops);
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
new file mode 100644
index 000000000000..6c3831c57c8d
--- /dev/null
+++ b/fs/ceph/crypto.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Ceph fscrypt functionality
+ */
+
+#ifndef _CEPH_CRYPTO_H
+#define _CEPH_CRYPTO_H
+
+#include <linux/fscrypt.h>
+
+struct ceph_fscrypt_auth {
+ __le32 cfa_version;
+ __le32 cfa_blob_len;
+ u8 cfa_blob[FSCRYPT_SET_CONTEXT_MAX_SIZE];
+} __packed;
+
+#ifdef CONFIG_FS_ENCRYPTION
+#define CEPH_FSCRYPT_AUTH_VERSION 1
+void ceph_fscrypt_set_ops(struct super_block *sb);
+
+#else /* CONFIG_FS_ENCRYPTION */
+
+static inline void ceph_fscrypt_set_ops(struct super_block *sb)
+{
+}
+
+#endif /* CONFIG_FS_ENCRYPTION */
+
+#endif
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 7547b7de170f..2e0e321a58cb 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -14,10 +14,12 @@
#include <linux/random.h>
#include <linux/sort.h>
#include <linux/iversion.h>
+#include <linux/fscrypt.h>

#include "super.h"
#include "mds_client.h"
#include "cache.h"
+#include "crypto.h"
#include <linux/ceph/decode.h>

/*
@@ -644,6 +646,7 @@ void ceph_evict_inode(struct inode *inode)
clear_inode(inode);

ceph_fscache_unregister_inode_cookie(ci);
+ fscrypt_put_encryption_info(inode);

__ceph_remove_caps(ci);

diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index a859921bbe96..52ff78f0462a 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -20,6 +20,7 @@
#include "super.h"
#include "mds_client.h"
#include "cache.h"
+#include "crypto.h"

#include <linux/ceph/ceph_features.h>
#include <linux/ceph/decode.h>
@@ -1128,6 +1129,8 @@ static int ceph_set_super(struct super_block *s, struct fs_context *fc)
s->s_time_min = 0;
s->s_time_max = U32_MAX;

+ ceph_fscrypt_set_ops(s);
+
ret = set_anon_super_fc(s, fc);
if (ret != 0)
fsc->sb = NULL;
--
2.35.1
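
Editor's sketch (not part of the patch): the validation done by ceph_crypt_get_context can be tried out in user space on a raw byte buffer laid out like struct ceph_fscrypt_auth (little-endian version, little-endian blob length, then the blob). The sketch_* names are hypothetical. One check here is an assumption of this sketch and does not appear in the patch: the declared blob length is compared against the number of bytes actually held before copying.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_AUTH_VERSION 1	/* mirrors CEPH_FSCRYPT_AUTH_VERSION */
#define SKETCH_HDR_LEN 8	/* cfa_version + cfa_blob_len */

/* Read a little-endian u32 from a byte buffer. */
static uint32_t sketch_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the context length on success or a negative errno value. */
static int sketch_get_context(const uint8_t *auth, size_t auth_len,
			      void *ctx, size_t len)
{
	uint32_t ctxlen;

	/* Nonexistent, or too short to hold the header plus one blob byte? */
	if (!auth || auth_len < SKETCH_HDR_LEN + 1)
		return -ENOBUFS;

	/* Some format we don't recognize? */
	if (sketch_get_le32(auth) != SKETCH_AUTH_VERSION)
		return -ENOBUFS;

	ctxlen = sketch_get_le32(auth + 4);

	/* Sketch-only assumption: the blob must fit in the bytes we hold. */
	if (ctxlen > auth_len - SKETCH_HDR_LEN)
		return -ENOBUFS;

	/* Caller's buffer too small? */
	if (len < ctxlen)
		return -ERANGE;

	memcpy(ctx, auth + SKETCH_HDR_LEN, ctxlen);
	return (int)ctxlen;
}
```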

2022-03-24 19:02:42

by Jeffrey Layton

Subject: [RFC PATCH v11 37/51] ceph: add __ceph_sync_read helper support

From: Xiubo Li <[email protected]>

Turn the guts of ceph_sync_read into a new helper that takes an inode
and an offset instead of a kiocb struct, and make ceph_sync_read call
the helper as a wrapper.

Signed-off-by: Xiubo Li <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/file.c | 33 +++++++++++++++++++++------------
fs/ceph/super.h | 2 ++
2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index df790317bedb..0e91ae995f78 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -923,22 +923,19 @@ enum {
* If we get a short result from the OSD, check against i_size; we need to
* only return a short read to the caller if we hit EOF.
*/
-static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
- int *retry_op)
+ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
+ struct iov_iter *to, int *retry_op)
{
- struct file *file = iocb->ki_filp;
- struct inode *inode = file_inode(file);
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
struct ceph_osd_client *osdc = &fsc->client->osdc;
ssize_t ret;
- u64 off = iocb->ki_pos;
+ u64 off = *ki_pos;
u64 len = iov_iter_count(to);
u64 i_size = i_size_read(inode);
bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD);

- dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len,
- (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
+ dout("sync_read on inode %p %llx~%llx\n", inode, *ki_pos, len);

if (!len)
return 0;
@@ -1057,14 +1054,14 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
break;
}

- if (off > iocb->ki_pos) {
+ if (off > *ki_pos) {
if (off >= i_size) {
*retry_op = CHECK_EOF;
- ret = i_size - iocb->ki_pos;
- iocb->ki_pos = i_size;
+ ret = i_size - *ki_pos;
+ *ki_pos = i_size;
} else {
- ret = off - iocb->ki_pos;
- iocb->ki_pos = off;
+ ret = off - *ki_pos;
+ *ki_pos = off;
}
}

@@ -1072,6 +1069,18 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
return ret;
}

+static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
+ int *retry_op)
+{
+ struct file *file = iocb->ki_filp;
+ struct inode *inode = file_inode(file);
+
+ dout("sync_read on file %p %llx~%zx %s\n", file, iocb->ki_pos,
+ iov_iter_count(to), (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
+
+ return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op);
+}
+
struct ceph_aio_request {
struct kiocb *iocb;
size_t total_len;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index fef4cda44861..339284e90cb3 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1266,6 +1266,8 @@ extern int ceph_renew_caps(struct inode *inode, int fmode);
extern int ceph_open(struct inode *inode, struct file *file);
extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
struct file *file, unsigned flags, umode_t mode);
+extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
+ struct iov_iter *to, int *retry_op);
extern int ceph_release(struct inode *inode, struct file *filp);
extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
char *data, size_t len);
--
2.35.1
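
Editor's sketch (not part of the patch): the refactoring above follows a common pattern in which the core routine takes the object and a pointer to the position, and the original entry point becomes a thin wrapper that passes the position stored in its own context. The user-space fragment below illustrates the pattern with an in-memory buffer; the sketch_* types and functions are hypothetical stand-ins for the inode, the kiocb, __ceph_sync_read and ceph_sync_read, and none of the OSD or i_size logic is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stands in for the inode: a flat buffer with a known size. */
struct sketch_file {
	const char *data;
	size_t size;
};

/* Stands in for struct kiocb: a file plus the current position. */
struct sketch_iocb {
	struct sketch_file *filp;
	long long pos;
};

/*
 * Core helper: reads at *pos and advances it, so a caller that has no
 * iocb can still use it with a position of its own.
 */
static long sketch_read_at(const struct sketch_file *f, long long *pos,
			   char *to, size_t len)
{
	size_t avail;

	if (*pos < 0 || (size_t)*pos >= f->size)
		return 0;	/* at or past EOF */

	avail = f->size - (size_t)*pos;
	if (len > avail)
		len = avail;	/* short read only when EOF is hit */

	memcpy(to, f->data + *pos, len);
	*pos += (long long)len;
	return (long)len;
}

/* Thin wrapper: keeps the old calling convention for existing callers. */
static long sketch_read(struct sketch_iocb *iocb, char *to, size_t len)
{
	return sketch_read_at(iocb->filp, &iocb->pos, to, len);
}
```

The benefit of the split is that a caller without a kiocb can reuse the read path by supplying just an inode and an offset.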

2022-03-24 21:08:47

by Luis Henriques

Subject: Re: [RFC PATCH v11 02/51] fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode

Hi Eric,

Jeff Layton <[email protected]> writes:

> Ceph is going to add fscrypt support, but we still want encrypted
> filenames to be composed of printable characters, so we can maintain
> compatibility with clients that don't support fscrypt.
>
> We could just adopt fscrypt's current nokey name format, but that is
> subject to change in the future, and it also contains dirhash fields
> that we don't need for cephfs. Because of this, we're going to concoct
> our own scheme for encoding encrypted filenames. It's very similar to
> fscrypt's current scheme, but doesn't bother with the dirhash fields.
>
> The ceph encoding scheme will use base64 encoding as well, and we also
> want it to avoid characters that are illegal in filenames. Export the
> fscrypt base64 encoding/decoding routines so we can use them in ceph's
> fscrypt implementation.
>
> Acked-by: Eric Biggers <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/crypto/fname.c | 8 ++++----
> include/linux/fscrypt.h | 5 +++++
> 2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
> index a9be4bc74a94..1e4233c95005 100644
> --- a/fs/crypto/fname.c
> +++ b/fs/crypto/fname.c
> @@ -182,8 +182,6 @@ static int fname_decrypt(const struct inode *inode,
> static const char base64url_table[65] =
> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
>
> -#define FSCRYPT_BASE64URL_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3)
> -
> /**
> * fscrypt_base64url_encode() - base64url-encode some binary data
> * @src: the binary data to encode
> @@ -198,7 +196,7 @@ static const char base64url_table[65] =
> * Return: the length of the resulting base64url-encoded string in bytes.
> * This will be equal to FSCRYPT_BASE64URL_CHARS(srclen).
> */
> -static int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
> +int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)

I know you've ACK'ed this patch already, but I was wondering if you'd be
open to changing these encode/decode interfaces so that they could be used
for non-url base64 too.

My motivation is that ceph has this odd limitation where snapshot names
cannot start with the '_' character. I have an RFC that adds support for
encrypting snapshot names, and unfortunately an encrypted name can end up
starting with this character after base64 encoding.

So, my current proposal is to use a different encoding table. I was
thinking about the IMAP mailbox naming, which uses '+' and ',' instead of
'-' and '_', but any other charset would be OK (except those that
include '/', of course). So, instead of adding yet another base64
implementation to the kernel, I was wondering if you'd be OK accepting a
patch to add an optional arg to these encoding/decoding functions to pass
an alternative table. Or, if you'd prefer, keep the existing interface
but turn these functions into wrappers around more generic functions.

Obviously, Jeff, please feel free to comment too if you have any
reservations regarding this approach.

Cheers,
--
Luís

> {
> u32 ac = 0;
> int bits = 0;
> @@ -217,6 +215,7 @@ static int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
> *cp++ = base64url_table[(ac << (6 - bits)) & 0x3f];
> return cp - dst;
> }
> +EXPORT_SYMBOL_GPL(fscrypt_base64url_encode);
>
> /**
> * fscrypt_base64url_decode() - base64url-decode a string
> @@ -233,7 +232,7 @@ static int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
> * Return: the length of the resulting decoded binary data in bytes,
> * or -1 if the string isn't a valid base64url string.
> */
> -static int fscrypt_base64url_decode(const char *src, int srclen, u8 *dst)
> +int fscrypt_base64url_decode(const char *src, int srclen, u8 *dst)
> {
> u32 ac = 0;
> int bits = 0;
> @@ -256,6 +255,7 @@ static int fscrypt_base64url_decode(const char *src, int srclen, u8 *dst)
> return -1;
> return bp - dst;
> }
> +EXPORT_SYMBOL_GPL(fscrypt_base64url_decode);
>
> bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
> u32 orig_len, u32 max_len,
> diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
> index 91ea9477e9bd..671181d196a8 100644
> --- a/include/linux/fscrypt.h
> +++ b/include/linux/fscrypt.h
> @@ -46,6 +46,9 @@ struct fscrypt_name {
> /* Maximum value for the third parameter of fscrypt_operations.set_context(). */
> #define FSCRYPT_SET_CONTEXT_MAX_SIZE 40
>
> +/* len of resulting string (sans NUL terminator) after base64 encoding nbytes */
> +#define FSCRYPT_BASE64URL_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3)
> +
> #ifdef CONFIG_FS_ENCRYPTION
>
> /*
> @@ -305,6 +308,8 @@ void fscrypt_free_inode(struct inode *inode);
> int fscrypt_drop_inode(struct inode *inode);
>
> /* fname.c */
> +int fscrypt_base64url_encode(const u8 *src, int len, char *dst);
> +int fscrypt_base64url_decode(const char *src, int len, u8 *dst);
> int fscrypt_setup_filename(struct inode *inode, const struct qstr *iname,
> int lookup, struct fscrypt_name *fname);
>
> --
>
> 2.35.1
>

2022-03-25 15:25:57

by Colin Walters

Subject: Re: [RFC PATCH v11 02/51] fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode



On Wed, Mar 23, 2022, at 10:33 AM, Luís Henriques wrote:

> So, my current proposal is to use a different encoding table.

Another alternative is https://en.wikipedia.org/wiki/Base62

2022-03-25 17:41:45

by Jeffrey Layton

Subject: Re: [RFC PATCH v11 00/51] ceph+fscrypt : full support

On Tue, 2022-03-22 at 10:12 -0400, Jeff Layton wrote:
> This patchset represents a (mostly) working prototype of the
> ceph+fscrypt work. With this, I'm able to run xfstests with
> test_dummy_encryption, and most of the tests that pass on ceph without
> fscrypt now pass on it.
>
> When I made the last posting of this series [1], I mentioned that proper
> sparse read support would be necessary to do this. Thus, the
> biggest difference from the v10 set is that this is now based on top of
> the patch series that I posted yesterday to implement sparse reads [2].
>
> Aside from that, there are also numerous cleanups all over the tree, as
> well as an overhaul of the readdir handling by Xiubo.
>
> This series is not yet bug-free, but it's at a point where it is quite
> usable, providing you're running against the Quincy release of ceph
> (which should ship sometime in the next few months).
>
> Next Steps:
> ===========
> I'm not going to sugar-coat it. This is a huge, invasive patch series
> that touches a lot of the most sensitive code in ceph.
>
> Eric Biggers has acked the changes we need in fscrypt infrastructure. I
> still need Al to ack exporting the new_inode_pseudo symbol. The rest is
> pretty much all ceph and libceph code.
>
> The main piece missing at this point is support for sparse reads with
> ms_mode settings other than "crc". Once that's complete, I want to merge
> that and this series into the ceph "testing" branch so we can start
> running tests against it in teuthology with fscrypt enabled.
>
> If that goes well, I think we could probably merge this into mainline
> for v5.20 or v5.21. There is also some incoming support for netfs write
> and DIO read helpers that we may want to convert to as well [3]. That
> may alter the timing as well.
>
> Review, comments and questions are welcome...
>
> [1]: https://lore.kernel.org/ceph-devel/[email protected]/
>
> [2]: https://lore.kernel.org/ceph-devel/[email protected]/
>
> [3]: https://lore.kernel.org/ceph-devel/[email protected]/T/#maec7e3579f13a45171ad23d7a49183d169fcfcca
>
> Jeff Layton (41):
> vfs: export new_inode_pseudo
> fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode
> fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
> fscrypt: add fscrypt_context_for_new_inode
> ceph: preallocate inode for ops that may create one
> ceph: crypto context handling for ceph
> ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
> ceph: add support for fscrypt_auth/fscrypt_file to cap messages
> ceph: add ability to set fscrypt_auth via setattr
> ceph: implement -o test_dummy_encryption mount option
> ceph: decode alternate_name in lease info
> ceph: add fscrypt ioctls
> ceph: make ceph_msdc_build_path use ref-walk
> ceph: add encrypted fname handling to ceph_mdsc_build_path
> ceph: send altname in MClientRequest
> ceph: encode encrypted name in dentry release
> ceph: properly set DCACHE_NOKEY_NAME flag in lookup
> ceph: make d_revalidate call fscrypt revalidator for encrypted
> dentries
> ceph: add helpers for converting names for userland presentation
> ceph: add fscrypt support to ceph_fill_trace
> ceph: create symlinks with encrypted and base64-encoded targets
> ceph: make ceph_get_name decrypt filenames
> ceph: add a new ceph.fscrypt.auth vxattr
> ceph: add some fscrypt guardrails
> libceph: add CEPH_OSD_OP_ASSERT_VER support
> ceph: size handling for encrypted inodes in cap updates
> ceph: fscrypt_file field handling in MClientRequest messages
> ceph: get file size from fscrypt_file when present in inode traces
> ceph: handle fscrypt fields in cap messages from MDS
> ceph: add infrastructure for file encryption and decryption
> libceph: allow ceph_osdc_new_request to accept a multi-op read
> ceph: disable fallocate for encrypted inodes
> ceph: disable copy offload on encrypted inodes
> ceph: don't use special DIO path for encrypted inodes
> ceph: align data in pages in ceph_sync_write
> ceph: add read/modify/write to ceph_sync_write
> ceph: plumb in decryption during sync reads
> ceph: add fscrypt decryption support to ceph_netfs_issue_op
> ceph: set i_blkbits to crypto block size for encrypted inodes
> ceph: add encryption support to writepage
> ceph: fscrypt support for writepages
>
> Luis Henriques (1):
> ceph: don't allow changing layout on encrypted files/directories
>
> Xiubo Li (9):
> ceph: make the ioctl cmd more readable in debug log
> ceph: fix base64 encoded name's length check in ceph_fname_to_usr()
> ceph: pass the request to parse_reply_info_readdir()
> ceph: add ceph_encode_encrypted_dname() helper
> ceph: add support to readdir for encrypted filenames
> ceph: add __ceph_get_caps helper support
> ceph: add __ceph_sync_read helper support
> ceph: add object version support for sync read
> ceph: add truncate size handling support for fscrypt
>
> fs/ceph/Makefile | 1 +
> fs/ceph/acl.c | 4 +-
> fs/ceph/addr.c | 128 ++++++--
> fs/ceph/caps.c | 212 +++++++++++--
> fs/ceph/crypto.c | 432 +++++++++++++++++++++++++
> fs/ceph/crypto.h | 256 +++++++++++++++
> fs/ceph/dir.c | 182 ++++++++---
> fs/ceph/export.c | 44 ++-
> fs/ceph/file.c | 530 ++++++++++++++++++++++++++-----
> fs/ceph/inode.c | 546 +++++++++++++++++++++++++++++---
> fs/ceph/ioctl.c | 126 +++++++-
> fs/ceph/mds_client.c | 455 ++++++++++++++++++++++----
> fs/ceph/mds_client.h | 24 +-
> fs/ceph/super.c | 91 +++++-
> fs/ceph/super.h | 43 ++-
> fs/ceph/xattr.c | 29 ++
> fs/crypto/fname.c | 44 ++-
> fs/crypto/fscrypt_private.h | 9 +-
> fs/crypto/hooks.c | 6 +-
> fs/crypto/policy.c | 35 +-
> fs/inode.c | 1 +
> include/linux/ceph/ceph_fs.h | 21 +-
> include/linux/ceph/osd_client.h | 6 +-
> include/linux/ceph/rados.h | 4 +
> include/linux/fscrypt.h | 10 +
> net/ceph/osd_client.c | 32 +-
> 26 files changed, 2907 insertions(+), 364 deletions(-)
> create mode 100644 fs/ceph/crypto.c
> create mode 100644 fs/ceph/crypto.h
>


I was able to get the sparse reads working on other transports
yesterday, and I've gone ahead and updated the wip-fscrypt branch with
the newest sparse read and fscrypt changes.

For the record, the final diffstat with both patch series is:

30 files changed, 3706 insertions(+), 400 deletions(-)

I'll probably plan to move these into the testing branch next week,
after I do a bit more testing locally today. Another thing we'll need to
sort out is how to enable fscrypt for teuthology tests.

As always, more testing and review would definitely be welcome.

Thanks!
--
Jeff Layton <[email protected]>

2022-03-25 17:41:45

by Eric Biggers

Subject: Re: [RFC PATCH v11 02/51] fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode

On Wed, Mar 23, 2022 at 02:33:17PM +0000, Luís Henriques wrote:
> Hi Eric,
>
> Jeff Layton <[email protected]> writes:
>
> > Ceph is going to add fscrypt support, but we still want encrypted
> > filenames to be composed of printable characters, so we can maintain
> > compatibility with clients that don't support fscrypt.
> >
> > We could just adopt fscrypt's current nokey name format, but that is
> > subject to change in the future, and it also contains dirhash fields
> > that we don't need for cephfs. Because of this, we're going to concoct
> > our own scheme for encoding encrypted filenames. It's very similar to
> > fscrypt's current scheme, but doesn't bother with the dirhash fields.
> >
> > The ceph encoding scheme will use base64 encoding as well, and we also
> > want it to avoid characters that are illegal in filenames. Export the
> > fscrypt base64 encoding/decoding routines so we can use them in ceph's
> > fscrypt implementation.
> >
> > Acked-by: Eric Biggers <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/crypto/fname.c | 8 ++++----
> > include/linux/fscrypt.h | 5 +++++
> > 2 files changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
> > index a9be4bc74a94..1e4233c95005 100644
> > --- a/fs/crypto/fname.c
> > +++ b/fs/crypto/fname.c
> > @@ -182,8 +182,6 @@ static int fname_decrypt(const struct inode *inode,
> > static const char base64url_table[65] =
> > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
> >
> > -#define FSCRYPT_BASE64URL_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3)
> > -
> > /**
> > * fscrypt_base64url_encode() - base64url-encode some binary data
> > * @src: the binary data to encode
> > @@ -198,7 +196,7 @@ static const char base64url_table[65] =
> > * Return: the length of the resulting base64url-encoded string in bytes.
> > * This will be equal to FSCRYPT_BASE64URL_CHARS(srclen).
> > */
> > -static int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
> > +int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
>
> I know you've ACK'ed this patch already, but I was wondering if you'd be
> open to changing these encode/decode interfaces so that they could be used
> for non-url base64 too.
>
> My motivation is that ceph has this odd limitation where snapshot names
> cannot start with the '_' character. I have an RFC that adds support for
> encrypting snapshot names, and unfortunately an encrypted name can end up
> starting with this character after base64 encoding.
>
> So, my current proposal is to use a different encoding table. I was
> thinking about the IMAP mailbox naming, which uses '+' and ',' instead of
> '-' and '_', but any other charset would be OK (except those that
> include '/', of course). So, instead of adding yet another base64
> implementation to the kernel, I was wondering if you'd be OK accepting a
> patch to add an optional arg to these encoding/decoding functions to pass
> an alternative table. Or, if you'd prefer, keep the existing interface
> but turn these functions into wrappers around more generic functions.
>
> Obviously, Jeff, please feel free to comment too if you have any
> reservations regarding this approach.
>
> Cheers,
> --
> Luís
>

Base64 encoding/decoding is trivial enough that I think you should just add your
own functions to fs/ceph/ for now if you need yet another Base64 variant. If we
were to add general functions that allow "building your own" Base64 variant, I
think they'd belong in lib/, not fs/crypto/. (I objected to lib/ in the first
version of Jeff's patchset because that patchset proposed adding just the old,
idiosyncratic fscrypt Base64 variant to lib/ and just calling it "base64", which
was misleading. But, if there were to be properly documented functions to
"build your own" Base64 variant, allowing control over both the character set
and whether padding is done, lib/ would be the place...)

- Eric
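
Editor's sketch (not part of the thread): to make the "build your own" idea concrete, the fragment below shows one possible shape for a Base64 encoder/decoder that takes the character set and the padding choice as parameters. It follows the bit-accumulator approach visible in the fscrypt_base64url_encode hunk quoted above, but the struct, the function names and the three variant tables are hypothetical and are not an existing kernel or lib/ interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant descriptor: 64 output characters plus padding. */
struct b64_variant {
	const char *table;	/* exactly 64 characters */
	char pad;		/* padding character, or 0 for none */
};

static const struct b64_variant b64_url = {
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_", 0
};
static const struct b64_variant b64_imap = {
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,", 0
};
static const struct b64_variant b64_std = {
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", '='
};

/* Encode srclen bytes; returns the number of characters written. */
static int b64_encode(const struct b64_variant *v, const unsigned char *src,
		      int srclen, char *dst)
{
	unsigned int ac = 0;
	int bits = 0;
	char *cp = dst;
	int i;

	for (i = 0; i < srclen; i++) {
		ac = (ac << 8) | src[i];
		bits += 8;
		do {
			bits -= 6;
			*cp++ = v->table[(ac >> bits) & 0x3f];
		} while (bits >= 6);
	}
	if (bits)
		*cp++ = v->table[(ac << (6 - bits)) & 0x3f];
	if (v->pad)
		while ((cp - dst) % 4)
			*cp++ = v->pad;
	return (int)(cp - dst);
}

/* Decode srclen characters; returns the byte count, or -1 if invalid. */
static int b64_decode(const struct b64_variant *v, const char *src,
		      int srclen, unsigned char *dst)
{
	unsigned int ac = 0;
	int bits = 0;
	unsigned char *bp = dst;
	int i;

	/* Strip trailing padding for variants that use it. */
	if (v->pad)
		while (srclen > 0 && src[srclen - 1] == v->pad)
			srclen--;

	for (i = 0; i < srclen; i++) {
		/* Guard NUL: strchr() would match the table's terminator. */
		const char *p = src[i] ? strchr(v->table, src[i]) : NULL;

		if (!p)
			return -1;
		ac = (ac << 6) | (unsigned int)(p - v->table);
		bits += 6;
		if (bits >= 8) {
			bits -= 8;
			*bp++ = (unsigned char)(ac >> bits);
		}
	}
	/* Leftover bits must be zero in a canonical encoding. */
	if (ac & ((1u << bits) - 1))
		return -1;
	return (int)(bp - dst);
}
```

This also illustrates the snapshot-name concern raised earlier in the thread: the first output character is the top six bits of the first input byte, so any input whose first byte is 0xfc or higher encodes to a leading '_' with the base64url table, whereas the IMAP-style table yields ',' in that position.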

2022-03-25 18:04:57

by Luis Henriques

Subject: Re: [RFC PATCH v11 02/51] fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode

Eric Biggers <[email protected]> writes:

> On Wed, Mar 23, 2022 at 02:33:17PM +0000, Luís Henriques wrote:
>> Hi Eric,
>>
>> Jeff Layton <[email protected]> writes:
>>
>> > Ceph is going to add fscrypt support, but we still want encrypted
>> > filenames to be composed of printable characters, so we can maintain
>> > compatibility with clients that don't support fscrypt.
>> >
>> > We could just adopt fscrypt's current nokey name format, but that is
>> > subject to change in the future, and it also contains dirhash fields
>> > that we don't need for cephfs. Because of this, we're going to concoct
>> > our own scheme for encoding encrypted filenames. It's very similar to
>> > fscrypt's current scheme, but doesn't bother with the dirhash fields.
>> >
>> > The ceph encoding scheme will use base64 encoding as well, and we also
>> > want it to avoid characters that are illegal in filenames. Export the
>> > fscrypt base64 encoding/decoding routines so we can use them in ceph's
>> > fscrypt implementation.
>> >
>> > Acked-by: Eric Biggers <[email protected]>
>> > Signed-off-by: Jeff Layton <[email protected]>
>> > ---
>> > fs/crypto/fname.c | 8 ++++----
>> > include/linux/fscrypt.h | 5 +++++
>> > 2 files changed, 9 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
>> > index a9be4bc74a94..1e4233c95005 100644
>> > --- a/fs/crypto/fname.c
>> > +++ b/fs/crypto/fname.c
>> > @@ -182,8 +182,6 @@ static int fname_decrypt(const struct inode *inode,
>> > static const char base64url_table[65] =
>> > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
>> >
>> > -#define FSCRYPT_BASE64URL_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3)
>> > -
>> > /**
>> > * fscrypt_base64url_encode() - base64url-encode some binary data
>> > * @src: the binary data to encode
>> > @@ -198,7 +196,7 @@ static const char base64url_table[65] =
>> > * Return: the length of the resulting base64url-encoded string in bytes.
>> > * This will be equal to FSCRYPT_BASE64URL_CHARS(srclen).
>> > */
>> > -static int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
>> > +int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
>>
>> I know you've ACK'ed this patch already, but I was wondering if you'd be
>> open to change these encode/decode interfaces so that they could be used
>> for non-url base64 too.
>>
>> My motivation is that ceph has this odd limitation where snapshot names
>> can not start with the '_' character. And I've an RFC that adds snapshot
>> names encryption support which, unfortunately, can end up starting with
>> this char after base64 encoding.
>>
>> So, my current proposal is to use a different encoding table. I was
>> thinking about the IMAP mailboxes naming which uses '+' and ',' instead of
>> the '-' and '_', but any other charset would be OK (except those that
>> include '/' of course). So, instead of adding yet another base64
>> implementation to the kernel, I was wondering if you'd be OK accepting a
>> patch to add an optional arg to these encoding/decoding functions to pass
>> an alternative table. Or, if you'd prefer, keep the existing interface
>> but turning these functions into wrappers to more generic functions.
>>
>> Obviously, Jeff, please feel free to comment too if you have any
>> reservations regarding this approach.
>>
>> Cheers,
>> --
>> Luís
>>
>
> Base64 encoding/decoding is trivial enough that I think you should just add your
> own functions to fs/ceph/ for now if you need yet another Base64 variant. If we
> were to add general functions that allow "building your own" Base64 variant, I
> think they'd belong in lib/, not fs/crypto/. (I objected to lib/ in the first
> version of Jeff's patchset because that patchset proposed adding just the old,
> idiosyncratic fscrypt Base64 variant to lib/ and just calling it "base64", which
> was misleading. But, if there were to be properly documented functions to
> "build your own" Base64 variant, allowing control over both the character set
> and whether padding is done, lib/ would be the place...)

OK, that makes sense. I agree that the right place for a generic
implementation would be somewhere outside the fs/crypto/ directory. I
guess that, for now, I'll follow your advice and keep a local
implementation (in fact, libceph already *has* an implementation!).

But adding a generic implementation and cleaning up all the different
implementations in the kernel tree is probably a nice project. For the
future. Maybe. *sigh*

Cheers,
--
Luís

2022-03-25 18:06:41

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [RFC PATCH v11 08/51] ceph: add support for fscrypt_auth/fscrypt_file to cap messages

On Tue, 2022-03-22 at 10:12 -0400, Jeff Layton wrote:
> Add support for new version 12 cap messages that carry the new
> fscrypt_auth and fscrypt_file fields from the inode.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/ceph/caps.c | 76 +++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 63 insertions(+), 13 deletions(-)
>
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 7d8ef67a1032..b0b7688331b4 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -13,6 +13,7 @@
> #include "super.h"
> #include "mds_client.h"
> #include "cache.h"
> +#include "crypto.h"
> #include <linux/ceph/decode.h>
> #include <linux/ceph/messenger.h>
>
> @@ -1214,15 +1215,12 @@ struct cap_msg_args {
> umode_t mode;
> bool inline_data;
> bool wake;
> + u32 fscrypt_auth_len;
> + u32 fscrypt_file_len;
> + u8 fscrypt_auth[sizeof(struct ceph_fscrypt_auth)]; // for context
> + u8 fscrypt_file[sizeof(u64)]; // for size
> };
>
> -/*
> - * cap struct size + flock buffer size + inline version + inline data size +
> - * osd_epoch_barrier + oldest_flush_tid
> - */
> -#define CAP_MSG_SIZE (sizeof(struct ceph_mds_caps) + \
> - 4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4)
> -
> /* Marshal up the cap msg to the MDS */
> static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
> {
> @@ -1238,7 +1236,7 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
> arg->size, arg->max_size, arg->xattr_version,
> arg->xattr_buf ? (int)arg->xattr_buf->vec.iov_len : 0);
>
> - msg->hdr.version = cpu_to_le16(10);
> + msg->hdr.version = cpu_to_le16(12);
> msg->hdr.tid = cpu_to_le64(arg->flush_tid);
>
> fc = msg->front.iov_base;
> @@ -1309,6 +1307,21 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
>
> /* Advisory flags (version 10) */
> ceph_encode_32(&p, arg->flags);
> +
> + /* dirstats (version 11) - these are r/o on the client */
> + ceph_encode_64(&p, 0);
> + ceph_encode_64(&p, 0);
> +
> +#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
> + /* fscrypt_auth and fscrypt_file (version 12) */
> + ceph_encode_32(&p, arg->fscrypt_auth_len);
> + ceph_encode_copy(&p, arg->fscrypt_auth, arg->fscrypt_auth_len);
> + ceph_encode_32(&p, arg->fscrypt_file_len);
> + ceph_encode_copy(&p, arg->fscrypt_file, arg->fscrypt_file_len);
> +#else /* CONFIG_FS_ENCRYPTION */
> + ceph_encode_32(&p, 0);
> + ceph_encode_32(&p, 0);
> +#endif /* CONFIG_FS_ENCRYPTION */
> }
>
> /*
> @@ -1430,8 +1443,37 @@ static void __prep_cap(struct cap_msg_args *arg, struct ceph_cap *cap,
> }
> }
> arg->flags = flags;
> +#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
> + if (ci->fscrypt_auth_len &&
> + WARN_ON_ONCE(ci->fscrypt_auth_len != sizeof(struct ceph_fscrypt_auth))) {

The WARN_ON_ONCE above is too strict: it causes the client to reject v1
fscrypt contexts (and throws the warning as well). The "!=" comparison
should be ">" instead. I've fixed this in my tree and pushed the fix into
wip-fscrypt.
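
As a rough userspace model of the relaxed check (not the actual fix in wip-fscrypt), the idea is that any context that fits in the destination buffer is accepted and only an oversized one is dropped. The helper name copy_fscrypt_auth() and its parameters are hypothetical; in the patch the destination would be arg->fscrypt_auth and the limit sizeof(struct ceph_fscrypt_auth).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the length check in __prep_cap().
 *
 * Copy src_len bytes into dst if they fit. A source that is shorter
 * than the buffer is accepted; only a source that is larger than
 * dst_size is rejected (the ">" comparison).
 *
 * Returns the length that would be stored in fscrypt_auth_len:
 * src_len on success, 0 if the source was too large.
 */
static size_t copy_fscrypt_auth(uint8_t *dst, size_t dst_size,
				const uint8_t *src, size_t src_len)
{
	if (src_len > dst_size)
		return 0;	/* don't set this if it doesn't fit */

	memcpy(dst, src, src_len);
	return src_len;
}
```

With the original "!=" comparison, only a source of exactly dst_size bytes would pass; here a shorter one is copied as-is.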


> + /* Don't set this if it isn't right size */
> + arg->fscrypt_auth_len = 0;
> + } else {
> + arg->fscrypt_auth_len = ci->fscrypt_auth_len;
> + memcpy(arg->fscrypt_auth, ci->fscrypt_auth,
> + min_t(size_t, ci->fscrypt_auth_len, sizeof(arg->fscrypt_auth)));
> + }
> + /* FIXME: use this to track "real" size */
> + arg->fscrypt_file_len = 0;
> +#endif /* CONFIG_FS_ENCRYPTION */
> }
>
> +#define CAP_MSG_FIXED_FIELDS (sizeof(struct ceph_mds_caps) + \
> + 4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4 + 8 + 8 + 4 + 4)
> +
> +#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
> +static inline int cap_msg_size(struct cap_msg_args *arg)
> +{
> + return CAP_MSG_FIXED_FIELDS + arg->fscrypt_auth_len +
> + arg->fscrypt_file_len;
> +}
> +#else
> +static inline int cap_msg_size(struct cap_msg_args *arg)
> +{
> + return CAP_MSG_FIXED_FIELDS;
> +}
> +#endif /* CONFIG_FS_ENCRYPTION */
> +
> /*
> * Send a cap msg on the given inode.
> *
> @@ -1442,7 +1484,7 @@ static void __send_cap(struct cap_msg_args *arg, struct ceph_inode_info *ci)
> struct ceph_msg *msg;
> struct inode *inode = &ci->vfs_inode;
>
> - msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, CAP_MSG_SIZE, GFP_NOFS, false);
> + msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, cap_msg_size(arg), GFP_NOFS, false);
> if (!msg) {
> pr_err("error allocating cap msg: ino (%llx.%llx) flushing %s tid %llu, requeuing cap.\n",
> ceph_vinop(inode), ceph_cap_string(arg->dirty),
> @@ -1468,10 +1510,6 @@ static inline int __send_flush_snap(struct inode *inode,
> struct cap_msg_args arg;
> struct ceph_msg *msg;
>
> - msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, CAP_MSG_SIZE, GFP_NOFS, false);
> - if (!msg)
> - return -ENOMEM;
> -
> arg.session = session;
> arg.ino = ceph_vino(inode).ino;
> arg.cid = 0;
> @@ -1509,6 +1547,18 @@ static inline int __send_flush_snap(struct inode *inode,
> arg.flags = 0;
> arg.wake = false;
>
> + /*
> + * No fscrypt_auth changes from a capsnap. It will need
> + * to update fscrypt_file on size changes (TODO).
> + */
> + arg.fscrypt_auth_len = 0;
> + arg.fscrypt_file_len = 0;
> +
> + msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, cap_msg_size(&arg),
> + GFP_NOFS, false);
> + if (!msg)
> + return -ENOMEM;
> +
> encode_cap_msg(msg, &arg);
> ceph_con_send(&arg.session->s_con, msg);
> return 0;

--
Jeff Layton <[email protected]>