2022-04-01 01:00:12

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 00/54] ceph+fscrypt: fully-working prototype

This patchset represents a fully-working prototype of fscrypt
integration into cephfs. The main difference from the series I posted
last week is a pile of bugfixes. At this point most of the xfstests that
pass without -o test_dummy_encryption work with it as well, and most of
the "encrypt" group also passes.

The main reason I'm calling this a prototype is because this doesn't yet
support proper integration with fscache. For that, the plan is to
migrate more of the ceph I/O paths to use netfs helpers, and to plumb
the necessary support for fscrypt integration into there.

We could either wait for that support before merging this, or go ahead
and take what we have now and adapt to the netfs helpers later. I'm open
to suggestions here.

Note that this series relies on the sparse read support for the client
that is currently sitting in the "testing" branch in the ceph-client
tree [1]. At this point I'm planning to push this series into the
testing branch as well, so we start getting some testing coverage in
teuthology.

Currently, if you wish to test fscrypt functionality, you will need ceph
built from branch that tracks the master branch. I'm hoping to get one
last required OSD fix in for Quincy, but it may not make GA [2].

Finally, special thanks to Luis and Xiubo, for all of their help with
this, and to Eric Biggers for advice and review. David Howells has also
been instrumental with the netfs work.

We might actually finish this!

[1] https://github.com/ceph/ceph-client
[2] https://github.com/ceph/ceph/pull/45736

Jeff Layton (43):
vfs: export new_inode_pseudo
fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode
fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
fscrypt: add fscrypt_context_for_new_inode
ceph: preallocate inode for ops that may create one
ceph: crypto context handling for ceph
ceph: add a has_stable_inodes operation for ceph
ceph: ensure that we accept a new context from MDS for new inodes
ceph: add support for fscrypt_auth/fscrypt_file to cap messages
ceph: add ability to set fscrypt_auth via setattr
ceph: implement -o test_dummy_encryption mount option
ceph: decode alternate_name in lease info
ceph: add fscrypt ioctls
ceph: make ceph_msdc_build_path use ref-walk
ceph: add encrypted fname handling to ceph_mdsc_build_path
ceph: send altname in MClientRequest
ceph: encode encrypted name in dentry release
ceph: properly set DCACHE_NOKEY_NAME flag in lookup
ceph: set DCACHE_NOKEY_NAME in atomic open
ceph: make d_revalidate call fscrypt revalidator for encrypted
dentries
ceph: add helpers for converting names for userland presentation
ceph: add fscrypt support to ceph_fill_trace
ceph: create symlinks with encrypted and base64-encoded targets
ceph: make ceph_get_name decrypt filenames
ceph: add a new ceph.fscrypt.auth vxattr
ceph: add some fscrypt guardrails
libceph: add CEPH_OSD_OP_ASSERT_VER support
ceph: size handling for encrypted inodes in cap updates
ceph: fscrypt_file field handling in MClientRequest messages
ceph: get file size from fscrypt_file when present in inode traces
ceph: handle fscrypt fields in cap messages from MDS
ceph: add infrastructure for file encryption and decryption
libceph: allow ceph_osdc_new_request to accept a multi-op read
ceph: disable fallocate for encrypted inodes
ceph: disable copy offload on encrypted inodes
ceph: don't use special DIO path for encrypted inodes
ceph: align data in pages in ceph_sync_write
ceph: add read/modify/write to ceph_sync_write
ceph: plumb in decryption during sync reads
ceph: add fscrypt decryption support to ceph_netfs_issue_op
ceph: set i_blkbits to crypto block size for encrypted inodes
ceph: add encryption support to writepage
ceph: fscrypt support for writepages

Luis Henriques (1):
ceph: don't allow changing layout on encrypted files/directories

Luís Henriques (1):
ceph: support legacy v1 encryption policy keysetup

Xiubo Li (9):
ceph: make the ioctl cmd more readable in debug log
ceph: fix base64 encoded name's length check in ceph_fname_to_usr()
ceph: pass the request to parse_reply_info_readdir()
ceph: add ceph_encode_encrypted_dname() helper
ceph: add support to readdir for encrypted filenames
ceph: add __ceph_get_caps helper support
ceph: add __ceph_sync_read helper support
ceph: add object version support for sync read
ceph: add truncate size handling support for fscrypt

fs/ceph/Makefile | 1 +
fs/ceph/acl.c | 4 +-
fs/ceph/addr.c | 128 ++++++--
fs/ceph/caps.c | 216 +++++++++++--
fs/ceph/crypto.c | 449 ++++++++++++++++++++++++++
fs/ceph/crypto.h | 256 +++++++++++++++
fs/ceph/dir.c | 182 ++++++++---
fs/ceph/export.c | 44 ++-
fs/ceph/file.c | 536 ++++++++++++++++++++++++++-----
fs/ceph/inode.c | 547 +++++++++++++++++++++++++++++---
fs/ceph/ioctl.c | 126 +++++++-
fs/ceph/mds_client.c | 465 +++++++++++++++++++++++----
fs/ceph/mds_client.h | 24 +-
fs/ceph/super.c | 91 +++++-
fs/ceph/super.h | 43 ++-
fs/ceph/xattr.c | 29 ++
fs/crypto/fname.c | 44 ++-
fs/crypto/fscrypt_private.h | 9 +-
fs/crypto/hooks.c | 6 +-
fs/crypto/policy.c | 35 +-
fs/inode.c | 1 +
include/linux/ceph/ceph_fs.h | 21 +-
include/linux/ceph/osd_client.h | 6 +-
include/linux/ceph/rados.h | 4 +
include/linux/fscrypt.h | 10 +
net/ceph/osd_client.c | 32 +-
26 files changed, 2944 insertions(+), 365 deletions(-)
create mode 100644 fs/ceph/crypto.c
create mode 100644 fs/ceph/crypto.h

--
2.35.1


2022-04-01 03:06:18

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 51/54] ceph: add fscrypt decryption support to ceph_netfs_issue_op

Force the use of sparse reads when the inode is encrypted, and add the
appropriate code to decrypt the extent map after receiving.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/addr.c | 32 +++++++++++++++++++++++---------
1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 15bc455bc87f..13a37a568a1d 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -18,6 +18,7 @@
#include "mds_client.h"
#include "cache.h"
#include "metric.h"
+#include "crypto.h"
#include <linux/ceph/osd_client.h>
#include <linux/ceph/striper.h>

@@ -217,7 +218,8 @@ static bool ceph_netfs_clamp_length(struct netfs_read_subrequest *subreq)

static void finish_netfs_read(struct ceph_osd_request *req)
{
- struct ceph_fs_client *fsc = ceph_inode_to_client(req->r_inode);
+ struct inode *inode = req->r_inode;
+ struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0);
struct netfs_read_subrequest *subreq = req->r_priv;
struct ceph_osd_req_op *op = &req->r_ops[0];
@@ -232,15 +234,24 @@ static void finish_netfs_read(struct ceph_osd_request *req)
subreq->len, i_size_read(req->r_inode));

/* no object means success but no data */
- if (sparse && err >= 0)
- err = ceph_sparse_ext_map_end(op);
- else if (err == -ENOENT)
+ if (err == -ENOENT)
err = 0;
else if (err == -EBLOCKLISTED)
fsc->blocklisted = true;

- if (err >= 0 && err < subreq->len)
- __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+ if (err >= 0) {
+ if (sparse && err > 0)
+ err = ceph_sparse_ext_map_end(op);
+ if (err < subreq->len)
+ __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+ if (IS_ENCRYPTED(inode) && err > 0) {
+ err = ceph_fscrypt_decrypt_extents(inode, osd_data->pages,
+ subreq->start, op->extent.sparse_ext,
+ op->extent.sparse_ext_cnt);
+ if (err > subreq->len)
+ err = subreq->len;
+ }
+ }

netfs_subreq_terminated(subreq, err, true);

@@ -315,13 +326,16 @@ static void ceph_netfs_issue_op(struct netfs_read_subrequest *subreq)
size_t page_off;
int err = 0;
u64 len = subreq->len;
- bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD);
+ bool sparse = IS_ENCRYPTED(inode) || ceph_test_mount_opt(fsc, SPARSEREAD);
+ u64 off = subreq->start;

if (ci->i_inline_version != CEPH_INLINE_NONE &&
ceph_netfs_issue_op_inline(subreq))
return;

- req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout, vino, subreq->start, &len,
+ ceph_fscrypt_adjust_off_and_len(inode, &off, &len);
+
+ req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout, vino, off, &len,
0, 1, sparse ? CEPH_OSD_OP_SPARSE_READ : CEPH_OSD_OP_READ,
CEPH_OSD_FLAG_READ | fsc->client->osdc.client->options->read_from_replica,
NULL, ci->i_truncate_seq, ci->i_truncate_size, false);
@@ -341,7 +355,7 @@ static void ceph_netfs_issue_op(struct netfs_read_subrequest *subreq)
}

dout("%s: pos=%llu orig_len=%zu len=%llu\n", __func__, subreq->start, subreq->len, len);
- iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages, subreq->start, len);
+ iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages, off, len);
err = iov_iter_get_pages_alloc(&iter, &pages, len, &page_off);
if (err < 0) {
dout("%s: iov_ter_get_pages_alloc returned %d\n", __func__, err);
--
2.35.1

2022-04-01 03:30:16

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 29/54] ceph: create symlinks with encrypted and base64-encoded targets

When creating symlinks in encrypted directories, encrypt and
base64-encode the target with the new inode's key before sending to the
MDS.

When filling a symlinked inode, base64-decode it into a buffer that
we'll keep in ci->i_symlink. When get_link is called, decrypt the buffer
into a new one that will hang off i_link.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/dir.c | 51 ++++++++++++++++++++---
fs/ceph/inode.c | 107 ++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 141 insertions(+), 17 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 5ce2a6384e55..82a5f37e9d4a 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -942,6 +942,40 @@ static int ceph_create(struct user_namespace *mnt_userns, struct inode *dir,
return ceph_mknod(mnt_userns, dir, dentry, mode, 0);
}

+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static int prep_encrypted_symlink_target(struct ceph_mds_request *req, const char *dest)
+{
+ int err;
+ int len = strlen(dest);
+ struct fscrypt_str osd_link = FSTR_INIT(NULL, 0);
+
+ err = fscrypt_prepare_symlink(req->r_parent, dest, len, PATH_MAX, &osd_link);
+ if (err)
+ goto out;
+
+ err = fscrypt_encrypt_symlink(req->r_new_inode, dest, len, &osd_link);
+ if (err)
+ goto out;
+
+ req->r_path2 = kmalloc(FSCRYPT_BASE64URL_CHARS(osd_link.len) + 1, GFP_KERNEL);
+ if (!req->r_path2) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ len = fscrypt_base64url_encode(osd_link.name, osd_link.len, req->r_path2);
+ req->r_path2[len] = '\0';
+out:
+ fscrypt_fname_free_buffer(&osd_link);
+ return err;
+}
+#else
+static int prep_encrypted_symlink_target(struct ceph_mds_request *req, const char *dest)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
struct dentry *dentry, const char *dest)
{
@@ -973,14 +1007,21 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
goto out_req;
}

- req->r_path2 = kstrdup(dest, GFP_KERNEL);
- if (!req->r_path2) {
- err = -ENOMEM;
- goto out_req;
- }
req->r_parent = dir;
ihold(dir);

+ if (IS_ENCRYPTED(req->r_new_inode)) {
+ err = prep_encrypted_symlink_target(req, dest);
+ if (err)
+ goto out_req;
+ } else {
+ req->r_path2 = kstrdup(dest, GFP_KERNEL);
+ if (!req->r_path2) {
+ err = -ENOMEM;
+ goto out_req;
+ }
+ }
+
set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
req->r_dentry = dget(dentry);
req->r_num_caps = 2;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 98ac1369b353..08a0885a49f8 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -35,6 +35,7 @@
*/

static const struct inode_operations ceph_symlink_iops;
+static const struct inode_operations ceph_encrypted_symlink_iops;

static void ceph_inode_work(struct work_struct *work);

@@ -638,6 +639,7 @@ void ceph_free_inode(struct inode *inode)
#ifdef CONFIG_FS_ENCRYPTION
kfree(ci->fscrypt_auth);
#endif
+ fscrypt_free_inode(inode);
kmem_cache_free(ceph_inode_cachep, ci);
}

@@ -835,6 +837,34 @@ void ceph_fill_file_time(struct inode *inode, int issued,
inode, time_warp_seq, ci->i_time_warp_seq);
}

+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static int decode_encrypted_symlink(const char *encsym, int enclen, u8 **decsym)
+{
+ int declen;
+ u8 *sym;
+
+ sym = kmalloc(enclen + 1, GFP_NOFS);
+ if (!sym)
+ return -ENOMEM;
+
+ declen = fscrypt_base64url_decode(encsym, enclen, sym);
+ if (declen < 0) {
+ pr_err("%s: can't decode symlink (%d). Content: %.*s\n",
+ __func__, declen, enclen, encsym);
+ kfree(sym);
+ return -EIO;
+ }
+ sym[declen + 1] = '\0';
+ *decsym = sym;
+ return declen;
+}
+#else
+static int decode_encrypted_symlink(const char *encsym, int symlen, u8 **decsym)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
/*
* Populate an inode based on info from mds. May be called on new or
* existing inodes.
@@ -1069,26 +1099,39 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
inode->i_fop = &ceph_file_fops;
break;
case S_IFLNK:
- inode->i_op = &ceph_symlink_iops;
if (!ci->i_symlink) {
u32 symlen = iinfo->symlink_len;
char *sym;

spin_unlock(&ci->i_ceph_lock);

- if (symlen != i_size_read(inode)) {
- pr_err("%s %llx.%llx BAD symlink "
- "size %lld\n", __func__,
- ceph_vinop(inode),
- i_size_read(inode));
+ if (IS_ENCRYPTED(inode)) {
+ if (symlen != i_size_read(inode))
+ pr_err("%s %llx.%llx BAD symlink size %lld\n",
+ __func__, ceph_vinop(inode), i_size_read(inode));
+
+ err = decode_encrypted_symlink(iinfo->symlink, symlen, (u8 **)&sym);
+ if (err < 0) {
+ pr_err("%s decoding encrypted symlink failed: %d\n",
+ __func__, err);
+ goto out;
+ }
+ symlen = err;
i_size_write(inode, symlen);
inode->i_blocks = calc_inode_blocks(symlen);
- }
+ } else {
+ if (symlen != i_size_read(inode)) {
+ pr_err("%s %llx.%llx BAD symlink size %lld\n",
+ __func__, ceph_vinop(inode), i_size_read(inode));
+ i_size_write(inode, symlen);
+ inode->i_blocks = calc_inode_blocks(symlen);
+ }

- err = -ENOMEM;
- sym = kstrndup(iinfo->symlink, symlen, GFP_NOFS);
- if (!sym)
- goto out;
+ err = -ENOMEM;
+ sym = kstrndup(iinfo->symlink, symlen, GFP_NOFS);
+ if (!sym)
+ goto out;
+ }

spin_lock(&ci->i_ceph_lock);
if (!ci->i_symlink)
@@ -1096,7 +1139,17 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
else
kfree(sym); /* lost a race */
}
- inode->i_link = ci->i_symlink;
+
+ if (IS_ENCRYPTED(inode)) {
+ /*
+ * Encrypted symlinks need to be decrypted before we can
+ * cache their targets in i_link. Don't touch it here.
+ */
+ inode->i_op = &ceph_encrypted_symlink_iops;
+ } else {
+ inode->i_link = ci->i_symlink;
+ inode->i_op = &ceph_symlink_iops;
+ }
break;
case S_IFDIR:
inode->i_op = &ceph_dir_iops;
@@ -2125,6 +2178,29 @@ static void ceph_inode_work(struct work_struct *work)
iput(inode);
}

+static const char *ceph_encrypted_get_link(struct dentry *dentry, struct inode *inode,
+ struct delayed_call *done)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+
+ if (!dentry)
+ return ERR_PTR(-ECHILD);
+
+ return fscrypt_get_symlink(inode, ci->i_symlink, i_size_read(inode), done);
+}
+
+static int ceph_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path, struct kstat *stat,
+ u32 request_mask, unsigned int query_flags)
+{
+ int ret;
+
+ ret = ceph_getattr(mnt_userns, path, stat, request_mask, query_flags);
+ if (ret)
+ return ret;
+ return fscrypt_symlink_getattr(path, stat);
+}
+
/*
* symlinks
*/
@@ -2135,6 +2211,13 @@ static const struct inode_operations ceph_symlink_iops = {
.listxattr = ceph_listxattr,
};

+static const struct inode_operations ceph_encrypted_symlink_iops = {
+ .get_link = ceph_encrypted_get_link,
+ .setattr = ceph_setattr,
+ .getattr = ceph_encrypted_symlink_getattr,
+ .listxattr = ceph_listxattr,
+};
+
int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia)
{
struct ceph_inode_info *ci = ceph_inode(inode);
--
2.35.1

2022-04-01 03:57:58

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 22/54] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries

If we have a dentry which represents a no-key name, then we need to test
whether the parent directory's encryption key has since been added. Do
that before we test anything else about the dentry.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/dir.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 897f8618151b..caf2547c3fe1 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1709,6 +1709,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
struct inode *dir, *inode;
struct ceph_mds_client *mdsc;

+ valid = fscrypt_d_revalidate(dentry, flags);
+ if (valid <= 0)
+ return valid;
+
if (flags & LOOKUP_RCU) {
parent = READ_ONCE(dentry->d_parent);
dir = d_inode_rcu(parent);
@@ -1721,8 +1725,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
inode = d_inode(dentry);
}

- dout("d_revalidate %p '%pd' inode %p offset 0x%llx\n", dentry,
- dentry, inode, ceph_dentry(dentry)->offset);
+ dout("d_revalidate %p '%pd' inode %p offset 0x%llx nokey %d\n", dentry,
+ dentry, inode, ceph_dentry(dentry)->offset, !!(dentry->d_flags & DCACHE_NOKEY_NAME));

mdsc = ceph_sb_to_client(dir->i_sb)->mdsc;

--
2.35.1

2022-04-01 06:17:41

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 07/54] ceph: support legacy v1 encryption policy keysetup

From: Luís Henriques <[email protected]>

fstests make use of legacy keysetup where the key description uses a
filesystem-specific prefix. Add this ceph-specific prefix to the
fscrypt_operations data structure.

Signed-off-by: Luís Henriques <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/crypto.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index a513ff373b13..d1a6595a810f 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -65,6 +65,7 @@ static bool ceph_crypt_empty_dir(struct inode *inode)
}

static struct fscrypt_operations ceph_fscrypt_ops = {
+ .key_prefix = "ceph:",
.get_context = ceph_crypt_get_context,
.set_context = ceph_crypt_set_context,
.empty_dir = ceph_crypt_empty_dir,
--
2.35.1

2022-04-01 06:57:29

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 32/54] ceph: add some fscrypt guardrails

Add the appropriate calls into fscrypt for various actions, including
link, rename, setattr, and the open codepaths.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/dir.c | 8 ++++++++
fs/ceph/file.c | 14 +++++++++++++-
fs/ceph/inode.c | 4 ++++
3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 82a5f37e9d4a..8a9f916bfc6c 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1121,6 +1121,10 @@ static int ceph_link(struct dentry *old_dentry, struct inode *dir,
if (ceph_snap(dir) != CEPH_NOSNAP)
return -EROFS;

+ err = fscrypt_prepare_link(old_dentry, dir, dentry);
+ if (err)
+ return err;
+
dout("link in dir %p old_dentry %p dentry %p\n", dir,
old_dentry, dentry);
req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_LINK, USE_AUTH_MDS);
@@ -1318,6 +1322,10 @@ static int ceph_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
(!ceph_quota_is_same_realm(old_dir, new_dir)))
return -EXDEV;

+ err = fscrypt_prepare_rename(old_dir, old_dentry, new_dir, new_dentry, flags);
+ if (err)
+ return err;
+
dout("rename dir %p dentry %p to dir %p dentry %p\n",
old_dir, old_dentry, new_dir, new_dentry);
req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index f9f7dc4c902d..cb5962998f8b 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -372,8 +372,13 @@ int ceph_open(struct inode *inode, struct file *file)

/* filter out O_CREAT|O_EXCL; vfs did that already. yuck. */
flags = file->f_flags & ~(O_CREAT|O_EXCL);
- if (S_ISDIR(inode->i_mode))
+ if (S_ISDIR(inode->i_mode)) {
flags = O_DIRECTORY; /* mds likes to know */
+ } else if (S_ISREG(inode->i_mode)) {
+ err = fscrypt_file_open(inode, file);
+ if (err)
+ return err;
+ }

dout("open inode %p ino %llx.%llx file %p flags %d (%d)\n", inode,
ceph_vinop(inode), file, flags, file->f_flags);
@@ -849,6 +854,13 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
dout("atomic_open finish_no_open on dn %p\n", dn);
err = finish_no_open(file, dn);
} else {
+ if (IS_ENCRYPTED(dir) &&
+ !fscrypt_has_permitted_context(dir, d_inode(dentry))) {
+ pr_warn("Inconsistent encryption context (parent %llx:%llx child %llx:%llx)\n",
+ ceph_vinop(dir), ceph_vinop(d_inode(dentry)));
+ goto out_req;
+ }
+
dout("atomic_open finish_open on dn %p\n", dn);
if (req->r_op == CEPH_MDS_OP_CREATE && req->r_reply_info.has_create_ino) {
struct inode *newino = d_inode(dentry);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 08a0885a49f8..733f2a409346 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2486,6 +2486,10 @@ int ceph_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
if (ceph_inode_is_shutdown(inode))
return -ESTALE;

+ err = fscrypt_prepare_setattr(dentry, attr);
+ if (err)
+ return err;
+
err = setattr_prepare(&init_user_ns, dentry, attr);
if (err != 0)
return err;
--
2.35.1

2022-04-01 07:10:16

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 47/54] ceph: don't use special DIO path for encrypted inodes

Eventually I want to merge the synchronous and direct read codepaths,
possibly via new netfs infrastructure. For now, the direct path is not
crypto-enabled, so use the sync read/write paths instead.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/file.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 29cc65222ddd..fdd59477af5b 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1711,7 +1711,9 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
ceph_cap_string(got));

if (ci->i_inline_version == CEPH_INLINE_NONE) {
- if (!retry_op && (iocb->ki_flags & IOCB_DIRECT)) {
+ if (!retry_op &&
+ (iocb->ki_flags & IOCB_DIRECT) &&
+ !IS_ENCRYPTED(inode)) {
ret = ceph_direct_read_write(iocb, to,
NULL, NULL);
if (ret >= 0 && ret < len)
@@ -1937,7 +1939,7 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)

/* we might need to revert back to that point */
data = *from;
- if (iocb->ki_flags & IOCB_DIRECT)
+ if ((iocb->ki_flags & IOCB_DIRECT) && !IS_ENCRYPTED(inode))
written = ceph_direct_read_write(iocb, &data, snapc,
&prealloc_cf);
else
--
2.35.1

2022-04-01 07:23:53

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 36/54] ceph: fscrypt_file field handling in MClientRequest messages

For encrypted inodes, transmit a rounded-up size to the MDS as the
normal file size and send the real inode size in fscrypt_file field.

Also, fix up creates and truncates to also transmit fscrypt_file.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/dir.c | 3 +++
fs/ceph/file.c | 1 +
fs/ceph/inode.c | 18 ++++++++++++++++--
fs/ceph/mds_client.c | 9 ++++++++-
fs/ceph/mds_client.h | 2 ++
5 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 8a9f916bfc6c..5ccf6453f02f 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -910,6 +910,9 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
goto out_req;
}

+ if (S_ISREG(mode) && IS_ENCRYPTED(dir))
+ set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
+
req->r_dentry = dget(dentry);
req->r_num_caps = 2;
req->r_parent = dir;
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index cb5962998f8b..ca370c628b7f 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -766,6 +766,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
req->r_parent = dir;
ihold(dir);
if (IS_ENCRYPTED(dir)) {
+ set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
if (!fscrypt_has_encryption_key(dir)) {
spin_lock(&dentry->d_lock);
dentry->d_flags |= DCACHE_NOKEY_NAME;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 733f2a409346..d53f771128ed 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2377,11 +2377,25 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
}
} else if ((issued & CEPH_CAP_FILE_SHARED) == 0 ||
attr->ia_size != isize) {
- req->r_args.setattr.size = cpu_to_le64(attr->ia_size);
- req->r_args.setattr.old_size = cpu_to_le64(isize);
mask |= CEPH_SETATTR_SIZE;
release |= CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR;
+ if (IS_ENCRYPTED(inode) && attr->ia_size) {
+ set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
+ mask |= CEPH_SETATTR_FSCRYPT_FILE;
+ req->r_args.setattr.size =
+ cpu_to_le64(round_up(attr->ia_size,
+ CEPH_FSCRYPT_BLOCK_SIZE));
+ req->r_args.setattr.old_size =
+ cpu_to_le64(round_up(isize,
+ CEPH_FSCRYPT_BLOCK_SIZE));
+ req->r_fscrypt_file = attr->ia_size;
+ /* FIXME: client must zero out any partial blocks! */
+ } else {
+ req->r_args.setattr.size = cpu_to_le64(attr->ia_size);
+ req->r_args.setattr.old_size = cpu_to_le64(isize);
+ req->r_fscrypt_file = 0;
+ }
}
}
if (ia_valid & ATTR_MTIME) {
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 50fe77768295..0da85c9ce73a 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2752,7 +2752,12 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
} else {
ceph_encode_32(p, 0);
}
- ceph_encode_32(p, 0); // fscrypt_file for now
+ if (test_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags)) {
+ ceph_encode_32(p, sizeof(__le64));
+ ceph_encode_64(p, req->r_fscrypt_file);
+ } else {
+ ceph_encode_32(p, 0);
+ }
}

/*
@@ -2838,6 +2843,8 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,

/* fscrypt_file */
len += sizeof(u32);
+ if (test_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags))
+ len += sizeof(__le64);

msg = ceph_msg_new2(CEPH_MSG_CLIENT_REQUEST, len, 1, GFP_NOFS, false);
if (!msg) {
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 046a9368c4a9..e297bf98c39f 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -282,6 +282,7 @@ struct ceph_mds_request {
#define CEPH_MDS_R_DID_PREPOPULATE (6) /* prepopulated readdir */
#define CEPH_MDS_R_PARENT_LOCKED (7) /* is r_parent->i_rwsem wlocked? */
#define CEPH_MDS_R_ASYNC (8) /* async request */
+#define CEPH_MDS_R_FSCRYPT_FILE (9) /* must marshal fscrypt_file field */
unsigned long r_req_flags;

struct mutex r_fill_mutex;
@@ -289,6 +290,7 @@ struct ceph_mds_request {
union ceph_mds_request_args r_args;

struct ceph_fscrypt_auth *r_fscrypt_auth;
+ u64 r_fscrypt_file;

u8 *r_altname; /* fscrypt binary crypttext for long filenames */
u32 r_altname_len; /* length of r_altname */
--
2.35.1

2022-04-01 08:11:08

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 41/54] ceph: add object version support for sync read

From: Xiubo Li <[email protected]>

Always return the last object's version.

Signed-off-by: Xiubo Li <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/file.c | 12 ++++++++++--
fs/ceph/super.h | 3 ++-
2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 9aa997bcfacc..b3a5333c0a44 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -930,7 +930,8 @@ enum {
* only return a short read to the caller if we hit EOF.
*/
ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
- struct iov_iter *to, int *retry_op)
+ struct iov_iter *to, int *retry_op,
+ u64 *last_objver)
{
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
@@ -940,6 +941,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
u64 len = iov_iter_count(to);
u64 i_size = i_size_read(inode);
bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD);
+ u64 objver = 0;

dout("sync_read on inode %p %llx~%llx\n", inode, *ki_pos, len);

@@ -1010,6 +1012,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
req->r_end_latency,
len, ret);

+ if (ret > 0)
+ objver = req->r_version;
+
i_size = i_size_read(inode);
dout("sync_read %llu~%llu got %zd i_size %llu%s\n",
off, len, ret, i_size, (more ? " MORE" : ""));
@@ -1071,6 +1076,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
}
}

+ if (last_objver && ret > 0)
+ *last_objver = objver;
+
dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
return ret;
}
@@ -1084,7 +1092,7 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
dout("sync_read on file %p %llx~%zx %s\n", file, iocb->ki_pos,
iov_iter_count(to), (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");

- return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op);
+ return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op, NULL);
}

struct ceph_aio_request {
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 339284e90cb3..e3d63b727e52 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1267,7 +1267,8 @@ extern int ceph_open(struct inode *inode, struct file *file);
extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
struct file *file, unsigned flags, umode_t mode);
extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
- struct iov_iter *to, int *retry_op);
+ struct iov_iter *to, int *retry_op,
+ u64 *last_objver);
extern int ceph_release(struct inode *inode, struct file *filp);
extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
char *data, size_t len);
--
2.35.1

2022-04-01 08:38:43

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v12 07/54] ceph: support legacy v1 encryption policy keysetup

On Thu, Mar 31, 2022 at 11:30:43AM -0400, Jeff Layton wrote:
> From: Lu?s Henriques <[email protected]>
>
> fstests make use of legacy keysetup where the key description uses a
> filesystem-specific prefix. Add this ceph-specific prefix to the
> fscrypt_operations data structure.
>
> Signed-off-by: Lu?s Henriques <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/ceph/crypto.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> index a513ff373b13..d1a6595a810f 100644
> --- a/fs/ceph/crypto.c
> +++ b/fs/ceph/crypto.c
> @@ -65,6 +65,7 @@ static bool ceph_crypt_empty_dir(struct inode *inode)
> }
>
> static struct fscrypt_operations ceph_fscrypt_ops = {
> + .key_prefix = "ceph:",
> .get_context = ceph_crypt_get_context,
> .set_context = ceph_crypt_set_context,
> .empty_dir = ceph_crypt_empty_dir,
> --
> 2.35.1
>

As I mentioned before
(https://lore.kernel.org/r/[email protected]), I don't
think you should do this, given that the filesystem-specific key description
prefixes are deprecated. In fact, they're sort of doubly deprecated, since
first they were superseded by "fscrypt:", and then "login" keys were
superseded by FS_IOC_ADD_ENCRYPTION_KEY.

How about updating fstests to use "fscrypt:" instead of "$FSTYP:" if $FSTYP is
not ext4 or f2fs?

Or maybe fstests should just use "fscrypt:" unconditionally, given that this has
been supported by ext4 and f2fs since 4.8, and 4.9 is now the oldest supported
LTS kernel.

- Eric

2022-04-01 09:49:34

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 21/54] ceph: set DCACHE_NOKEY_NAME in atomic open

Atomic open can act as a lookup if handed a dentry that is negative on
the MDS. Ensure that we set DCACHE_NOKEY_NAME on the dentry in
atomic_open, if we don't have the key for the parent. Otherwise, we can
end up validating the dentry inappropriately if someone later adds a
key.

Reviewed-by: Xiubo Li <[email protected]>
Reviewed-by: Luís Henriques <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/file.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 5832dcea2d8c..f9f7dc4c902d 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -760,6 +760,13 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
req->r_args.open.mask = cpu_to_le32(mask);
req->r_parent = dir;
ihold(dir);
+ if (IS_ENCRYPTED(dir)) {
+ if (!fscrypt_has_encryption_key(dir)) {
+ spin_lock(&dentry->d_lock);
+ dentry->d_flags |= DCACHE_NOKEY_NAME;
+ spin_unlock(&dentry->d_lock);
+ }
+ }

if (flags & O_CREAT) {
struct ceph_file_layout lo;
--
2.35.1

2022-04-01 10:37:34

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 11/54] ceph: add ability to set fscrypt_auth via setattr

Allow the client to use a setattr request for setting the fscrypt_auth
field. Since this is not a standard setattr request from the VFS, we add
a new field to __ceph_setattr that carries ceph-specific inode attrs.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/acl.c | 4 +--
fs/ceph/crypto.h | 9 +++++-
fs/ceph/inode.c | 42 ++++++++++++++++++++++++++--
fs/ceph/mds_client.c | 54 ++++++++++++++++++++++++++++++------
fs/ceph/mds_client.h | 3 ++
fs/ceph/super.h | 7 ++++-
include/linux/ceph/ceph_fs.h | 21 ++++++++------
7 files changed, 117 insertions(+), 23 deletions(-)

diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index f4fc8e0b847c..427724c36316 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -139,7 +139,7 @@ int ceph_set_acl(struct user_namespace *mnt_userns, struct inode *inode,
newattrs.ia_ctime = current_time(inode);
newattrs.ia_mode = new_mode;
newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
- ret = __ceph_setattr(inode, &newattrs);
+ ret = __ceph_setattr(inode, &newattrs, NULL);
if (ret)
goto out_free;
}
@@ -150,7 +150,7 @@ int ceph_set_acl(struct user_namespace *mnt_userns, struct inode *inode,
newattrs.ia_ctime = old_ctime;
newattrs.ia_mode = old_mode;
newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
- __ceph_setattr(inode, &newattrs);
+ __ceph_setattr(inode, &newattrs, NULL);
}
goto out_free;
}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 6c3831c57c8d..6dca674f79b8 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -14,8 +14,15 @@ struct ceph_fscrypt_auth {
u8 cfa_blob[FSCRYPT_SET_CONTEXT_MAX_SIZE];
} __packed;

-#ifdef CONFIG_FS_ENCRYPTION
#define CEPH_FSCRYPT_AUTH_VERSION 1
+static inline u32 ceph_fscrypt_auth_len(struct ceph_fscrypt_auth *fa)
+{
+ u32 ctxsize = le32_to_cpu(fa->cfa_blob_len);
+
+ return offsetof(struct ceph_fscrypt_auth, cfa_blob) + ctxsize;
+}
+
+#ifdef CONFIG_FS_ENCRYPTION
void ceph_fscrypt_set_ops(struct super_block *sb);

#else /* CONFIG_FS_ENCRYPTION */
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index bceb46568da3..81904cde2609 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2099,7 +2099,7 @@ static const struct inode_operations ceph_symlink_iops = {
.listxattr = ceph_listxattr,
};

-int __ceph_setattr(struct inode *inode, struct iattr *attr)
+int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia)
{
struct ceph_inode_info *ci = ceph_inode(inode);
unsigned int ia_valid = attr->ia_valid;
@@ -2139,6 +2139,43 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr)
}

dout("setattr %p issued %s\n", inode, ceph_cap_string(issued));
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+ if (cia && cia->fscrypt_auth) {
+ u32 len = ceph_fscrypt_auth_len(cia->fscrypt_auth);
+
+ if (len > sizeof(*cia->fscrypt_auth)) {
+ err = -EINVAL;
+ spin_unlock(&ci->i_ceph_lock);
+ goto out;
+ }
+
+ dout("setattr %llx:%llx fscrypt_auth len %u to %u)\n",
+ ceph_vinop(inode), ci->fscrypt_auth_len, len);
+
+ /* It should never be re-set once set */
+ WARN_ON_ONCE(ci->fscrypt_auth);
+
+ if (issued & CEPH_CAP_AUTH_EXCL) {
+ dirtied |= CEPH_CAP_AUTH_EXCL;
+ kfree(ci->fscrypt_auth);
+ ci->fscrypt_auth = (u8 *)cia->fscrypt_auth;
+ ci->fscrypt_auth_len = len;
+ } else if ((issued & CEPH_CAP_AUTH_SHARED) == 0 ||
+ ci->fscrypt_auth_len != len ||
+ memcmp(ci->fscrypt_auth, cia->fscrypt_auth, len)) {
+ req->r_fscrypt_auth = cia->fscrypt_auth;
+ mask |= CEPH_SETATTR_FSCRYPT_AUTH;
+ release |= CEPH_CAP_AUTH_SHARED;
+ }
+ cia->fscrypt_auth = NULL;
+ }
+#else
+ if (cia && cia->fscrypt_auth) {
+ err = -EINVAL;
+ spin_unlock(&ci->i_ceph_lock);
+ goto out;
+ }
+#endif /* CONFIG_FS_ENCRYPTION */

if (ia_valid & ATTR_UID) {
dout("setattr %p uid %d -> %d\n", inode,
@@ -2301,6 +2338,7 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr)
req->r_stamp = attr->ia_ctime;
err = ceph_mdsc_do_request(mdsc, NULL, req);
}
+out:
dout("setattr %p result=%d (%s locally, %d remote)\n", inode, err,
ceph_cap_string(dirtied), mask);

@@ -2341,7 +2379,7 @@ int ceph_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
ceph_quota_is_max_bytes_exceeded(inode, attr->ia_size))
return -EDQUOT;

- err = __ceph_setattr(inode, attr);
+ err = __ceph_setattr(inode, attr, NULL);

if (err >= 0 && (attr->ia_valid & ATTR_MODE))
err = posix_acl_chmod(&init_user_ns, inode, attr->ia_mode);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index cd7ad033492b..dcb800675dec 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -15,6 +15,7 @@

#include "super.h"
#include "mds_client.h"
+#include "crypto.h"

#include <linux/ceph/ceph_features.h>
#include <linux/ceph/messenger.h>
@@ -946,6 +947,7 @@ void ceph_mdsc_release_request(struct kref *kref)
put_cred(req->r_cred);
if (req->r_pagelist)
ceph_pagelist_release(req->r_pagelist);
+ kfree(req->r_fscrypt_auth);
put_request_session(req);
ceph_unreserve_caps(req->r_mdsc, &req->r_caps_reservation);
WARN_ON_ONCE(!list_empty(&req->r_wait));
@@ -2526,8 +2528,7 @@ static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry,
return r;
}

-static void encode_timestamp_and_gids(void **p,
- const struct ceph_mds_request *req)
+static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *req)
{
struct ceph_timespec ts;
int i;
@@ -2540,6 +2541,20 @@ static void encode_timestamp_and_gids(void **p,
for (i = 0; i < req->r_cred->group_info->ngroups; i++)
ceph_encode_64(p, from_kgid(&init_user_ns,
req->r_cred->group_info->gid[i]));
+
+ /* v5: altname (TODO: skip for now) */
+ ceph_encode_32(p, 0);
+
+ /* v6: fscrypt_auth and fscrypt_file */
+ if (req->r_fscrypt_auth) {
+ u32 authlen = ceph_fscrypt_auth_len(req->r_fscrypt_auth);
+
+ ceph_encode_32(p, authlen);
+ ceph_encode_copy(p, req->r_fscrypt_auth, authlen);
+ } else {
+ ceph_encode_32(p, 0);
+ }
+ ceph_encode_32(p, 0); // fscrypt_file for now
}

/*
@@ -2584,12 +2599,14 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
goto out_free1;
}

+ /* head */
len = legacy ? sizeof(*head) : sizeof(struct ceph_mds_request_head);
- len += pathlen1 + pathlen2 + 2*(1 + sizeof(u32) + sizeof(u64)) +
- sizeof(struct ceph_timespec);
- len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups);

- /* calculate (max) length for cap releases */
+ /* filepaths */
+ len += 2 * (1 + sizeof(u32) + sizeof(u64));
+ len += pathlen1 + pathlen2;
+
+ /* cap releases */
len += sizeof(struct ceph_mds_request_release) *
(!!req->r_inode_drop + !!req->r_dentry_drop +
!!req->r_old_inode_drop + !!req->r_old_dentry_drop);
@@ -2599,6 +2616,25 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
if (req->r_old_dentry_drop)
len += pathlen2;

+ /* MClientRequest tail */
+
+ /* req->r_stamp */
+ len += sizeof(struct ceph_timespec);
+
+ /* gid list */
+ len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups);
+
+ /* alternate name */
+ len += sizeof(u32); // TODO
+
+ /* fscrypt_auth */
+ len += sizeof(u32); // fscrypt_auth
+ if (req->r_fscrypt_auth)
+ len += ceph_fscrypt_auth_len(req->r_fscrypt_auth);
+
+ /* fscrypt_file */
+ len += sizeof(u32);
+
msg = ceph_msg_new2(CEPH_MSG_CLIENT_REQUEST, len, 1, GFP_NOFS, false);
if (!msg) {
msg = ERR_PTR(-ENOMEM);
@@ -2618,7 +2654,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
} else {
struct ceph_mds_request_head *new_head = msg->front.iov_base;

- msg->hdr.version = cpu_to_le16(4);
+ msg->hdr.version = cpu_to_le16(6);
new_head->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
head = (struct ceph_mds_request_head_old *)&new_head->oldest_client_tid;
p = msg->front.iov_base + sizeof(*new_head);
@@ -2669,7 +2705,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,

head->num_releases = cpu_to_le16(releases);

- encode_timestamp_and_gids(&p, req);
+ encode_mclientrequest_tail(&p, req);

if (WARN_ON_ONCE(p > end)) {
ceph_msg_put(msg);
@@ -2799,7 +2835,7 @@ static int __prepare_send_request(struct ceph_mds_session *session,
rhead->num_releases = 0;

p = msg->front.iov_base + req->r_request_release_offset;
- encode_timestamp_and_gids(&p, req);
+ encode_mclientrequest_tail(&p, req);

msg->front.iov_len = p - msg->front.iov_base;
msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 96d726ee5250..aab3ab284fce 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -284,6 +284,9 @@ struct ceph_mds_request {
struct mutex r_fill_mutex;

union ceph_mds_request_args r_args;
+
+ struct ceph_fscrypt_auth *r_fscrypt_auth;
+
int r_fmode; /* file mode, if expecting cap */
int r_request_release_offset;
const struct cred *r_cred;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index e12e5b484564..b05ae8899a1a 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1052,7 +1052,12 @@ static inline int ceph_do_getattr(struct inode *inode, int mask, bool force)
}
extern int ceph_permission(struct user_namespace *mnt_userns,
struct inode *inode, int mask);
-extern int __ceph_setattr(struct inode *inode, struct iattr *attr);
+
+struct ceph_iattr {
+ struct ceph_fscrypt_auth *fscrypt_auth;
+};
+
+extern int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia);
extern int ceph_setattr(struct user_namespace *mnt_userns,
struct dentry *dentry, struct iattr *attr);
extern int ceph_getattr(struct user_namespace *mnt_userns,
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index 86bf82dbd8b8..2810b214fa3f 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -359,14 +359,19 @@ enum {

extern const char *ceph_mds_op_name(int op);

-
-#define CEPH_SETATTR_MODE 1
-#define CEPH_SETATTR_UID 2
-#define CEPH_SETATTR_GID 4
-#define CEPH_SETATTR_MTIME 8
-#define CEPH_SETATTR_ATIME 16
-#define CEPH_SETATTR_SIZE 32
-#define CEPH_SETATTR_CTIME 64
+#define CEPH_SETATTR_MODE (1 << 0)
+#define CEPH_SETATTR_UID (1 << 1)
+#define CEPH_SETATTR_GID (1 << 2)
+#define CEPH_SETATTR_MTIME (1 << 3)
+#define CEPH_SETATTR_ATIME (1 << 4)
+#define CEPH_SETATTR_SIZE (1 << 5)
+#define CEPH_SETATTR_CTIME (1 << 6)
+#define CEPH_SETATTR_MTIME_NOW (1 << 7)
+#define CEPH_SETATTR_ATIME_NOW (1 << 8)
+#define CEPH_SETATTR_BTIME (1 << 9)
+#define CEPH_SETATTR_KILL_SGUID (1 << 10)
+#define CEPH_SETATTR_FSCRYPT_AUTH (1 << 11)
+#define CEPH_SETATTR_FSCRYPT_FILE (1 << 12)

/*
* Ceph setxattr request flags.
--
2.35.1

2022-04-01 10:44:28

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 42/54] ceph: add infrastructure for file encryption and decryption

...and allow test_dummy_encryption to bypass content encryption
if mounted with test_dummy_encryption=clear.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/crypto.c | 177 +++++++++++++++++++++++++++++++++++++++++++++++
fs/ceph/crypto.h | 71 +++++++++++++++++++
fs/ceph/super.c | 8 +++
fs/ceph/super.h | 1 +
4 files changed, 257 insertions(+)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index c86cc4a7eaf6..8dfe1c30f93c 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -2,6 +2,7 @@
#include <linux/ceph/ceph_debug.h>
#include <linux/xattr.h>
#include <linux/fscrypt.h>
+#include <linux/ceph/striper.h>

#include "super.h"
#include "mds_client.h"
@@ -270,3 +271,179 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
fscrypt_fname_free_buffer(&_tname);
return ret;
}
+
+int ceph_fscrypt_decrypt_block_inplace(const struct inode *inode,
+ struct page *page, unsigned int len,
+ unsigned int offs, u64 lblk_num)
+{
+ struct ceph_mount_options *opt = ceph_inode_to_client(inode)->mount_options;
+
+ if (opt->flags & CEPH_MOUNT_OPT_DUMMY_ENC_CLEAR)
+ return 0;
+
+ dout("%s: len %u offs %u blk %llu\n", __func__, len, offs, lblk_num);
+ return fscrypt_decrypt_block_inplace(inode, page, len, offs, lblk_num);
+}
+
+int ceph_fscrypt_encrypt_block_inplace(const struct inode *inode,
+ struct page *page, unsigned int len,
+ unsigned int offs, u64 lblk_num, gfp_t gfp_flags)
+{
+ struct ceph_mount_options *opt = ceph_inode_to_client(inode)->mount_options;
+
+ if (opt->flags & CEPH_MOUNT_OPT_DUMMY_ENC_CLEAR)
+ return 0;
+
+ dout("%s: len %u offs %u blk %llu\n", __func__, len, offs, lblk_num);
+ return fscrypt_encrypt_block_inplace(inode, page, len, offs, lblk_num, gfp_flags);
+}
+
+/**
+ * ceph_fscrypt_decrypt_pages - decrypt an array of pages
+ * @inode: pointer to inode associated with these pages
+ * @page: pointer to page array
+ * @off: offset into the file that the read data starts
+ * @len: max length to decrypt
+ *
+ * Decrypt an array of fscrypt'ed pages and return the amount of
+ * data decrypted. Any data in the page prior to the start of the
+ * first complete block in the read is ignored. Any incomplete
+ * crypto blocks at the end of the array are ignored (and should
+ * probably be zeroed by the caller).
+ *
+ * Returns the length of the decrypted data or a negative errno.
+ */
+int ceph_fscrypt_decrypt_pages(struct inode *inode, struct page **page, u64 off, int len)
+{
+ int i, num_blocks;
+ u64 baseblk = off >> CEPH_FSCRYPT_BLOCK_SHIFT;
+ int ret = 0;
+
+ /*
+ * We can't deal with partial blocks on an encrypted file, so mask off
+ * the last bit.
+ */
+ num_blocks = ceph_fscrypt_blocks(off, len & CEPH_FSCRYPT_BLOCK_MASK);
+
+ /* Decrypt each block */
+ for (i = 0; i < num_blocks; ++i) {
+ int blkoff = i << CEPH_FSCRYPT_BLOCK_SHIFT;
+ int pgidx = blkoff >> PAGE_SHIFT;
+ unsigned int pgoffs = offset_in_page(blkoff);
+ int fret;
+
+ fret = ceph_fscrypt_decrypt_block_inplace(inode, page[pgidx],
+ CEPH_FSCRYPT_BLOCK_SIZE, pgoffs,
+ baseblk + i);
+ if (fret < 0) {
+ if (ret == 0)
+ ret = fret;
+ break;
+ }
+ ret += CEPH_FSCRYPT_BLOCK_SIZE;
+ }
+ return ret;
+}
+
+/**
+ * ceph_fscrypt_decrypt_extents: decrypt received extents in given buffer
+ * @inode: inode associated with pages being decrypted
+ * @page: pointer to page array
+ * @off: offset into the file that the data in page[0] starts
+ * @map: pointer to extent array
+ * @ext_cnt: length of extent array
+ *
+ * Given an extent map and a page array, decrypt the received data in-place,
+ * skipping holes. Returns the offset into buffer of end of last decrypted
+ * block.
+ */
+int ceph_fscrypt_decrypt_extents(struct inode *inode, struct page **page, u64 off,
+ struct ceph_sparse_extent *map, u32 ext_cnt)
+{
+ int i, ret = 0;
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ u64 objno, objoff;
+ u32 xlen;
+
+ /* Nothing to do for empty array */
+ if (ext_cnt == 0) {
+ dout("%s: empty array, ret 0\n", __func__);
+ return 0;
+ }
+
+ ceph_calc_file_object_mapping(&ci->i_layout, off, map[0].len,
+ &objno, &objoff, &xlen);
+
+ for (i = 0; i < ext_cnt; ++i) {
+ struct ceph_sparse_extent *ext = &map[i];
+ int pgsoff = ext->off - objoff;
+ int pgidx = pgsoff >> PAGE_SHIFT;
+ int fret;
+
+ if ((ext->off | ext->len) & ~CEPH_FSCRYPT_BLOCK_MASK) {
+ pr_warn("%s: bad encrypted sparse extent idx %d off %llx len %llx\n",
+ __func__, i, ext->off, ext->len);
+ return -EIO;
+ }
+ fret = ceph_fscrypt_decrypt_pages(inode, &page[pgidx],
+ off + pgsoff, ext->len);
+ dout("%s: [%d] 0x%llx~0x%llx fret %d\n", __func__, i,
+ ext->off, ext->len, fret);
+ if (fret < 0) {
+ if (ret == 0)
+ ret = fret;
+ break;
+ }
+ ret = pgsoff + fret;
+ }
+ dout("%s: ret %d\n", __func__, ret);
+ return ret;
+}
+
+/**
+ * ceph_fscrypt_encrypt_pages - encrypt an array of pages
+ * @inode: pointer to inode associated with these pages
+ * @page: pointer to page array
+ * @off: offset into the file that the data starts
+ * @len: max length to encrypt
+ * @gfp: gfp flags to use for allocation
+ *
+ * Decrypt an array of cleartext pages and return the amount of
+ * data encrypted. Any data in the page prior to the start of the
+ * first complete block in the read is ignored. Any incomplete
+ * crypto blocks at the end of the array are ignored.
+ *
+ * Returns the length of the encrypted data or a negative errno.
+ */
+int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **page, u64 off,
+ int len, gfp_t gfp)
+{
+ int i, num_blocks;
+ u64 baseblk = off >> CEPH_FSCRYPT_BLOCK_SHIFT;
+ int ret = 0;
+
+ /*
+ * We can't deal with partial blocks on an encrypted file, so mask off
+ * the last bit.
+ */
+ num_blocks = ceph_fscrypt_blocks(off, len & CEPH_FSCRYPT_BLOCK_MASK);
+
+ /* Encrypt each block */
+ for (i = 0; i < num_blocks; ++i) {
+ int blkoff = i << CEPH_FSCRYPT_BLOCK_SHIFT;
+ int pgidx = blkoff >> PAGE_SHIFT;
+ unsigned int pgoffs = offset_in_page(blkoff);
+ int fret;
+
+ fret = ceph_fscrypt_encrypt_block_inplace(inode, page[pgidx],
+ CEPH_FSCRYPT_BLOCK_SIZE, pgoffs,
+ baseblk + i, gfp);
+ if (fret < 0) {
+ if (ret == 0)
+ ret = fret;
+ break;
+ }
+ ret += CEPH_FSCRYPT_BLOCK_SIZE;
+ }
+ return ret;
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 56a61ba64edc..fdd73c50487f 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -91,6 +91,40 @@ static inline void ceph_fname_free_buffer(struct inode *parent, struct fscrypt_s
int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
struct fscrypt_str *oname, bool *is_nokey);

+static inline unsigned int ceph_fscrypt_blocks(u64 off, u64 len)
+{
+ /* crypto blocks cannot span more than one page */
+ BUILD_BUG_ON(CEPH_FSCRYPT_BLOCK_SHIFT > PAGE_SHIFT);
+
+ return ((off+len+CEPH_FSCRYPT_BLOCK_SIZE-1) >> CEPH_FSCRYPT_BLOCK_SHIFT) -
+ (off >> CEPH_FSCRYPT_BLOCK_SHIFT);
+}
+
+/*
+ * If we have an encrypted inode then we must adjust the offset and
+ * range of the on-the-wire read to cover an entire encryption block.
+ * The copy will be done using the original offset and length, after
+ * we've decrypted the result.
+ */
+static inline void ceph_fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64 *len)
+{
+ if (IS_ENCRYPTED(inode)) {
+ *len = ceph_fscrypt_blocks(*off, *len) * CEPH_FSCRYPT_BLOCK_SIZE;
+ *off &= CEPH_FSCRYPT_BLOCK_MASK;
+ }
+}
+
+int ceph_fscrypt_decrypt_block_inplace(const struct inode *inode,
+ struct page *page, unsigned int len,
+ unsigned int offs, u64 lblk_num);
+int ceph_fscrypt_encrypt_block_inplace(const struct inode *inode,
+ struct page *page, unsigned int len,
+ unsigned int offs, u64 lblk_num, gfp_t gfp_flags);
+int ceph_fscrypt_decrypt_pages(struct inode *inode, struct page **page, u64 off, int len);
+int ceph_fscrypt_decrypt_extents(struct inode *inode, struct page **page, u64 off,
+ struct ceph_sparse_extent *map, u32 ext_cnt);
+int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **page, u64 off,
+ int len, gfp_t gfp);
#else /* CONFIG_FS_ENCRYPTION */

static inline void ceph_fscrypt_set_ops(struct super_block *sb)
@@ -143,6 +177,43 @@ static inline int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscry
oname->len = fname->name_len;
return 0;
}
+
+static inline void ceph_fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64 *len)
+{
+}
+
+static inline int ceph_fscrypt_decrypt_block_inplace(const struct inode *inode,
+ struct page *page, unsigned int len,
+ unsigned int offs, u64 lblk_num)
+{
+ return 0;
+}
+
+static inline int ceph_fscrypt_encrypt_block_inplace(const struct inode *inode,
+ struct page *page, unsigned int len,
+ unsigned int offs, u64 lblk_num, gfp_t gfp_flags)
+{
+ return 0;
+}
+
+static inline int ceph_fscrypt_decrypt_pages(struct inode *inode, struct page **page,
+ u64 off, int len)
+{
+ return 0;
+}
+
+static inline int ceph_fscrypt_decrypt_extents(struct inode *inode, struct page **page,
+ u64 off, struct ceph_sparse_extent *map,
+ u32 ext_cnt)
+{
+ return 0;
+}
+
+static inline int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **page,
+ u64 off, int len, gfp_t gfp)
+{
+ return 0;
+}
#endif /* CONFIG_FS_ENCRYPTION */

#endif
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 6a2044d61420..77c7906090bc 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1092,6 +1092,14 @@ static int ceph_set_test_dummy_encryption(struct super_block *sb, struct fs_cont
return -EEXIST;
}

+ /* HACK: allow for cleartext "encryption" in files for testing */
+ if (fsc->mount_options->test_dummy_encryption &&
+ !strcmp(fsc->mount_options->test_dummy_encryption, "clear")) {
+ fsopt->flags |= CEPH_MOUNT_OPT_DUMMY_ENC_CLEAR;
+ kfree(fsc->mount_options->test_dummy_encryption);
+ fsc->mount_options->test_dummy_encryption = NULL;
+ }
+
err = fscrypt_set_test_dummy_encryption(sb,
fsc->mount_options->test_dummy_encryption,
&fsc->dummy_enc_policy);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index e3d63b727e52..af59066071a6 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -48,6 +48,7 @@
#define CEPH_MOUNT_OPT_NOPAGECACHE (1<<16) /* bypass pagecache altogether */
#define CEPH_MOUNT_OPT_SPARSEREAD (1<<17) /* always do sparse reads */
#define CEPH_MOUNT_OPT_TEST_DUMMY_ENC (1<<18) /* enable dummy encryption (for testing) */
+#define CEPH_MOUNT_OPT_DUMMY_ENC_CLEAR (1<<19) /* don't actually encrypt content */

#define CEPH_MOUNT_OPT_DEFAULT \
(CEPH_MOUNT_OPT_DCACHE | \
--
2.35.1

2022-04-01 10:48:50

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 31/54] ceph: add a new ceph.fscrypt.auth vxattr

Give the client a way to get at the xattr from userland, mostly for
future debugging purposes.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/xattr.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index 58628cef4207..e080116608b2 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -352,6 +352,23 @@ static ssize_t ceph_vxattrcb_auth_mds(struct ceph_inode_info *ci,
return ret;
}

+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static bool ceph_vxattrcb_fscrypt_auth_exists(struct ceph_inode_info *ci)
+{
+ return ci->fscrypt_auth_len;
+}
+
+static ssize_t ceph_vxattrcb_fscrypt_auth(struct ceph_inode_info *ci, char *val, size_t size)
+{
+ if (size) {
+ if (size < ci->fscrypt_auth_len)
+ return -ERANGE;
+ memcpy(val, ci->fscrypt_auth, ci->fscrypt_auth_len);
+ }
+ return ci->fscrypt_auth_len;
+}
+#endif /* CONFIG_FS_ENCRYPTION */
+
#define CEPH_XATTR_NAME(_type, _name) XATTR_CEPH_PREFIX #_type "." #_name
#define CEPH_XATTR_NAME2(_type, _name, _name2) \
XATTR_CEPH_PREFIX #_type "." #_name "." #_name2
@@ -500,6 +517,15 @@ static struct ceph_vxattr ceph_common_vxattrs[] = {
.exists_cb = NULL,
.flags = VXATTR_FLAG_READONLY,
},
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+ {
+ .name = "ceph.fscrypt.auth",
+ .name_size = sizeof("ceph.fscrypt.auth"),
+ .getxattr_cb = ceph_vxattrcb_fscrypt_auth,
+ .exists_cb = ceph_vxattrcb_fscrypt_auth_exists,
+ .flags = VXATTR_FLAG_READONLY,
+ },
+#endif /* CONFIG_FS_ENCRYPTION */
{ .name = NULL, 0 } /* Required table terminator */
};

--
2.35.1

2022-04-01 10:58:44

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 34/54] libceph: add CEPH_OSD_OP_ASSERT_VER support

...and record the user_version in the reply in a new field in
ceph_osd_request, so we can populate the assert_ver appropriately.
Shuffle the fields a bit too so that the new field fits in an
existing hole on x86_64.

Signed-off-by: Jeff Layton <[email protected]>
---
include/linux/ceph/osd_client.h | 6 +++++-
include/linux/ceph/rados.h | 4 ++++
net/ceph/osd_client.c | 5 +++++
3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index cf663423d51f..df092b678d58 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -196,6 +196,9 @@ struct ceph_osd_req_op {
u32 src_fadvise_flags;
struct ceph_osd_data osd_data;
} copy_from;
+ struct {
+ u64 ver;
+ } assert_ver;
};
};

@@ -250,6 +253,7 @@ struct ceph_osd_request {
struct ceph_osd_client *r_osdc;
struct kref r_kref;
bool r_mempool;
+ bool r_linger; /* don't resend on failure */
struct completion r_completion; /* private to osd_client.c */
ceph_osdc_callback_t r_callback;

@@ -262,9 +266,9 @@ struct ceph_osd_request {
struct ceph_snap_context *r_snapc; /* for writes */
struct timespec64 r_mtime; /* ditto */
u64 r_data_offset; /* ditto */
- bool r_linger; /* don't resend on failure */

/* internal */
+ u64 r_version; /* data version sent in reply */
unsigned long r_stamp; /* jiffies, send or check time */
unsigned long r_start_stamp; /* jiffies */
ktime_t r_start_latency; /* ktime_t */
diff --git a/include/linux/ceph/rados.h b/include/linux/ceph/rados.h
index 43a7a1573b51..73c3efbec36c 100644
--- a/include/linux/ceph/rados.h
+++ b/include/linux/ceph/rados.h
@@ -523,6 +523,10 @@ struct ceph_osd_op {
struct {
__le64 cookie;
} __attribute__ ((packed)) notify;
+ struct {
+ __le64 unused;
+ __le64 ver;
+ } __attribute__ ((packed)) assert_ver;
struct {
__le64 offset, length;
__le64 src_offset;
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 740f243264d0..47d18e9821d4 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1042,6 +1042,10 @@ static u32 osd_req_encode_op(struct ceph_osd_op *dst,
dst->copy_from.src_fadvise_flags =
cpu_to_le32(src->copy_from.src_fadvise_flags);
break;
+ case CEPH_OSD_OP_ASSERT_VER:
+ dst->assert_ver.unused = cpu_to_le64(0);
+ dst->assert_ver.ver = cpu_to_le64(src->assert_ver.ver);
+ break;
default:
pr_err("unsupported osd opcode %s\n",
ceph_osd_op_name(src->op));
@@ -3804,6 +3808,7 @@ static void handle_reply(struct ceph_osd *osd, struct ceph_msg *msg)
* one (type of) reply back.
*/
WARN_ON(!(m.flags & CEPH_OSD_FLAG_ONDISK));
+ req->r_version = m.user_version;
req->r_result = m.result ?: data_len;
finish_request(req);
mutex_unlock(&osd->lock);
--
2.35.1

2022-04-01 11:34:50

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 01/54] vfs: export new_inode_pseudo

Ceph needs to be able to allocate inodes ahead of a create that might
involve a fscrypt-encrypted inode. new_inode() almost fits the bill,
but it puts the inode on the sb->s_inodes list and when we go to hash
it, that might be done again.

We could work around that by setting I_CREATING on the new inode, but
that causes ilookup5 to return -ESTALE if something tries to find it
before I_NEW is cleared. This is desirable behavior for most
filesystems, but doesn't work for ceph.

To work around all of this, just use new_inode_pseudo which doesn't add
it to the sb->s_inodes list.

Cc: Al Viro <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/inode.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/inode.c b/fs/inode.c
index 63324df6fa27..9ddf7d1a7359 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1025,6 +1025,7 @@ struct inode *new_inode_pseudo(struct super_block *sb)
}
return inode;
}
+EXPORT_SYMBOL(new_inode_pseudo);

/**
* new_inode - obtain an inode
--
2.35.1

2022-04-01 13:45:05

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 05/54] ceph: preallocate inode for ops that may create one

When creating a new inode, we need to determine the crypto context
before we can transmit the RPC. The fscrypt API has a routine for getting
a crypto context before a create occurs, but it requires an inode.

Change the ceph code to preallocate an inode in advance of a create of
any sort (open(), mknod(), symlink(), etc). Move the existing code that
generates the ACL and SELinux blobs into this routine since that's
mostly common across all the different codepaths.

In most cases, we just want to allow ceph_fill_trace to use that inode
after the reply comes in, so add a new field to the MDS request for it
(r_new_inode).

The async create codepath is a bit different though. In that case, we
want to hash the inode in advance of the RPC so that it can be used
before the reply comes in. If the call subsequently fails with
-EJUKEBOX, then just put the references and clean up the as_ctx. Note
that with this change, we now need to regenerate the as_ctx when this
occurs, but it's quite rare for it to happen.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/dir.c | 70 ++++++++++++++++++++-----------------
fs/ceph/file.c | 62 ++++++++++++++++++++-------------
fs/ceph/inode.c | 82 ++++++++++++++++++++++++++++++++++++++++----
fs/ceph/mds_client.c | 13 +++++--
fs/ceph/mds_client.h | 1 +
fs/ceph/super.h | 7 +++-
6 files changed, 169 insertions(+), 66 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index eae417d71136..8cc7a49ee508 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -861,13 +861,6 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
goto out;
}

- err = ceph_pre_init_acls(dir, &mode, &as_ctx);
- if (err < 0)
- goto out;
- err = ceph_security_init_secctx(dentry, mode, &as_ctx);
- if (err < 0)
- goto out;
-
dout("mknod in dir %p dentry %p mode 0%ho rdev %d\n",
dir, dentry, mode, rdev);
req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_MKNOD, USE_AUTH_MDS);
@@ -875,6 +868,14 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
err = PTR_ERR(req);
goto out;
}
+
+ req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+ if (IS_ERR(req->r_new_inode)) {
+ err = PTR_ERR(req->r_new_inode);
+ req->r_new_inode = NULL;
+ goto out_req;
+ }
+
req->r_dentry = dget(dentry);
req->r_num_caps = 2;
req->r_parent = dir;
@@ -884,13 +885,13 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
req->r_args.mknod.rdev = cpu_to_le32(rdev);
req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL;
req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
- if (as_ctx.pagelist) {
- req->r_pagelist = as_ctx.pagelist;
- as_ctx.pagelist = NULL;
- }
+
+ ceph_as_ctx_to_req(req, &as_ctx);
+
err = ceph_mdsc_do_request(mdsc, dir, req);
if (!err && !req->r_reply_info.head->is_dentry)
err = ceph_handle_notrace_create(dir, dentry);
+out_req:
ceph_mdsc_put_request(req);
out:
if (!err)
@@ -913,6 +914,7 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb);
struct ceph_mds_request *req;
struct ceph_acl_sec_ctx as_ctx = {};
+ umode_t mode = S_IFLNK | 0777;
int err;

if (ceph_snap(dir) != CEPH_NOSNAP)
@@ -923,21 +925,24 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
goto out;
}

- err = ceph_security_init_secctx(dentry, S_IFLNK | 0777, &as_ctx);
- if (err < 0)
- goto out;
-
dout("symlink in dir %p dentry %p to '%s'\n", dir, dentry, dest);
req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_SYMLINK, USE_AUTH_MDS);
if (IS_ERR(req)) {
err = PTR_ERR(req);
goto out;
}
+
+ req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+ if (IS_ERR(req->r_new_inode)) {
+ err = PTR_ERR(req->r_new_inode);
+ req->r_new_inode = NULL;
+ goto out_req;
+ }
+
req->r_path2 = kstrdup(dest, GFP_KERNEL);
if (!req->r_path2) {
err = -ENOMEM;
- ceph_mdsc_put_request(req);
- goto out;
+ goto out_req;
}
req->r_parent = dir;
ihold(dir);
@@ -947,13 +952,13 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
req->r_num_caps = 2;
req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL;
req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
- if (as_ctx.pagelist) {
- req->r_pagelist = as_ctx.pagelist;
- as_ctx.pagelist = NULL;
- }
+
+ ceph_as_ctx_to_req(req, &as_ctx);
+
err = ceph_mdsc_do_request(mdsc, dir, req);
if (!err && !req->r_reply_info.head->is_dentry)
err = ceph_handle_notrace_create(dir, dentry);
+out_req:
ceph_mdsc_put_request(req);
out:
if (err)
@@ -989,13 +994,6 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
goto out;
}

- mode |= S_IFDIR;
- err = ceph_pre_init_acls(dir, &mode, &as_ctx);
- if (err < 0)
- goto out;
- err = ceph_security_init_secctx(dentry, mode, &as_ctx);
- if (err < 0)
- goto out;

req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
if (IS_ERR(req)) {
@@ -1003,6 +1001,14 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
goto out;
}

+ mode |= S_IFDIR;
+ req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+ if (IS_ERR(req->r_new_inode)) {
+ err = PTR_ERR(req->r_new_inode);
+ req->r_new_inode = NULL;
+ goto out_req;
+ }
+
req->r_dentry = dget(dentry);
req->r_num_caps = 2;
req->r_parent = dir;
@@ -1011,15 +1017,15 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
req->r_args.mkdir.mode = cpu_to_le32(mode);
req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL;
req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
- if (as_ctx.pagelist) {
- req->r_pagelist = as_ctx.pagelist;
- as_ctx.pagelist = NULL;
- }
+
+ ceph_as_ctx_to_req(req, &as_ctx);
+
err = ceph_mdsc_do_request(mdsc, dir, req);
if (!err &&
!req->r_reply_info.head->is_target &&
!req->r_reply_info.head->is_dentry)
err = ceph_handle_notrace_create(dir, dentry);
+out_req:
ceph_mdsc_put_request(req);
out:
if (!err)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 3444a3b748e8..cccf729b55a8 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -601,7 +601,8 @@ static void ceph_async_create_cb(struct ceph_mds_client *mdsc,
ceph_mdsc_release_dir_caps(req);
}

-static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
+static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
+ struct dentry *dentry,
struct file *file, umode_t mode,
struct ceph_mds_request *req,
struct ceph_acl_sec_ctx *as_ctx,
@@ -612,7 +613,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
struct ceph_mds_reply_inode in = { };
struct ceph_mds_reply_info_in iinfo = { .in = &in };
struct ceph_inode_info *ci = ceph_inode(dir);
- struct inode *inode;
struct timespec64 now;
struct ceph_string *pool_ns;
struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb);
@@ -621,10 +621,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,

ktime_get_real_ts64(&now);

- inode = ceph_get_inode(dentry->d_sb, vino);
- if (IS_ERR(inode))
- return PTR_ERR(inode);
-
iinfo.inline_version = CEPH_INLINE_NONE;
iinfo.change_attr = 1;
ceph_encode_timespec64(&iinfo.btime, &now);
@@ -680,8 +676,7 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
ceph_dir_clear_complete(dir);
if (!d_unhashed(dentry))
d_drop(dentry);
- if (inode->i_state & I_NEW)
- discard_new_inode(inode);
+ discard_new_inode(inode);
} else {
struct dentry *dn;

@@ -721,6 +716,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req;
+ struct inode *new_inode = NULL;
struct dentry *dn;
struct ceph_acl_sec_ctx as_ctx = {};
bool try_async = ceph_test_mount_opt(fsc, ASYNC_DIROPS);
@@ -733,21 +729,21 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,

if (dentry->d_name.len > NAME_MAX)
return -ENAMETOOLONG;
-
+retry:
if (flags & O_CREAT) {
if (ceph_quota_is_max_files_exceeded(dir))
return -EDQUOT;
- err = ceph_pre_init_acls(dir, &mode, &as_ctx);
- if (err < 0)
- return err;
- err = ceph_security_init_secctx(dentry, mode, &as_ctx);
- if (err < 0)
+
+ new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+ if (IS_ERR(new_inode)) {
+ err = PTR_ERR(new_inode);
goto out_ctx;
+ }
} else if (!d_in_lookup(dentry)) {
/* If it's not being looked up, it's negative */
return -ENOENT;
}
-retry:
+
/* do the open */
req = prepare_open_request(dir->i_sb, flags, mode);
if (IS_ERR(req)) {
@@ -768,25 +764,40 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,

req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL;
req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
- if (as_ctx.pagelist) {
- req->r_pagelist = as_ctx.pagelist;
- as_ctx.pagelist = NULL;
- }
- if (try_async &&
- (req->r_dir_caps =
- try_prep_async_create(dir, dentry, &lo,
- &req->r_deleg_ino))) {
+
+ ceph_as_ctx_to_req(req, &as_ctx);
+
+ if (try_async && (req->r_dir_caps =
+ try_prep_async_create(dir, dentry, &lo, &req->r_deleg_ino))) {
+ struct ceph_vino vino = { .ino = req->r_deleg_ino,
+ .snap = CEPH_NOSNAP };
+
set_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags);
req->r_args.open.flags |= cpu_to_le32(CEPH_O_EXCL);
req->r_callback = ceph_async_create_cb;
+
+ /* Hash inode before RPC */
+ new_inode = ceph_get_inode(dir->i_sb, vino, new_inode);
+ if (IS_ERR(new_inode)) {
+ err = PTR_ERR(new_inode);
+ new_inode = NULL;
+ goto out_req;
+ }
+ WARN_ON_ONCE(!(new_inode->i_state & I_NEW));
+
err = ceph_mdsc_submit_request(mdsc, dir, req);
if (!err) {
- err = ceph_finish_async_create(dir, dentry,
+ err = ceph_finish_async_create(dir, new_inode, dentry,
file, mode, req,
&as_ctx, &lo);
+ new_inode = NULL;
} else if (err == -EJUKEBOX) {
restore_deleg_ino(dir, req->r_deleg_ino);
ceph_mdsc_put_request(req);
+ discard_new_inode(new_inode);
+ ceph_release_acl_sec_ctx(&as_ctx);
+ memset(&as_ctx, 0, sizeof(as_ctx));
+ new_inode = NULL;
try_async = false;
ceph_put_string(rcu_dereference_raw(lo.pool_ns));
goto retry;
@@ -797,6 +808,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
}

set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
+ req->r_new_inode = new_inode;
+ new_inode = NULL;
err = ceph_mdsc_do_request(mdsc,
(flags & (O_CREAT|O_TRUNC)) ? dir : NULL,
req);
@@ -839,6 +852,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
}
out_req:
ceph_mdsc_put_request(req);
+ iput(new_inode);
out_ctx:
ceph_release_acl_sec_ctx(&as_ctx);
dout("atomic_open result=%d\n", err);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 8cf55e6e609e..bff2f31c3a56 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -52,17 +52,85 @@ static int ceph_set_ino_cb(struct inode *inode, void *data)
return 0;
}

-struct inode *ceph_get_inode(struct super_block *sb, struct ceph_vino vino)
+/**
+ * ceph_new_inode - allocate a new inode in advance of an expected create
+ * @dir: parent directory for new inode
+ * @dentry: dentry that may eventually point to new inode
+ * @mode: mode of new inode
+ * @as_ctx: pointer to inherited security context
+ *
+ * Allocate a new inode in advance of an operation to create a new inode.
+ * This allocates the inode and sets up the acl_sec_ctx with appropriate
+ * info for the new inode.
+ *
+ * Returns a pointer to the new inode or an ERR_PTR.
+ */
+struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
+ umode_t *mode, struct ceph_acl_sec_ctx *as_ctx)
+{
+ int err;
+ struct inode *inode;
+
+ inode = new_inode_pseudo(dir->i_sb);
+ if (!inode)
+ return ERR_PTR(-ENOMEM);
+
+ if (!S_ISLNK(*mode)) {
+ err = ceph_pre_init_acls(dir, mode, as_ctx);
+ if (err < 0)
+ goto out_err;
+ }
+
+ err = ceph_security_init_secctx(dentry, *mode, as_ctx);
+ if (err < 0)
+ goto out_err;
+
+ inode->i_state = 0;
+ inode->i_mode = *mode;
+ return inode;
+out_err:
+ iput(inode);
+ return ERR_PTR(err);
+}
+
+void ceph_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as_ctx)
+{
+ if (as_ctx->pagelist) {
+ req->r_pagelist = as_ctx->pagelist;
+ as_ctx->pagelist = NULL;
+ }
+}
+
+/**
+ * ceph_get_inode - find or create/hash a new inode
+ * @sb: superblock to search and allocate in
+ * @vino: vino to search for
+ * @newino: optional new inode to insert if one isn't found (may be NULL)
+ *
+ * Search for or insert a new inode into the hash for the given vino, and return a
+ * reference to it. If new is non-NULL, its reference is consumed.
+ */
+struct inode *ceph_get_inode(struct super_block *sb, struct ceph_vino vino, struct inode *newino)
{
struct inode *inode;

if (ceph_vino_is_reserved(vino))
return ERR_PTR(-EREMOTEIO);

- inode = iget5_locked(sb, (unsigned long)vino.ino, ceph_ino_compare,
- ceph_set_ino_cb, &vino);
- if (!inode)
+ if (newino) {
+ inode = inode_insert5(newino, (unsigned long)vino.ino, ceph_ino_compare,
+ ceph_set_ino_cb, &vino);
+ if (inode != newino)
+ iput(newino);
+ } else {
+ inode = iget5_locked(sb, (unsigned long)vino.ino, ceph_ino_compare,
+ ceph_set_ino_cb, &vino);
+ }
+
+ if (!inode) {
+ dout("No inode found for %llx.%llx\n", vino.ino, vino.snap);
return ERR_PTR(-ENOMEM);
+ }

dout("get_inode on %llu=%llx.%llx got %p new %d\n", ceph_present_inode(inode),
ceph_vinop(inode), inode, !!(inode->i_state & I_NEW));
@@ -78,7 +146,7 @@ struct inode *ceph_get_snapdir(struct inode *parent)
.ino = ceph_ino(parent),
.snap = CEPH_SNAPDIR,
};
- struct inode *inode = ceph_get_inode(parent->i_sb, vino);
+ struct inode *inode = ceph_get_inode(parent->i_sb, vino, NULL);
struct ceph_inode_info *ci = ceph_inode(inode);

if (IS_ERR(inode))
@@ -1552,7 +1620,7 @@ static int readdir_prepopulate_inodes_only(struct ceph_mds_request *req,
vino.ino = le64_to_cpu(rde->inode.in->ino);
vino.snap = le64_to_cpu(rde->inode.in->snapid);

- in = ceph_get_inode(req->r_dentry->d_sb, vino);
+ in = ceph_get_inode(req->r_dentry->d_sb, vino, NULL);
if (IS_ERR(in)) {
err = PTR_ERR(in);
dout("new_inode badness got %d\n", err);
@@ -1754,7 +1822,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
if (d_really_is_positive(dn)) {
in = d_inode(dn);
} else {
- in = ceph_get_inode(parent->d_sb, tvino);
+ in = ceph_get_inode(parent->d_sb, tvino, NULL);
if (IS_ERR(in)) {
dout("new_inode badness\n");
d_drop(dn);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index f476c65fb985..7aa253be7edc 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -868,6 +868,7 @@ void ceph_mdsc_release_request(struct kref *kref)
iput(req->r_parent);
}
iput(req->r_target_inode);
+ iput(req->r_new_inode);
if (req->r_dentry)
dput(req->r_dentry);
if (req->r_old_dentry)
@@ -3208,13 +3209,21 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)

/* Must find target inode outside of mutexes to avoid deadlocks */
if ((err >= 0) && rinfo->head->is_target) {
- struct inode *in;
+ struct inode *in = xchg(&req->r_new_inode, NULL);
struct ceph_vino tvino = {
.ino = le64_to_cpu(rinfo->targeti.in->ino),
.snap = le64_to_cpu(rinfo->targeti.in->snapid)
};

- in = ceph_get_inode(mdsc->fsc->sb, tvino);
+ /* If we ended up opening an existing inode, discard r_new_inode */
+ if (req->r_op == CEPH_MDS_OP_CREATE && !req->r_reply_info.has_create_ino) {
+ /* This should never happen on an async create */
+ WARN_ON_ONCE(req->r_deleg_ino);
+ iput(in);
+ in = NULL;
+ }
+
+ in = ceph_get_inode(mdsc->fsc->sb, tvino, in);
if (IS_ERR(in)) {
err = PTR_ERR(in);
mutex_lock(&session->s_mutex);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 33497846e47e..2e945979a2e0 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -265,6 +265,7 @@ struct ceph_mds_request {

struct inode *r_parent; /* parent dir inode */
struct inode *r_target_inode; /* resulting inode */
+ struct inode *r_new_inode; /* new inode (for creates) */

#define CEPH_MDS_R_DIRECT_IS_HASH (1) /* r_direct_hash is valid */
#define CEPH_MDS_R_ABORTED (2) /* call was aborted */
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 4999207a5466..f23e49f46440 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -975,6 +975,7 @@ static inline bool __ceph_have_pending_cap_snap(struct ceph_inode_info *ci)
/* inode.c */
struct ceph_mds_reply_info_in;
struct ceph_mds_reply_dirfrag;
+struct ceph_acl_sec_ctx;

extern const struct inode_operations ceph_file_iops;

@@ -982,8 +983,12 @@ extern struct inode *ceph_alloc_inode(struct super_block *sb);
extern void ceph_evict_inode(struct inode *inode);
extern void ceph_free_inode(struct inode *inode);

+struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
+ umode_t *mode, struct ceph_acl_sec_ctx *as_ctx);
+void ceph_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as_ctx);
+
extern struct inode *ceph_get_inode(struct super_block *sb,
- struct ceph_vino vino);
+ struct ceph_vino vino, struct inode *newino);
extern struct inode *ceph_get_snapdir(struct inode *parent);
extern int ceph_fill_file_size(struct inode *inode, int issued,
u32 truncate_seq, u64 truncate_size, u64 size);
--
2.35.1

2022-04-01 13:55:12

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 03/54] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size

For ceph, we want to use our own scheme for handling filenames that are
are longer than NAME_MAX after encryption and Base64 encoding. This
allows us to have a consistent view of the encrypted filenames for
clients that don't support fscrypt and clients that do but that don't
have the key.

Currently, fs/crypto only supports encrypting filenames using
fscrypt_setup_filename, but that also handles encoding nokey names. Ceph
can't use that because it handles nokey names in a different way.

Export fscrypt_fname_encrypt. Rename fscrypt_fname_encrypted_size to
__fscrypt_fname_encrypted_size and add a new wrapper called
fscrypt_fname_encrypted_size that takes an inode argument rather than a
pointer to a fscrypt_policy union.

Acked-by: Eric Biggers <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/crypto/fname.c | 36 ++++++++++++++++++++++++++++++------
fs/crypto/fscrypt_private.h | 9 +++------
fs/crypto/hooks.c | 6 +++---
include/linux/fscrypt.h | 4 ++++
4 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 1e4233c95005..77d38188a168 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -79,7 +79,8 @@ static inline bool fscrypt_is_dot_dotdot(const struct qstr *str)
/**
* fscrypt_fname_encrypt() - encrypt a filename
* @inode: inode of the parent directory (for regular filenames)
- * or of the symlink (for symlink targets)
+ * or of the symlink (for symlink targets). Key must already be
+ * set up.
* @iname: the filename to encrypt
* @out: (output) the encrypted filename
* @olen: size of the encrypted filename. It must be at least @iname->len.
@@ -130,6 +131,7 @@ int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,

return 0;
}
+EXPORT_SYMBOL_GPL(fscrypt_fname_encrypt);

/**
* fname_decrypt() - decrypt a filename
@@ -257,9 +259,9 @@ int fscrypt_base64url_decode(const char *src, int srclen, u8 *dst)
}
EXPORT_SYMBOL_GPL(fscrypt_base64url_decode);

-bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
- u32 orig_len, u32 max_len,
- u32 *encrypted_len_ret)
+bool __fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
+ u32 orig_len, u32 max_len,
+ u32 *encrypted_len_ret)
{
int padding = 4 << (fscrypt_policy_flags(policy) &
FSCRYPT_POLICY_FLAGS_PAD_MASK);
@@ -273,6 +275,29 @@ bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
return true;
}

+/**
+ * fscrypt_fname_encrypted_size() - calculate length of encrypted filename
+ * @inode: parent inode of dentry name being encrypted. Key must
+ * already be set up.
+ * @orig_len: length of the original filename
+ * @max_len: maximum length to return
+ * @encrypted_len_ret: where calculated length should be returned (on success)
+ *
+ * Filenames that are shorter than the maximum length may have their lengths
+ * increased slightly by encryption, due to padding that is applied.
+ *
+ * Return: false if the orig_len is greater than max_len. Otherwise, true and
+ * fill out encrypted_len_ret with the length (up to max_len).
+ */
+bool fscrypt_fname_encrypted_size(const struct inode *inode, u32 orig_len,
+ u32 max_len, u32 *encrypted_len_ret)
+{
+ return __fscrypt_fname_encrypted_size(&inode->i_crypt_info->ci_policy,
+ orig_len, max_len,
+ encrypted_len_ret);
+}
+EXPORT_SYMBOL_GPL(fscrypt_fname_encrypted_size);
+
/**
* fscrypt_fname_alloc_buffer() - allocate a buffer for presented filenames
* @max_encrypted_len: maximum length of encrypted filenames the buffer will be
@@ -428,8 +453,7 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname,
return ret;

if (fscrypt_has_encryption_key(dir)) {
- if (!fscrypt_fname_encrypted_size(&dir->i_crypt_info->ci_policy,
- iname->len, NAME_MAX,
+ if (!fscrypt_fname_encrypted_size(dir, iname->len, NAME_MAX,
&fname->crypto_buf.len))
return -ENAMETOOLONG;
fname->crypto_buf.name = kmalloc(fname->crypto_buf.len,
diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h
index 5b0a9e6478b5..f3e6e566daff 100644
--- a/fs/crypto/fscrypt_private.h
+++ b/fs/crypto/fscrypt_private.h
@@ -297,14 +297,11 @@ void fscrypt_generate_iv(union fscrypt_iv *iv, u64 lblk_num,
const struct fscrypt_info *ci);

/* fname.c */
-int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
- u8 *out, unsigned int olen);
-bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
- u32 orig_len, u32 max_len,
- u32 *encrypted_len_ret);
+bool __fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
+ u32 orig_len, u32 max_len,
+ u32 *encrypted_len_ret);

/* hkdf.c */
-
struct fscrypt_hkdf {
struct crypto_shash *hmac_tfm;
};
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c
index af74599ae1cf..7c01025879b3 100644
--- a/fs/crypto/hooks.c
+++ b/fs/crypto/hooks.c
@@ -228,9 +228,9 @@ int fscrypt_prepare_symlink(struct inode *dir, const char *target,
* counting it (even though it is meaningless for ciphertext) is simpler
* for now since filesystems will assume it is there and subtract it.
*/
- if (!fscrypt_fname_encrypted_size(policy, len,
- max_len - sizeof(struct fscrypt_symlink_data),
- &disk_link->len))
+ if (!__fscrypt_fname_encrypted_size(policy, len,
+ max_len - sizeof(struct fscrypt_symlink_data),
+ &disk_link->len))
return -ENAMETOOLONG;
disk_link->len += sizeof(struct fscrypt_symlink_data);

diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 671181d196a8..c90e176b5843 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -308,8 +308,12 @@ void fscrypt_free_inode(struct inode *inode);
int fscrypt_drop_inode(struct inode *inode);

/* fname.c */
+int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
+ u8 *out, unsigned int olen);
int fscrypt_base64url_encode(const u8 *src, int len, char *dst);
int fscrypt_base64url_decode(const char *src, int len, u8 *dst);
+bool fscrypt_fname_encrypted_size(const struct inode *inode, u32 orig_len,
+ u32 max_len, u32 *encrypted_len_ret);
int fscrypt_setup_filename(struct inode *inode, const struct qstr *iname,
int lookup, struct fscrypt_name *fname);

--
2.35.1

2022-04-01 14:34:05

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 40/54] ceph: add __ceph_sync_read helper support

From: Xiubo Li <[email protected]>

Turn the guts of ceph_sync_read into a new helper that takes an inode
and an offset instead of a kiocb struct, and make ceph_sync_read call
the helper as a wrapper.

Signed-off-by: Xiubo Li <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/file.c | 33 +++++++++++++++++++++------------
fs/ceph/super.h | 2 ++
2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index ca370c628b7f..9aa997bcfacc 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -929,22 +929,19 @@ enum {
* If we get a short result from the OSD, check against i_size; we need to
* only return a short read to the caller if we hit EOF.
*/
-static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
- int *retry_op)
+ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
+ struct iov_iter *to, int *retry_op)
{
- struct file *file = iocb->ki_filp;
- struct inode *inode = file_inode(file);
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
struct ceph_osd_client *osdc = &fsc->client->osdc;
ssize_t ret;
- u64 off = iocb->ki_pos;
+ u64 off = *ki_pos;
u64 len = iov_iter_count(to);
u64 i_size = i_size_read(inode);
bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD);

- dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len,
- (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
+ dout("sync_read on inode %p %llx~%llx\n", inode, *ki_pos, len);

if (!len)
return 0;
@@ -1063,14 +1060,14 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
break;
}

- if (off > iocb->ki_pos) {
+ if (off > *ki_pos) {
if (off >= i_size) {
*retry_op = CHECK_EOF;
- ret = i_size - iocb->ki_pos;
- iocb->ki_pos = i_size;
+ ret = i_size - *ki_pos;
+ *ki_pos = i_size;
} else {
- ret = off - iocb->ki_pos;
- iocb->ki_pos = off;
+ ret = off - *ki_pos;
+ *ki_pos = off;
}
}

@@ -1078,6 +1075,18 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
return ret;
}

+static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
+ int *retry_op)
+{
+ struct file *file = iocb->ki_filp;
+ struct inode *inode = file_inode(file);
+
+ dout("sync_read on file %p %llx~%zx %s\n", file, iocb->ki_pos,
+ iov_iter_count(to), (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
+
+ return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op);
+}
+
struct ceph_aio_request {
struct kiocb *iocb;
size_t total_len;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index fef4cda44861..339284e90cb3 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1266,6 +1266,8 @@ extern int ceph_renew_caps(struct inode *inode, int fmode);
extern int ceph_open(struct inode *inode, struct file *file);
extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
struct file *file, unsigned flags, umode_t mode);
+extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
+ struct iov_iter *to, int *retry_op);
extern int ceph_release(struct inode *inode, struct file *filp);
extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
char *data, size_t len);
--
2.35.1

2022-04-01 14:36:04

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 17/54] ceph: add encrypted fname handling to ceph_mdsc_build_path

Allow ceph_mdsc_build_path to encrypt and base64 encode the filename
when the parent is encrypted and we're sending the path to the MDS.

In most cases, we just encrypt the filenames and base64 encode them,
but when the name is longer than CEPH_NOHASH_NAME_MAX, we use a similar
scheme to fscrypt proper, and hash the remaning bits with sha256.

When doing this, we then send along the full crypttext of the name in
the new alternate_name field of the MClientRequest. The MDS can then
send that along in readdir responses and traces.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/crypto.c | 47 ++++++++++++++++++++++++++
fs/ceph/crypto.h | 32 ++++++++++++++++++
fs/ceph/mds_client.c | 80 ++++++++++++++++++++++++++++++++++----------
3 files changed, 141 insertions(+), 18 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 2f6ea02ac22d..50d75341c5be 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -134,3 +134,50 @@ void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_se
{
swap(req->r_fscrypt_auth, as->fscrypt_auth);
}
+
+int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf)
+{
+ u32 len;
+ int elen;
+ int ret;
+ u8 *cryptbuf;
+
+ WARN_ON_ONCE(!fscrypt_has_encryption_key(parent));
+
+ /*
+ * Convert cleartext d_name to ciphertext. If result is longer than
+ * CEPH_NOHASH_NAME_MAX, sha256 the remaining bytes
+ *
+ * See: fscrypt_setup_filename
+ */
+ if (!fscrypt_fname_encrypted_size(parent, dentry->d_name.len, NAME_MAX, &len))
+ return -ENAMETOOLONG;
+
+ /* Allocate a buffer appropriate to hold the result */
+ cryptbuf = kmalloc(len > CEPH_NOHASH_NAME_MAX ? NAME_MAX : len, GFP_KERNEL);
+ if (!cryptbuf)
+ return -ENOMEM;
+
+ ret = fscrypt_fname_encrypt(parent, &dentry->d_name, cryptbuf, len);
+ if (ret) {
+ kfree(cryptbuf);
+ return ret;
+ }
+
+ /* hash the end if the name is long enough */
+ if (len > CEPH_NOHASH_NAME_MAX) {
+ u8 hash[SHA256_DIGEST_SIZE];
+ u8 *extra = cryptbuf + CEPH_NOHASH_NAME_MAX;
+
+ /* hash the extra bytes and overwrite crypttext beyond that point with it */
+ sha256(extra, len - CEPH_NOHASH_NAME_MAX, hash);
+ memcpy(extra, hash, SHA256_DIGEST_SIZE);
+ len = CEPH_NOHASH_NAME_MAX + SHA256_DIGEST_SIZE;
+ }
+
+ /* base64 encode the encrypted name */
+ elen = fscrypt_base64url_encode(cryptbuf, len, buf);
+ kfree(cryptbuf);
+ dout("base64-encoded ciphertext name = %.*s\n", elen, buf);
+ return elen;
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index cb00fe42d5b7..9a66a29d5c8b 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -6,6 +6,7 @@
#ifndef _CEPH_CRYPTO_H
#define _CEPH_CRYPTO_H

+#include <crypto/sha2.h>
#include <linux/fscrypt.h>

struct ceph_fs_client;
@@ -27,6 +28,30 @@ static inline u32 ceph_fscrypt_auth_len(struct ceph_fscrypt_auth *fa)
}

#ifdef CONFIG_FS_ENCRYPTION
+/*
+ * We want to encrypt filenames when creating them, but the encrypted
+ * versions of those names may have illegal characters in them. To mitigate
+ * that, we base64 encode them, but that gives us a result that can exceed
+ * NAME_MAX.
+ *
+ * Follow a similar scheme to fscrypt itself, and cap the filename to a
+ * smaller size. If the ciphertext name is longer than the value below, then
+ * sha256 hash the remaining bytes.
+ *
+ * For the fscrypt_nokey_name struct the dirhash[2] member is useless in ceph
+ * so the corresponding struct will be:
+ *
+ * struct fscrypt_ceph_nokey_name {
+ * u8 bytes[157];
+ * u8 sha256[SHA256_DIGEST_SIZE];
+ * }; // 189 bytes => 252 bytes base64-encoded, which is <= NAME_MAX (255)
+ *
+ * Note that for long names that end up having their tail portion hashed, we
+ * must also store the full encrypted name (in the dentry's alternate_name
+ * field).
+ */
+#define CEPH_NOHASH_NAME_MAX (189 - SHA256_DIGEST_SIZE)
+
void ceph_fscrypt_set_ops(struct super_block *sb);

void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc);
@@ -34,6 +59,7 @@ void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc);
int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
struct ceph_acl_sec_ctx *as);
void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as);
+int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf);

#else /* CONFIG_FS_ENCRYPTION */

@@ -57,6 +83,12 @@ static inline void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req,
struct ceph_acl_sec_ctx *as_ctx)
{
}
+
+static inline int ceph_encode_encrypted_fname(const struct inode *parent,
+ struct dentry *dentry, char *buf)
+{
+ return -EOPNOTSUPP;
+}
#endif /* CONFIG_FS_ENCRYPTION */

#endif
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 28315053e116..13367a358a85 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -14,6 +14,7 @@
#include <linux/bitmap.h>

#include "super.h"
+#include "crypto.h"
#include "mds_client.h"
#include "crypto.h"

@@ -2385,18 +2386,27 @@ static inline u64 __get_oldest_tid(struct ceph_mds_client *mdsc)
return mdsc->oldest_tid;
}

-/*
- * Build a dentry's path. Allocate on heap; caller must kfree. Based
- * on build_path_from_dentry in fs/cifs/dir.c.
+/**
+ * ceph_mdsc_build_path - build a path string to a given dentry
+ * @dentry: dentry to which path should be built
+ * @plen: returned length of string
+ * @pbase: returned base inode number
+ * @for_wire: is this path going to be sent to the MDS?
+ *
+ * Build a string that represents the path to the dentry. This is mostly called
+ * for two different purposes:
*
- * If @stop_on_nosnap, generate path relative to the first non-snapped
- * inode.
+ * 1) we need to build a path string to send to the MDS (for_wire == true)
+ * 2) we need a path string for local presentation (e.g. debugfs) (for_wire == false)
+ *
+ * The path is built in reverse, starting with the dentry. Walk back up toward
+ * the root, building the path until the first non-snapped inode is reached (for_wire)
+ * or the root inode is reached (!for_wire).
*
* Encode hidden .snap dirs as a double /, i.e.
* foo/.snap/bar -> foo//bar
*/
-char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
- int stop_on_nosnap)
+char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase, int for_wire)
{
struct dentry *cur;
struct inode *inode;
@@ -2418,30 +2428,65 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
seq = read_seqbegin(&rename_lock);
cur = dget(dentry);
for (;;) {
- struct dentry *temp;
+ struct dentry *parent;

spin_lock(&cur->d_lock);
inode = d_inode(cur);
if (inode && ceph_snap(inode) == CEPH_SNAPDIR) {
dout("build_path path+%d: %p SNAPDIR\n",
pos, cur);
- } else if (stop_on_nosnap && inode && dentry != cur &&
- ceph_snap(inode) == CEPH_NOSNAP) {
+ spin_unlock(&cur->d_lock);
+ parent = dget_parent(cur);
+ } else if (for_wire && inode && dentry != cur && ceph_snap(inode) == CEPH_NOSNAP) {
spin_unlock(&cur->d_lock);
pos++; /* get rid of any prepended '/' */
break;
- } else {
+ } else if (!for_wire || !IS_ENCRYPTED(d_inode(cur->d_parent))) {
pos -= cur->d_name.len;
if (pos < 0) {
spin_unlock(&cur->d_lock);
break;
}
memcpy(path + pos, cur->d_name.name, cur->d_name.len);
+ spin_unlock(&cur->d_lock);
+ parent = dget_parent(cur);
+ } else {
+ int len, ret;
+ char buf[NAME_MAX];
+
+ /*
+ * Proactively copy name into buf, in case we need to present
+ * it as-is.
+ */
+ memcpy(buf, cur->d_name.name, cur->d_name.len);
+ len = cur->d_name.len;
+ spin_unlock(&cur->d_lock);
+ parent = dget_parent(cur);
+
+ ret = __fscrypt_prepare_readdir(d_inode(parent));
+ if (ret < 0) {
+ dput(parent);
+ dput(cur);
+ return ERR_PTR(ret);
+ }
+
+ if (fscrypt_has_encryption_key(d_inode(parent))) {
+ len = ceph_encode_encrypted_fname(d_inode(parent), cur, buf);
+ if (len < 0) {
+ dput(parent);
+ dput(cur);
+ return ERR_PTR(len);
+ }
+ }
+ pos -= len;
+ if (pos < 0) {
+ dput(parent);
+ break;
+ }
+ memcpy(path + pos, buf, len);
}
- temp = cur;
- spin_unlock(&temp->d_lock);
- cur = dget_parent(temp);
- dput(temp);
+ dput(cur);
+ cur = parent;

/* Are we at the root? */
if (IS_ROOT(cur))
@@ -2465,8 +2510,7 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
* A rename didn't occur, but somehow we didn't end up where
* we thought we would. Throw a warning and try again.
*/
- pr_warn("build_path did not end path lookup where "
- "expected, pos is %d\n", pos);
+ pr_warn("build_path did not end path lookup where expected (pos = %d)\n", pos);
goto retry;
}

@@ -2486,7 +2530,7 @@ static int build_dentry_path(struct dentry *dentry, struct inode *dir,
rcu_read_lock();
if (!dir)
dir = d_inode_rcu(dentry->d_parent);
- if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP) {
+ if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP && !IS_ENCRYPTED(dir)) {
*pino = ceph_ino(dir);
rcu_read_unlock();
*ppath = dentry->d_name.name;
--
2.35.1

2022-04-01 14:39:19

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 04/54] fscrypt: add fscrypt_context_for_new_inode

Most filesystems just call fscrypt_set_context on new inodes, which
usually causes a setxattr. That's a bit late for ceph, which can send
along a full set of attributes with the create request.

Doing so allows it to avoid race windows that where the new inode could
be seen by other clients without the crypto context attached. It also
avoids the separate round trip to the server.

Refactor the fscrypt code a bit to allow us to create a new crypto
context, attach it to the inode, and write it to the buffer, but without
calling set_context on it. ceph can later use this to marshal the
context into the attributes we send along with the create request.

Acked-by: Eric Biggers <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/crypto/policy.c | 35 +++++++++++++++++++++++++++++------
include/linux/fscrypt.h | 1 +
2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c
index ed3d623724cd..ec861af96252 100644
--- a/fs/crypto/policy.c
+++ b/fs/crypto/policy.c
@@ -664,6 +664,32 @@ const union fscrypt_policy *fscrypt_policy_to_inherit(struct inode *dir)
return fscrypt_get_dummy_policy(dir->i_sb);
}

+/**
+ * fscrypt_context_for_new_inode() - create an encryption context for a new inode
+ * @ctx: where context should be written
+ * @inode: inode from which to fetch policy and nonce
+ *
+ * Given an in-core "prepared" (via fscrypt_prepare_new_inode) inode,
+ * generate a new context and write it to ctx. ctx _must_ be at least
+ * FSCRYPT_SET_CONTEXT_MAX_SIZE bytes.
+ *
+ * Return: size of the resulting context or a negative error code.
+ */
+int fscrypt_context_for_new_inode(void *ctx, struct inode *inode)
+{
+ struct fscrypt_info *ci = inode->i_crypt_info;
+
+ BUILD_BUG_ON(sizeof(union fscrypt_context) !=
+ FSCRYPT_SET_CONTEXT_MAX_SIZE);
+
+ /* fscrypt_prepare_new_inode() should have set up the key already. */
+ if (WARN_ON_ONCE(!ci))
+ return -ENOKEY;
+
+ return fscrypt_new_context(ctx, &ci->ci_policy, ci->ci_nonce);
+}
+EXPORT_SYMBOL_GPL(fscrypt_context_for_new_inode);
+
/**
* fscrypt_set_context() - Set the fscrypt context of a new inode
* @inode: a new inode
@@ -680,12 +706,9 @@ int fscrypt_set_context(struct inode *inode, void *fs_data)
union fscrypt_context ctx;
int ctxsize;

- /* fscrypt_prepare_new_inode() should have set up the key already. */
- if (WARN_ON_ONCE(!ci))
- return -ENOKEY;
-
- BUILD_BUG_ON(sizeof(ctx) != FSCRYPT_SET_CONTEXT_MAX_SIZE);
- ctxsize = fscrypt_new_context(&ctx, &ci->ci_policy, ci->ci_nonce);
+ ctxsize = fscrypt_context_for_new_inode(&ctx, inode);
+ if (ctxsize < 0)
+ return ctxsize;

/*
* This may be the first time the inode number is available, so do any
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index c90e176b5843..530433098f82 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -276,6 +276,7 @@ int fscrypt_ioctl_get_policy(struct file *filp, void __user *arg);
int fscrypt_ioctl_get_policy_ex(struct file *filp, void __user *arg);
int fscrypt_ioctl_get_nonce(struct file *filp, void __user *arg);
int fscrypt_has_permitted_context(struct inode *parent, struct inode *child);
+int fscrypt_context_for_new_inode(void *ctx, struct inode *inode);
int fscrypt_set_context(struct inode *inode, void *fs_data);

struct fscrypt_dummy_policy {
--
2.35.1

2022-04-01 15:05:50

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 20/54] ceph: properly set DCACHE_NOKEY_NAME flag in lookup

This is required so that we know to invalidate these dentries when the
directory is unlocked.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/dir.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 8cc7a49ee508..897f8618151b 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -760,6 +760,17 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
if (dentry->d_name.len > NAME_MAX)
return ERR_PTR(-ENAMETOOLONG);

+ if (IS_ENCRYPTED(dir)) {
+ err = __fscrypt_prepare_readdir(dir);
+ if (err)
+ return ERR_PTR(err);
+ if (!fscrypt_has_encryption_key(dir)) {
+ spin_lock(&dentry->d_lock);
+ dentry->d_flags |= DCACHE_NOKEY_NAME;
+ spin_unlock(&dentry->d_lock);
+ }
+ }
+
/* can we conclude ENOENT locally? */
if (d_really_is_negative(dentry)) {
struct ceph_inode_info *ci = ceph_inode(dir);
--
2.35.1

2022-04-01 15:21:18

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 30/54] ceph: make ceph_get_name decrypt filenames

When we do a lookupino to the MDS, we get a filename in the trace.
ceph_get_name uses that name directly, so we must properly decrypt
it before copying it to the name buffer.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/export.c | 44 ++++++++++++++++++++++++++++++++------------
1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/fs/ceph/export.c b/fs/ceph/export.c
index e0fa66ac8b9f..0ebf2bd93055 100644
--- a/fs/ceph/export.c
+++ b/fs/ceph/export.c
@@ -7,6 +7,7 @@

#include "super.h"
#include "mds_client.h"
+#include "crypto.h"

/*
* Basic fh
@@ -534,7 +535,9 @@ static int ceph_get_name(struct dentry *parent, char *name,
{
struct ceph_mds_client *mdsc;
struct ceph_mds_request *req;
+ struct inode *dir = d_inode(parent);
struct inode *inode = d_inode(child);
+ struct ceph_mds_reply_info_parsed *rinfo;
int err;

if (ceph_snap(inode) != CEPH_NOSNAP)
@@ -546,30 +549,47 @@ static int ceph_get_name(struct dentry *parent, char *name,
if (IS_ERR(req))
return PTR_ERR(req);

- inode_lock(d_inode(parent));
-
+ inode_lock(dir);
req->r_inode = inode;
ihold(inode);
req->r_ino2 = ceph_vino(d_inode(parent));
- req->r_parent = d_inode(parent);
- ihold(req->r_parent);
+ req->r_parent = dir;
+ ihold(dir);
set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
req->r_num_caps = 2;
err = ceph_mdsc_do_request(mdsc, NULL, req);
+ inode_unlock(dir);

- inode_unlock(d_inode(parent));
+ if (err)
+ goto out;

- if (!err) {
- struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
+ rinfo = &req->r_reply_info;
+ if (!IS_ENCRYPTED(dir)) {
memcpy(name, rinfo->dname, rinfo->dname_len);
name[rinfo->dname_len] = 0;
- dout("get_name %p ino %llx.%llx name %s\n",
- child, ceph_vinop(inode), name);
} else {
- dout("get_name %p ino %llx.%llx err %d\n",
- child, ceph_vinop(inode), err);
- }
+ struct fscrypt_str oname = FSTR_INIT(NULL, 0);
+ struct ceph_fname fname = { .dir = dir,
+ .name = rinfo->dname,
+ .ctext = rinfo->altname,
+ .name_len = rinfo->dname_len,
+ .ctext_len = rinfo->altname_len };
+
+ err = ceph_fname_alloc_buffer(dir, &oname);
+ if (err < 0)
+ goto out;

+ err = ceph_fname_to_usr(&fname, NULL, &oname, NULL);
+ if (!err) {
+ memcpy(name, oname.name, oname.len);
+ name[oname.len] = 0;
+ }
+ ceph_fname_free_buffer(dir, &oname);
+ }
+out:
+ dout("get_name %p ino %llx.%llx err %d %s%s\n",
+ child, ceph_vinop(inode), err,
+ err ? "" : "name ", err ? "" : name);
ceph_mdsc_put_request(req);
return err;
}
--
2.35.1

2022-04-01 15:21:47

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 48/54] ceph: align data in pages in ceph_sync_write

Encrypted files will need to be dealt with in block-sized chunks and
once we do that, the way that ceph_sync_write aligns the data in the
bounce buffer won't be acceptable.

Change it to align the data the same way it would be aligned in the
pagecache.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/file.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index fdd59477af5b..ec6324d23aa6 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1553,6 +1553,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
bool check_caps = false;
struct timespec64 mtime = current_time(inode);
size_t count = iov_iter_count(from);
+ size_t off;

if (ceph_snap(file_inode(file)) != CEPH_NOSNAP)
return -EROFS;
@@ -1590,12 +1591,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
break;
}

- /*
- * write from beginning of first page,
- * regardless of io alignment
- */
- num_pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-
+ num_pages = calc_pages_for(pos, len);
pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
if (IS_ERR(pages)) {
ret = PTR_ERR(pages);
@@ -1603,9 +1599,12 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
}

left = len;
+ off = offset_in_page(pos);
for (n = 0; n < num_pages; n++) {
- size_t plen = min_t(size_t, left, PAGE_SIZE);
- ret = copy_page_from_iter(pages[n], 0, plen, from);
+ size_t plen = min_t(size_t, left, PAGE_SIZE - off);
+
+ ret = copy_page_from_iter(pages[n], off, plen, from);
+ off = 0;
if (ret != plen) {
ret = -EFAULT;
break;
@@ -1620,8 +1619,9 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,

req->r_inode = inode;

- osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0,
- false, true);
+ osd_req_op_extent_osd_data_pages(req, 0, pages, len,
+ offset_in_page(pos),
+ false, true);

req->r_mtime = mtime;
ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
--
2.35.1

2022-04-01 15:45:00

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 24/54] ceph: fix base64 encoded name's length check in ceph_fname_to_usr()

From: Xiubo Li <[email protected]>

The fname->name is based64_encoded names and the max long shouldn't
exceed the NAME_MAX.

The FSCRYPT_BASE64URL_CHARS(NAME_MAX) will be 255 * 4 / 3.

Signed-off-by: Xiubo Li <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/crypto.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index c58897cd30ca..7dee31e1e3bb 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -211,7 +211,7 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
}

/* Sanity check that the resulting name will fit in the buffer */
- if (fname->name_len > FSCRYPT_BASE64URL_CHARS(NAME_MAX))
+ if (fname->name_len > NAME_MAX || fname->ctext_len > NAME_MAX)
return -EIO;

ret = __fscrypt_prepare_readdir(fname->dir);
--
2.35.1

2022-04-01 15:45:18

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 53/54] ceph: add encryption support to writepage

Allow writepage to issue encrypted writes. Extend out the requested size
and offset to cover complete blocks, and then encrypt and write them to
the OSDs.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/addr.c | 34 +++++++++++++++++++++++++++-------
1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 13a37a568a1d..403e7a960a4e 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -594,10 +594,12 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
loff_t page_off = page_offset(page);
int err;
loff_t len = thp_size(page);
+ loff_t wlen;
struct ceph_writeback_ctl ceph_wbc;
struct ceph_osd_client *osdc = &fsc->client->osdc;
struct ceph_osd_request *req;
bool caching = ceph_is_cache_enabled(inode);
+ struct page *bounce_page = NULL;

dout("writepage %p idx %lu\n", page, page->index);

@@ -628,6 +630,8 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)

if (ceph_wbc.i_size < page_off + len)
len = ceph_wbc.i_size - page_off;
+ if (IS_ENCRYPTED(inode))
+ wlen = round_up(len, CEPH_FSCRYPT_BLOCK_SIZE);

dout("writepage %p page %p index %lu on %llu~%llu snapc %p seq %lld\n",
inode, page, page->index, page_off, len, snapc, snapc->seq);
@@ -636,22 +640,37 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);

- req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode), page_off, &len, 0, 1,
- CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE, snapc,
- ceph_wbc.truncate_seq, ceph_wbc.truncate_size,
- true);
+ req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode),
+ page_off, &wlen, 0, 1, CEPH_OSD_OP_WRITE,
+ CEPH_OSD_FLAG_WRITE, snapc,
+ ceph_wbc.truncate_seq,
+ ceph_wbc.truncate_size, true);
if (IS_ERR(req))
return PTR_ERR(req);

+ if (wlen < len)
+ len = wlen;
+
set_page_writeback(page);
if (caching)
ceph_set_page_fscache(page);
ceph_fscache_write_to_cache(inode, page_off, len, caching);

+ if (IS_ENCRYPTED(inode)) {
+ bounce_page = fscrypt_encrypt_pagecache_blocks(page, CEPH_FSCRYPT_BLOCK_SIZE,
+ 0, GFP_NOFS);
+ if (IS_ERR(bounce_page)) {
+ err = PTR_ERR(bounce_page);
+ goto out;
+ }
+ }
/* it may be a short write due to an object boundary */
WARN_ON_ONCE(len > thp_size(page));
- osd_req_op_extent_osd_data_pages(req, 0, &page, len, 0, false, false);
- dout("writepage %llu~%llu (%llu bytes)\n", page_off, len, len);
+ osd_req_op_extent_osd_data_pages(req, 0,
+ bounce_page ? &bounce_page : &page, wlen, 0,
+ false, false);
+ dout("writepage %llu~%llu (%llu bytes, %sencrypted)\n",
+ page_off, len, wlen, IS_ENCRYPTED(inode) ? "" : "not ");

req->r_mtime = inode->i_mtime;
err = ceph_osdc_start_request(osdc, req, true);
@@ -660,7 +679,8 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)

ceph_update_write_metrics(&fsc->mdsc->metric, req->r_start_latency,
req->r_end_latency, len, err);
-
+ fscrypt_free_bounce_page(bounce_page);
+out:
ceph_osdc_put_request(req);
if (err == 0)
err = len;
--
2.35.1

2022-04-01 15:54:21

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 44/54] libceph: allow ceph_osdc_new_request to accept a multi-op read

Currently we have some special-casing for multi-op writes, but in the
case of a read, we can't really handle it. All of the current multi-op
callers call it with CEPH_OSD_FLAG_WRITE set.

Have ceph_osdc_new_request check for CEPH_OSD_FLAG_READ and if it's set,
allocate multiple reply ops instead of multiple request ops. If neither
flag is set, return -EINVAL.

Signed-off-by: Jeff Layton <[email protected]>
---
net/ceph/osd_client.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 47d18e9821d4..cb199a3f3caf 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1130,15 +1130,30 @@ struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc,
if (flags & CEPH_OSD_FLAG_WRITE)
req->r_data_offset = off;

- if (num_ops > 1)
+ if (num_ops > 1) {
+ int num_req_ops, num_rep_ops;
+
/*
- * This is a special case for ceph_writepages_start(), but it
- * also covers ceph_uninline_data(). If more multi-op request
- * use cases emerge, we will need a separate helper.
+ * If this is a multi-op write request, assume that we'll need
+ * request ops. If it's a multi-op read then assume we'll need
+ * reply ops. Anything else and call it -EINVAL.
*/
- r = __ceph_osdc_alloc_messages(req, GFP_NOFS, num_ops, 0);
- else
+ if (flags & CEPH_OSD_FLAG_WRITE) {
+ num_req_ops = num_ops;
+ num_rep_ops = 0;
+ } else if (flags & CEPH_OSD_FLAG_READ) {
+ num_req_ops = 0;
+ num_rep_ops = num_ops;
+ } else {
+ r = -EINVAL;
+ goto fail;
+ }
+
+ r = __ceph_osdc_alloc_messages(req, GFP_NOFS, num_req_ops,
+ num_rep_ops);
+ } else {
r = ceph_osdc_alloc_messages(req, GFP_NOFS);
+ }
if (r)
goto fail;

--
2.35.1

2022-04-01 18:52:49

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 09/54] ceph: ensure that we accept a new context from MDS for new inodes

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/inode.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 216e90779a55..bceb46568da3 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -943,6 +943,17 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,

__ceph_update_quota(ci, iinfo->max_bytes, iinfo->max_files);

+#ifdef CONFIG_FS_ENCRYPTION
+ if (iinfo->fscrypt_auth_len && (inode->i_state & I_NEW)) {
+ kfree(ci->fscrypt_auth);
+ ci->fscrypt_auth_len = iinfo->fscrypt_auth_len;
+ ci->fscrypt_auth = iinfo->fscrypt_auth;
+ iinfo->fscrypt_auth = NULL;
+ iinfo->fscrypt_auth_len = 0;
+ inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
+ }
+#endif
+
if ((new_version || (new_issued & CEPH_CAP_AUTH_SHARED)) &&
(issued & CEPH_CAP_AUTH_EXCL) == 0) {
inode->i_mode = mode;
@@ -1032,16 +1043,6 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
xattr_blob = NULL;
}

-#ifdef CONFIG_FS_ENCRYPTION
- if (iinfo->fscrypt_auth_len && !ci->fscrypt_auth) {
- ci->fscrypt_auth_len = iinfo->fscrypt_auth_len;
- ci->fscrypt_auth = iinfo->fscrypt_auth;
- iinfo->fscrypt_auth = NULL;
- iinfo->fscrypt_auth_len = 0;
- inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
- }
-#endif
-
/* finally update i_version */
if (le64_to_cpu(info->version) > ci->i_version)
ci->i_version = le64_to_cpu(info->version);
--
2.35.1

2022-04-01 20:19:47

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 49/54] ceph: add read/modify/write to ceph_sync_write

When doing a synchronous write on an encrypted inode, we have no
guarantee that the caller is writing crypto block-aligned data. When
that happens, we must do a read/modify/write cycle.

First, expand the range to cover complete blocks. If we had to change
the original pos or length, issue a read to fill the first and/or last
pages, and fetch the version of the object from the result.

We then copy data into the pages as usual, encrypt the result and issue
a write prefixed by an assertion that the version hasn't changed. If it has
changed then we restart the whole thing again.

If there is no object at that position in the file (-ENOENT), we prefix
the write on an exclusive create of the object instead.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/file.c | 319 ++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 290 insertions(+), 29 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index ec6324d23aa6..aaa7a9d0c439 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1542,18 +1542,16 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
struct inode *inode = file_inode(file);
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
- struct ceph_vino vino;
+ struct ceph_osd_client *osdc = &fsc->client->osdc;
struct ceph_osd_request *req;
struct page **pages;
u64 len;
int num_pages;
int written = 0;
- int flags;
int ret;
bool check_caps = false;
struct timespec64 mtime = current_time(inode);
size_t count = iov_iter_count(from);
- size_t off;

if (ceph_snap(file_inode(file)) != CEPH_NOSNAP)
return -EROFS;
@@ -1573,29 +1571,236 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
if (ret < 0)
dout("invalidate_inode_pages2_range returned %d\n", ret);

- flags = /* CEPH_OSD_FLAG_ORDERSNAP | */ CEPH_OSD_FLAG_WRITE;
-
while ((len = iov_iter_count(from)) > 0) {
size_t left;
int n;
+ u64 write_pos = pos;
+ u64 write_len = len;
+ u64 objnum, objoff;
+ u32 xlen;
+ u64 assert_ver;
+ bool rmw;
+ bool first, last;
+ struct iov_iter saved_iter = *from;
+ size_t off;
+
+ ceph_fscrypt_adjust_off_and_len(inode, &write_pos, &write_len);
+
+ /* clamp the length to the end of first object */
+ ceph_calc_file_object_mapping(&ci->i_layout, write_pos,
+ write_len, &objnum, &objoff,
+ &xlen);
+ write_len = xlen;
+
+ /* adjust len downward if it goes beyond current object */
+ if (pos + len > write_pos + write_len)
+ len = write_pos + write_len - pos;

- vino = ceph_vino(inode);
- req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout,
- vino, pos, &len, 0, 1,
- CEPH_OSD_OP_WRITE, flags, snapc,
- ci->i_truncate_seq,
- ci->i_truncate_size,
- false);
- if (IS_ERR(req)) {
- ret = PTR_ERR(req);
- break;
- }
+ /*
+ * If we had to adjust the length or position to align with a
+ * crypto block, then we must do a read/modify/write cycle. We
+ * use a version assertion to redrive the thing if something
+ * changes in between.
+ */
+ first = pos != write_pos;
+ last = (pos + len) != (write_pos + write_len);
+ rmw = first || last;

- num_pages = calc_pages_for(pos, len);
+ dout("sync_write ino %llx %lld~%llu adjusted %lld~%llu -- %srmw\n",
+ ci->i_vino.ino, pos, len, write_pos, write_len, rmw ? "" : "no ");
+
+ /*
+ * The data is emplaced into the page as it would be if it were in
+ * an array of pagecache pages.
+ */
+ num_pages = calc_pages_for(write_pos, write_len);
pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
if (IS_ERR(pages)) {
ret = PTR_ERR(pages);
- goto out;
+ break;
+ }
+
+ /* Do we need to preload the pages? */
+ if (rmw) {
+ u64 first_pos = write_pos;
+ u64 last_pos = (write_pos + write_len) - CEPH_FSCRYPT_BLOCK_SIZE;
+ u64 read_len = CEPH_FSCRYPT_BLOCK_SIZE;
+ struct ceph_osd_req_op *op;
+
+ /* We should only need to do this for encrypted inodes */
+ WARN_ON_ONCE(!IS_ENCRYPTED(inode));
+
+ /* No need to do two reads if first and last blocks are same */
+ if (first && last_pos == first_pos)
+ last = false;
+
+ /*
+ * Allocate a read request for one or two extents, depending
+ * on how the request was aligned.
+ */
+ req = ceph_osdc_new_request(osdc, &ci->i_layout,
+ ci->i_vino, first ? first_pos : last_pos,
+ &read_len, 0, (first && last) ? 2 : 1,
+ CEPH_OSD_OP_SPARSE_READ, CEPH_OSD_FLAG_READ,
+ NULL, ci->i_truncate_seq,
+ ci->i_truncate_size, false);
+ if (IS_ERR(req)) {
+ ceph_release_page_vector(pages, num_pages);
+ ret = PTR_ERR(req);
+ break;
+ }
+
+ /* Something is misaligned! */
+ if (read_len != CEPH_FSCRYPT_BLOCK_SIZE) {
+ ceph_osdc_put_request(req);
+ ceph_release_page_vector(pages, num_pages);
+ ret = -EIO;
+ break;
+ }
+
+ /* Add extent for first block? */
+ op = &req->r_ops[0];
+
+ if (first) {
+ osd_req_op_extent_osd_data_pages(req, 0, pages,
+ CEPH_FSCRYPT_BLOCK_SIZE,
+ offset_in_page(first_pos),
+ false, false);
+ /* We only expect a single extent here */
+ ret = ceph_alloc_sparse_ext_map(op, 1);
+ if (ret) {
+ ceph_osdc_put_request(req);
+ ceph_release_page_vector(pages, num_pages);
+ break;
+ }
+ }
+
+ /* Add extent for last block */
+ if (last) {
+ /* Init the other extent if first extent has been used */
+ if (first) {
+ op = &req->r_ops[1];
+ osd_req_op_extent_init(req, 1, CEPH_OSD_OP_SPARSE_READ,
+ last_pos, CEPH_FSCRYPT_BLOCK_SIZE,
+ ci->i_truncate_size,
+ ci->i_truncate_seq);
+ }
+
+ ret = ceph_alloc_sparse_ext_map(op, 1);
+ if (ret) {
+ ceph_osdc_put_request(req);
+ ceph_release_page_vector(pages, num_pages);
+ break;
+ }
+
+ osd_req_op_extent_osd_data_pages(req, first ? 1 : 0,
+ &pages[num_pages - 1],
+ CEPH_FSCRYPT_BLOCK_SIZE,
+ offset_in_page(last_pos),
+ false, false);
+ }
+
+ ret = ceph_osdc_start_request(osdc, req, false);
+ if (!ret)
+ ret = ceph_osdc_wait_request(osdc, req);
+
+ /* FIXME: length field is wrong if there are 2 extents */
+ ceph_update_read_metrics(&fsc->mdsc->metric,
+ req->r_start_latency,
+ req->r_end_latency,
+ read_len, ret);
+
+ /* Ok if object is not already present */
+ if (ret == -ENOENT) {
+ /*
+ * If there is no object, then we can't assert
+ * on its version. Set it to 0, and we'll use an
+ * exclusive create instead.
+ */
+ ceph_osdc_put_request(req);
+ assert_ver = 0;
+ ret = 0;
+
+ /*
+ * zero out the soon-to-be uncopied parts of the
+ * first and last pages.
+ */
+ if (first)
+ zero_user_segment(pages[0], 0,
+ offset_in_page(first_pos));
+ if (last)
+ zero_user_segment(pages[num_pages - 1],
+ offset_in_page(last_pos),
+ PAGE_SIZE);
+ } else {
+ if (ret < 0) {
+ ceph_osdc_put_request(req);
+ ceph_release_page_vector(pages, num_pages);
+ break;
+ }
+
+ op = &req->r_ops[0];
+ if (op->extent.sparse_ext_cnt == 0) {
+ if (first)
+ zero_user_segment(pages[0], 0,
+ offset_in_page(first_pos));
+ else
+ zero_user_segment(pages[num_pages - 1],
+ offset_in_page(last_pos),
+ PAGE_SIZE);
+ } else if (op->extent.sparse_ext_cnt != 1 ||
+ ceph_sparse_ext_map_end(op) !=
+ CEPH_FSCRYPT_BLOCK_SIZE) {
+ ret = -EIO;
+ ceph_osdc_put_request(req);
+ ceph_release_page_vector(pages, num_pages);
+ break;
+ }
+
+ if (first && last) {
+ op = &req->r_ops[1];
+ if (op->extent.sparse_ext_cnt == 0) {
+ zero_user_segment(pages[num_pages - 1],
+ offset_in_page(last_pos),
+ PAGE_SIZE);
+ } else if (op->extent.sparse_ext_cnt != 1 ||
+ ceph_sparse_ext_map_end(op) !=
+ CEPH_FSCRYPT_BLOCK_SIZE) {
+ ret = -EIO;
+ ceph_osdc_put_request(req);
+ ceph_release_page_vector(pages, num_pages);
+ break;
+ }
+ }
+
+ /* Grab assert version. It must be non-zero. */
+ assert_ver = req->r_version;
+ WARN_ON_ONCE(ret > 0 && assert_ver == 0);
+
+ ceph_osdc_put_request(req);
+ if (first) {
+ ret = ceph_fscrypt_decrypt_block_inplace(inode,
+ pages[0],
+ CEPH_FSCRYPT_BLOCK_SIZE,
+ offset_in_page(first_pos),
+ first_pos >> CEPH_FSCRYPT_BLOCK_SHIFT);
+ if (ret < 0) {
+ ceph_release_page_vector(pages, num_pages);
+ break;
+ }
+ }
+ if (last) {
+ ret = ceph_fscrypt_decrypt_block_inplace(inode,
+ pages[num_pages - 1],
+ CEPH_FSCRYPT_BLOCK_SIZE,
+ offset_in_page(last_pos),
+ last_pos >> CEPH_FSCRYPT_BLOCK_SHIFT);
+ if (ret < 0) {
+ ceph_release_page_vector(pages, num_pages);
+ break;
+ }
+ }
+ }
}

left = len;
@@ -1603,43 +1808,98 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
for (n = 0; n < num_pages; n++) {
size_t plen = min_t(size_t, left, PAGE_SIZE - off);

+ /* copy the data */
ret = copy_page_from_iter(pages[n], off, plen, from);
- off = 0;
if (ret != plen) {
ret = -EFAULT;
break;
}
+ off = 0;
left -= ret;
}
-
if (ret < 0) {
+ dout("sync_write write failed with %d\n", ret);
ceph_release_page_vector(pages, num_pages);
- goto out;
+ break;
}

- req->r_inode = inode;
+ if (IS_ENCRYPTED(inode)) {
+ ret = ceph_fscrypt_encrypt_pages(inode, pages,
+ write_pos, write_len,
+ GFP_KERNEL);
+ if (ret < 0) {
+ dout("encryption failed with %d\n", ret);
+ ceph_release_page_vector(pages, num_pages);
+ break;
+ }
+ }

- osd_req_op_extent_osd_data_pages(req, 0, pages, len,
- offset_in_page(pos),
- false, true);
+ req = ceph_osdc_new_request(osdc, &ci->i_layout,
+ ci->i_vino, write_pos, &write_len,
+ rmw ? 1 : 0, rmw ? 2 : 1,
+ CEPH_OSD_OP_WRITE,
+ CEPH_OSD_FLAG_WRITE,
+ snapc, ci->i_truncate_seq,
+ ci->i_truncate_size, false);
+ if (IS_ERR(req)) {
+ ret = PTR_ERR(req);
+ ceph_release_page_vector(pages, num_pages);
+ break;
+ }

+ dout("sync_write write op %lld~%llu\n", write_pos, write_len);
+ osd_req_op_extent_osd_data_pages(req, rmw ? 1 : 0, pages, write_len,
+ offset_in_page(write_pos), false,
+ true);
+ req->r_inode = inode;
req->r_mtime = mtime;
- ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
+
+ /* Set up the assertion */
+ if (rmw) {
+ /*
+ * Set up the assertion. If we don't have a version number,
+ * then the object doesn't exist yet. Use an exclusive create
+ * instead of a version assertion in that case.
+ */
+ if (assert_ver) {
+ osd_req_op_init(req, 0, CEPH_OSD_OP_ASSERT_VER, 0);
+ req->r_ops[0].assert_ver.ver = assert_ver;
+ } else {
+ osd_req_op_init(req, 0, CEPH_OSD_OP_CREATE,
+ CEPH_OSD_OP_FLAG_EXCL);
+ }
+ }
+
+ ret = ceph_osdc_start_request(osdc, req, false);
if (!ret)
- ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
+ ret = ceph_osdc_wait_request(osdc, req);

ceph_update_write_metrics(&fsc->mdsc->metric, req->r_start_latency,
req->r_end_latency, len, ret);
-out:
ceph_osdc_put_request(req);
if (ret != 0) {
+ dout("sync_write osd write returned %d\n", ret);
+ /* Version changed! Must re-do the rmw cycle */
+ if ((assert_ver && (ret == -ERANGE || ret == -EOVERFLOW)) ||
+ (!assert_ver && ret == -EEXIST)) {
+ /* We should only ever see this on a rmw */
+ WARN_ON_ONCE(!rmw);
+
+ /* The version should never go backward */
+ WARN_ON_ONCE(ret == -EOVERFLOW);
+
+ *from = saved_iter;
+
+ /* FIXME: limit number of times we loop? */
+ continue;
+ }
ceph_set_error_write(ci);
break;
}
-
ceph_clear_error_write(ci);
pos += len;
written += len;
+ dout("sync_write written %d\n", written);
if (pos > i_size_read(inode)) {
check_caps = ceph_inode_set_size(inode, pos);
if (check_caps)
@@ -1654,6 +1914,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
ret = written;
iocb->ki_pos = pos;
}
+ dout("sync_write returning %d\n", ret);
return ret;
}

--
2.35.1

2022-04-02 13:14:12

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 18/54] ceph: send altname in MClientRequest

In the event that we have a filename longer than CEPH_NOHASH_NAME_MAX,
we'll need to hash the tail of the filename. The client however will
still need to know the full name of the file if it has a key.

To support this, the MClientRequest field has grown a new alternate_name
field that we populate with the full (binary) crypttext of the filename.
This is then transmitted to the clients in readdir or traces as part of
the dentry lease.

Add support for populating this field when the filenames are very long.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/mds_client.c | 75 +++++++++++++++++++++++++++++++++++++++++---
fs/ceph/mds_client.h | 3 ++
2 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 13367a358a85..0be1668b2c32 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -972,6 +972,7 @@ void ceph_mdsc_release_request(struct kref *kref)
if (req->r_pagelist)
ceph_pagelist_release(req->r_pagelist);
kfree(req->r_fscrypt_auth);
+ kfree(req->r_altname);
put_request_session(req);
ceph_unreserve_caps(req->r_mdsc, &req->r_caps_reservation);
WARN_ON_ONCE(!list_empty(&req->r_wait));
@@ -2386,6 +2387,63 @@ static inline u64 __get_oldest_tid(struct ceph_mds_client *mdsc)
return mdsc->oldest_tid;
}

+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
+{
+ struct inode *dir = req->r_parent;
+ struct dentry *dentry = req->r_dentry;
+ u8 *cryptbuf = NULL;
+ u32 len = 0;
+ int ret = 0;
+
+ /* only encode if we have parent and dentry */
+ if (!dir || !dentry)
+ goto success;
+
+ /* No-op unless this is encrypted */
+ if (!IS_ENCRYPTED(dir))
+ goto success;
+
+ ret = __fscrypt_prepare_readdir(dir);
+ if (ret)
+ return ERR_PTR(ret);
+
+ /* No key? Just ignore it. */
+ if (!fscrypt_has_encryption_key(dir))
+ goto success;
+
+ if (!fscrypt_fname_encrypted_size(dir, dentry->d_name.len, NAME_MAX, &len)) {
+ WARN_ON_ONCE(1);
+ return ERR_PTR(-ENAMETOOLONG);
+ }
+
+ /* No need to append altname if name is short enough */
+ if (len <= CEPH_NOHASH_NAME_MAX) {
+ len = 0;
+ goto success;
+ }
+
+ cryptbuf = kmalloc(len, GFP_KERNEL);
+ if (!cryptbuf)
+ return ERR_PTR(-ENOMEM);
+
+ ret = fscrypt_fname_encrypt(dir, &dentry->d_name, cryptbuf, len);
+ if (ret) {
+ kfree(cryptbuf);
+ return ERR_PTR(ret);
+ }
+success:
+ *plen = len;
+ return cryptbuf;
+}
+#else
+static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
+{
+ *plen = 0;
+ return NULL;
+}
+#endif
+
/**
* ceph_mdsc_build_path - build a path string to a given dentry
* @dentry: dentry to which path should be built
@@ -2606,14 +2664,15 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
ceph_encode_timespec64(&ts, &req->r_stamp);
ceph_encode_copy(p, &ts, sizeof(ts));

- /* gid_list */
+ /* v4: gid_list */
ceph_encode_32(p, req->r_cred->group_info->ngroups);
for (i = 0; i < req->r_cred->group_info->ngroups; i++)
ceph_encode_64(p, from_kgid(&init_user_ns,
req->r_cred->group_info->gid[i]));

- /* v5: altname (TODO: skip for now) */
- ceph_encode_32(p, 0);
+ /* v5: altname */
+ ceph_encode_32(p, req->r_altname_len);
+ ceph_encode_copy(p, req->r_altname, req->r_altname_len);

/* v6: fscrypt_auth and fscrypt_file */
if (req->r_fscrypt_auth) {
@@ -2669,7 +2728,13 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
goto out_free1;
}

- /* head */
+ req->r_altname = get_fscrypt_altname(req, &req->r_altname_len);
+ if (IS_ERR(req->r_altname)) {
+ msg = ERR_CAST(req->r_altname);
+ req->r_altname = NULL;
+ goto out_free2;
+ }
+
len = legacy ? sizeof(*head) : sizeof(struct ceph_mds_request_head);

/* filepaths */
@@ -2695,7 +2760,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups);

/* alternate name */
- len += sizeof(u32); // TODO
+ len += sizeof(u32) + req->r_altname_len;

/* fscrypt_auth */
len += sizeof(u32); // fscrypt_auth
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 2cc75f9ae7c7..cd719691a86d 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -290,6 +290,9 @@ struct ceph_mds_request {

struct ceph_fscrypt_auth *r_fscrypt_auth;

+ u8 *r_altname; /* fscrypt binary crypttext for long filenames */
+ u32 r_altname_len; /* length of r_altname */
+
int r_fmode; /* file mode, if expecting cap */
int r_request_release_offset;
const struct cred *r_cred;
--
2.35.1

2022-04-02 14:47:40

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 46/54] ceph: disable copy offload on encrypted inodes

If we have an encrypted inode, then the client will need to re-encrypt
the contents of the new object. Disable copy offload to or from
encrypted inodes.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/file.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index d8dfb8de4219..29cc65222ddd 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -2528,6 +2528,10 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
return -EOPNOTSUPP;
}

+ /* Every encrypted inode gets its own key, so we can't offload them */
+ if (IS_ENCRYPTED(src_inode) || IS_ENCRYPTED(dst_inode))
+ return -EOPNOTSUPP;
+
if (len < src_ci->i_layout.object_size)
return -EOPNOTSUPP; /* no remote copy will be done */

--
2.35.1

2022-04-03 06:48:55

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 13/54] ceph: decode alternate_name in lease info

Ceph is a bit different from local filesystems, in that we don't want
to store filenames as raw binary data, since we may also be dealing
with clients that don't support fscrypt.

We could just base64-encode the encrypted filenames, but that could
leave us with filenames longer than NAME_MAX. It turns out that the
MDS doesn't care much about filename length, but the clients do.

To manage this, we've added a new "alternate name" field that can be
optionally added to any dentry that we'll use to store the binary
crypttext of the filename if its base64-encoded value will be longer
than NAME_MAX. When a dentry has one of these names attached, the MDS
will send it along in the lease info, which we can then store for
later usage.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Xiubo Li <[email protected]>
---
fs/ceph/mds_client.c | 43 +++++++++++++++++++++++++++++++++----------
fs/ceph/mds_client.h | 11 +++++++----
2 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index dcb800675dec..2f2f8221eb25 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -308,27 +308,47 @@ static int parse_reply_info_dir(void **p, void *end,

static int parse_reply_info_lease(void **p, void *end,
struct ceph_mds_reply_lease **lease,
- u64 features)
+ u64 features, u32 *altname_len, u8 **altname)
{
+ u8 struct_v;
+ u32 struct_len;
+ void *lend;
+
if (features == (u64)-1) {
- u8 struct_v, struct_compat;
- u32 struct_len;
+ u8 struct_compat;
+
ceph_decode_8_safe(p, end, struct_v, bad);
ceph_decode_8_safe(p, end, struct_compat, bad);
+
/* struct_v is expected to be >= 1. we only understand
* encoding whose struct_compat == 1. */
if (!struct_v || struct_compat != 1)
goto bad;
+
ceph_decode_32_safe(p, end, struct_len, bad);
- ceph_decode_need(p, end, struct_len, bad);
- end = *p + struct_len;
+ } else {
+ struct_len = sizeof(**lease);
+ *altname_len = 0;
+ *altname = NULL;
}

- ceph_decode_need(p, end, sizeof(**lease), bad);
+ lend = *p + struct_len;
+ ceph_decode_need(p, end, struct_len, bad);
*lease = *p;
*p += sizeof(**lease);
- if (features == (u64)-1)
- *p = end;
+
+ if (features == (u64)-1) {
+ if (struct_v >= 2) {
+ ceph_decode_32_safe(p, end, *altname_len, bad);
+ ceph_decode_need(p, end, *altname_len, bad);
+ *altname = *p;
+ *p += *altname_len;
+ } else {
+ *altname = NULL;
+ *altname_len = 0;
+ }
+ }
+ *p = lend;
return 0;
bad:
return -EIO;
@@ -358,7 +378,8 @@ static int parse_reply_info_trace(void **p, void *end,
info->dname = *p;
*p += info->dname_len;

- err = parse_reply_info_lease(p, end, &info->dlease, features);
+ err = parse_reply_info_lease(p, end, &info->dlease, features,
+ &info->altname_len, &info->altname);
if (err < 0)
goto out_bad;
}
@@ -425,9 +446,11 @@ static int parse_reply_info_readdir(void **p, void *end,
dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);

/* dentry lease */
- err = parse_reply_info_lease(p, end, &rde->lease, features);
+ err = parse_reply_info_lease(p, end, &rde->lease, features,
+ &rde->altname_len, &rde->altname);
if (err)
goto out_bad;
+
/* inode */
err = parse_reply_info_in(p, end, &rde->inode, features);
if (err < 0)
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index aab3ab284fce..2cc75f9ae7c7 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -29,8 +29,8 @@ enum ceph_feature_type {
CEPHFS_FEATURE_MULTI_RECONNECT,
CEPHFS_FEATURE_DELEG_INO,
CEPHFS_FEATURE_METRIC_COLLECT,
-
- CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
+ CEPHFS_FEATURE_ALTERNATE_NAME,
+ CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_ALTERNATE_NAME,
};

/*
@@ -45,8 +45,7 @@ enum ceph_feature_type {
CEPHFS_FEATURE_MULTI_RECONNECT, \
CEPHFS_FEATURE_DELEG_INO, \
CEPHFS_FEATURE_METRIC_COLLECT, \
- \
- CEPHFS_FEATURE_MAX, \
+ CEPHFS_FEATURE_ALTERNATE_NAME, \
}
#define CEPHFS_FEATURES_CLIENT_REQUIRED {}

@@ -98,7 +97,9 @@ struct ceph_mds_reply_info_in {

struct ceph_mds_reply_dir_entry {
char *name;
+ u8 *altname;
u32 name_len;
+ u32 altname_len;
struct ceph_mds_reply_lease *lease;
struct ceph_mds_reply_info_in inode;
loff_t offset;
@@ -122,7 +123,9 @@ struct ceph_mds_reply_info_parsed {
struct ceph_mds_reply_info_in diri, targeti;
struct ceph_mds_reply_dirfrag *dirfrag;
char *dname;
+ u8 *altname;
u32 dname_len;
+ u32 altname_len;
struct ceph_mds_reply_lease *dlease;
struct ceph_mds_reply_xattr xattr_info;

--
2.35.1

2022-04-03 07:24:41

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 50/54] ceph: plumb in decryption during sync reads

Switch to using sparse reads when the inode is encrypted.

Note that the crypto block may be smaller than a page, but the reverse
cannot be true.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/file.c | 89 ++++++++++++++++++++++++++++++++++++--------------
1 file changed, 65 insertions(+), 24 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index aaa7a9d0c439..5072570c2203 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -940,7 +940,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
u64 off = *ki_pos;
u64 len = iov_iter_count(to);
u64 i_size = i_size_read(inode);
- bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD);
+ bool sparse = IS_ENCRYPTED(inode) || ceph_test_mount_opt(fsc, SPARSEREAD);
u64 objver = 0;

dout("sync_read on inode %p %llx~%llx\n", inode, *ki_pos, len);
@@ -968,10 +968,19 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
int idx;
size_t left;
struct ceph_osd_req_op *op;
+ u64 read_off = off;
+ u64 read_len = len;
+
+ /* determine new offset/length if encrypted */
+ ceph_fscrypt_adjust_off_and_len(inode, &read_off, &read_len);
+
+ dout("sync_read orig %llu~%llu reading %llu~%llu",
+ off, len, read_off, read_len);

req = ceph_osdc_new_request(osdc, &ci->i_layout,
- ci->i_vino, off, &len, 0, 1,
- sparse ? CEPH_OSD_OP_SPARSE_READ : CEPH_OSD_OP_READ,
+ ci->i_vino, read_off, &read_len, 0, 1,
+ sparse ? CEPH_OSD_OP_SPARSE_READ :
+ CEPH_OSD_OP_READ,
CEPH_OSD_FLAG_READ,
NULL, ci->i_truncate_seq,
ci->i_truncate_size, false);
@@ -980,10 +989,13 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
break;
}

+ /* adjust len downward if the request truncated the len */
+ if (off + len > read_off + read_len)
+ len = read_off + read_len - off;
more = len < iov_iter_count(to);

- num_pages = calc_pages_for(off, len);
- page_off = off & ~PAGE_MASK;
+ num_pages = calc_pages_for(read_off, read_len);
+ page_off = offset_in_page(off);
pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
if (IS_ERR(pages)) {
ceph_osdc_put_request(req);
@@ -991,7 +1003,8 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
break;
}

- osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_off,
+ osd_req_op_extent_osd_data_pages(req, 0, pages, read_len,
+ offset_in_page(read_off),
false, false);

op = &req->r_ops[0];
@@ -1010,7 +1023,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
ceph_update_read_metrics(&fsc->mdsc->metric,
req->r_start_latency,
req->r_end_latency,
- len, ret);
+ read_len, ret);

if (ret > 0)
objver = req->r_version;
@@ -1025,8 +1038,34 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
else if (ret == -ENOENT)
ret = 0;

+ if (ret > 0 && IS_ENCRYPTED(inode)) {
+ int fret;
+
+ fret = ceph_fscrypt_decrypt_extents(inode, pages, read_off,
+ op->extent.sparse_ext, op->extent.sparse_ext_cnt);
+ if (fret < 0) {
+ ret = fret;
+ ceph_osdc_put_request(req);
+ break;
+ }
+
+ /* account for any partial block at the beginning */
+ fret -= (off - read_off);
+
+ /*
+ * Short read after big offset adjustment?
+ * Nothing is usable, just call it a zero
+ * len read.
+ */
+ fret = max(fret, 0);
+
+ /* account for partial block at the end */
+ ret = min_t(ssize_t, fret, len);
+ }
+
ceph_osdc_put_request(req);

+ /* Short read but not EOF? Zero out the remainder. */
if (ret >= 0 && ret < len && (off + ret < i_size)) {
int zlen = min(len - ret, i_size - off - ret);
int zoff = page_off + ret;
@@ -1040,15 +1079,16 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
idx = 0;
left = ret > 0 ? ret : 0;
while (left > 0) {
- size_t len, copied;
- page_off = off & ~PAGE_MASK;
- len = min_t(size_t, left, PAGE_SIZE - page_off);
+ size_t plen, copied;
+
+ plen = min_t(size_t, left, PAGE_SIZE - page_off);
SetPageUptodate(pages[idx]);
copied = copy_page_to_iter(pages[idx++],
- page_off, len, to);
+ page_off, plen, to);
off += copied;
left -= copied;
- if (copied < len) {
+ page_off = 0;
+ if (copied < plen) {
ret = -EFAULT;
break;
}
@@ -1065,20 +1105,21 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
break;
}

- if (off > *ki_pos) {
- if (off >= i_size) {
- *retry_op = CHECK_EOF;
- ret = i_size - *ki_pos;
- *ki_pos = i_size;
- } else {
- ret = off - *ki_pos;
- *ki_pos = off;
+ if (ret > 0) {
+ if (off > *ki_pos) {
+ if (off >= i_size) {
+ *retry_op = CHECK_EOF;
+ ret = i_size - *ki_pos;
+ *ki_pos = i_size;
+ } else {
+ ret = off - *ki_pos;
+ *ki_pos = off;
+ }
}
- }
-
- if (last_objver && ret > 0)
- *last_objver = objver;

+ if (last_objver)
+ *last_objver = objver;
+ }
dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
return ret;
}
--
2.35.1

2022-04-03 20:58:45

by Luis Henriques

[permalink] [raw]
Subject: Re: [PATCH v12 07/54] ceph: support legacy v1 encryption policy keysetup

Eric Biggers <[email protected]> writes:

> On Thu, Mar 31, 2022 at 11:30:43AM -0400, Jeff Layton wrote:
>> From: Luís Henriques <[email protected]>
>>
>> fstests make use of legacy keysetup where the key description uses a
>> filesystem-specific prefix. Add this ceph-specific prefix to the
>> fscrypt_operations data structure.
>>
>> Signed-off-by: Luís Henriques <[email protected]>
>> Signed-off-by: Jeff Layton <[email protected]>
>> ---
>> fs/ceph/crypto.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
>> index a513ff373b13..d1a6595a810f 100644
>> --- a/fs/ceph/crypto.c
>> +++ b/fs/ceph/crypto.c
>> @@ -65,6 +65,7 @@ static bool ceph_crypt_empty_dir(struct inode *inode)
>> }
>>
>> static struct fscrypt_operations ceph_fscrypt_ops = {
>> + .key_prefix = "ceph:",
>> .get_context = ceph_crypt_get_context,
>> .set_context = ceph_crypt_set_context,
>> .empty_dir = ceph_crypt_empty_dir,
>> --
>> 2.35.1
>>
>
> As I mentioned before
> (https://lore.kernel.org/r/[email protected]), I don't
> think you should do this, given that the filesystem-specific key description
> prefixes are deprecated. In fact, they're sort of doubly deprecated, since
> first they were superseded by "fscrypt:", and then "login" keys were
> superseded by FS_IOC_ADD_ENCRYPTION_KEY.
>
> How about updating fstests to use "fscrypt:" instead of "$FSTYP:" if $FSTYP is
> not ext4 or f2fs?
>
> Or maybe fstests should just use "fscrypt:" unconditionally, given that this has
> been supported by ext4 and f2fs since 4.8, and 4.9 is now the oldest supported
> LTS kernel.

OK, makes sense. Thanks for the suggestion. I'll follow-up with an
fstests patch to do what you suggest (use $FSTYP only for those 2
filesystems).

Cheers,
--
Luís

2022-04-04 08:26:16

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 54/54] ceph: fscrypt support for writepages

Add the appropriate machinery to write back dirty data with encryption.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/addr.c | 62 ++++++++++++++++++++++++++++++++++++++----------
fs/ceph/crypto.h | 18 +++++++++++++-
2 files changed, 67 insertions(+), 13 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 403e7a960a4e..cc4f561bd03c 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -556,10 +556,12 @@ static u64 get_writepages_data_length(struct inode *inode,
struct page *page, u64 start)
{
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_snap_context *snapc = page_snap_context(page);
+ struct ceph_snap_context *snapc;
struct ceph_cap_snap *capsnap = NULL;
u64 end = i_size_read(inode);
+ u64 ret;

+ snapc = page_snap_context(ceph_fscrypt_pagecache_page(page));
if (snapc != ci->i_head_snapc) {
bool found = false;
spin_lock(&ci->i_ceph_lock);
@@ -574,9 +576,12 @@ static u64 get_writepages_data_length(struct inode *inode,
spin_unlock(&ci->i_ceph_lock);
WARN_ON(!found);
}
- if (end > page_offset(page) + thp_size(page))
- end = page_offset(page) + thp_size(page);
- return end > start ? end - start : 0;
+ if (end > ceph_fscrypt_page_offset(page) + thp_size(page))
+ end = ceph_fscrypt_page_offset(page) + thp_size(page);
+ ret = end > start ? end - start : 0;
+ if (ret && fscrypt_is_bounce_page(page))
+ ret = round_up(ret, CEPH_FSCRYPT_BLOCK_SIZE);
+ return ret;
}

/*
@@ -792,6 +797,11 @@ static void writepages_finish(struct ceph_osd_request *req)
total_pages += num_pages;
for (j = 0; j < num_pages; j++) {
page = osd_data->pages[j];
+ if (fscrypt_is_bounce_page(page)) {
+ page = fscrypt_pagecache_page(page);
+ fscrypt_free_bounce_page(osd_data->pages[j]);
+ osd_data->pages[j] = page;
+ }
BUG_ON(!page);
WARN_ON(!PageUptodate(page));

@@ -1050,10 +1060,28 @@ static int ceph_writepages_start(struct address_space *mapping,
BLK_RW_ASYNC);
}

+ if (IS_ENCRYPTED(inode)) {
+ pages[locked_pages] =
+ fscrypt_encrypt_pagecache_blocks(page,
+ PAGE_SIZE, 0,
+ locked_pages ? GFP_NOWAIT : GFP_NOFS);
+ if (IS_ERR(pages[locked_pages])) {
+ if (PTR_ERR(pages[locked_pages]) == -EINVAL)
+ pr_err("%s: inode->i_blkbits=%hhu\n",
+ __func__, inode->i_blkbits);
+ /* better not fail on first page! */
+ BUG_ON(locked_pages == 0);
+ pages[locked_pages] = NULL;
+ redirty_page_for_writepage(wbc, page);
+ unlock_page(page);
+ break;
+ }
+ ++locked_pages;
+ } else {
+ pages[locked_pages++] = page;
+ }

- pages[locked_pages++] = page;
pvec.pages[i] = NULL;
-
len += thp_size(page);
}

@@ -1081,7 +1109,7 @@ static int ceph_writepages_start(struct address_space *mapping,
}

new_request:
- offset = page_offset(pages[0]);
+ offset = ceph_fscrypt_page_offset(pages[0]);
len = wsize;

req = ceph_osdc_new_request(&fsc->client->osdc,
@@ -1102,8 +1130,8 @@ static int ceph_writepages_start(struct address_space *mapping,
ceph_wbc.truncate_size, true);
BUG_ON(IS_ERR(req));
}
- BUG_ON(len < page_offset(pages[locked_pages - 1]) +
- thp_size(page) - offset);
+ BUG_ON(len < ceph_fscrypt_page_offset(pages[locked_pages - 1]) +
+ thp_size(pages[locked_pages - 1]) - offset);

req->r_callback = writepages_finish;
req->r_inode = inode;
@@ -1113,7 +1141,9 @@ static int ceph_writepages_start(struct address_space *mapping,
data_pages = pages;
op_idx = 0;
for (i = 0; i < locked_pages; i++) {
- u64 cur_offset = page_offset(pages[i]);
+ struct page *page = ceph_fscrypt_pagecache_page(pages[i]);
+
+ u64 cur_offset = page_offset(page);
/*
* Discontinuity in page range? Ceph can handle that by just passing
* multiple extents in the write op.
@@ -1142,9 +1172,9 @@ static int ceph_writepages_start(struct address_space *mapping,
op_idx++;
}

- set_page_writeback(pages[i]);
+ set_page_writeback(page);
if (caching)
- ceph_set_page_fscache(pages[i]);
+ ceph_set_page_fscache(page);
len += thp_size(page);
}
ceph_fscache_write_to_cache(inode, offset, len, caching);
@@ -1160,8 +1190,16 @@ static int ceph_writepages_start(struct address_space *mapping,
offset);
len = max(len, min_len);
}
+ if (IS_ENCRYPTED(inode))
+ len = round_up(len, CEPH_FSCRYPT_BLOCK_SIZE);
+
dout("writepages got pages at %llu~%llu\n", offset, len);

+ if (IS_ENCRYPTED(inode) &&
+ ((offset | len) & ~CEPH_FSCRYPT_BLOCK_MASK))
+ pr_warn("%s: bad encrypted write offset=%lld len=%llu\n",
+ __func__, offset, len);
+
osd_req_op_extent_osd_data_pages(req, op_idx, data_pages, len,
0, from_pool, false);
osd_req_op_extent_update(req, op_idx, len);
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 92a7b221a975..0cf526f07567 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -146,6 +146,12 @@ int ceph_fscrypt_decrypt_extents(struct inode *inode, struct page **page, u64 of
struct ceph_sparse_extent *map, u32 ext_cnt);
int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **page, u64 off,
int len, gfp_t gfp);
+
+static inline struct page *ceph_fscrypt_pagecache_page(struct page *page)
+{
+ return fscrypt_is_bounce_page(page) ? fscrypt_pagecache_page(page) : page;
+}
+
#else /* CONFIG_FS_ENCRYPTION */

static inline void ceph_fscrypt_set_ops(struct super_block *sb)
@@ -235,6 +241,16 @@ static inline int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **
{
return 0;
}
+
+static inline struct page *ceph_fscrypt_pagecache_page(struct page *page)
+{
+ return page;
+}
#endif /* CONFIG_FS_ENCRYPTION */

-#endif
+static inline loff_t ceph_fscrypt_page_offset(struct page *page)
+{
+ return page_offset(ceph_fscrypt_pagecache_page(page));
+}
+
+#endif /* _CEPH_CRYPTO_H */
--
2.35.1

2022-04-05 00:17:55

by Jeffrey Layton

[permalink] [raw]
Subject: [PATCH v12 28/54] ceph: add support to readdir for encrypted filenames

From: Xiubo Li <[email protected]>

Once we've decrypted the names in a readdir reply, we no longer need the
crypttext, so overwrite them in ceph_mds_reply_dir_entry with the
unencrypted names. Then in both ceph_readdir_prepopulate() and
ceph_readdir() we will use the dencrypted name directly.

[ jlayton: convert some BUG_ONs into error returns ]

Signed-off-by: Xiubo Li <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ceph/crypto.c | 12 +++++--
fs/ceph/crypto.h | 1 +
fs/ceph/dir.c | 35 +++++++++++++++----
fs/ceph/inode.c | 12 ++++---
fs/ceph/mds_client.c | 81 ++++++++++++++++++++++++++++++++++++++++----
fs/ceph/mds_client.h | 4 +--
6 files changed, 124 insertions(+), 21 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 7cf45d374c1b..c86cc4a7eaf6 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -142,7 +142,10 @@ int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name,
int ret;
u8 *cryptbuf;

- WARN_ON_ONCE(!fscrypt_has_encryption_key(parent));
+ if (!fscrypt_has_encryption_key(parent)) {
+ memcpy(buf, d_name->name, d_name->len);
+ return d_name->len;
+ }

/*
* Convert cleartext d_name to ciphertext. If result is longer than
@@ -184,6 +187,8 @@ int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name,

int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf)
{
+ WARN_ON_ONCE(!fscrypt_has_encryption_key(parent));
+
return ceph_encode_encrypted_dname(parent, &dentry->d_name, buf);
}

@@ -228,7 +233,10 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
* generating a nokey name via fscrypt.
*/
if (!fscrypt_has_encryption_key(fname->dir)) {
- memcpy(oname->name, fname->name, fname->name_len);
+ if (fname->no_copy)
+ oname->name = fname->name;
+ else
+ memcpy(oname->name, fname->name, fname->name_len);
oname->len = fname->name_len;
if (is_nokey)
*is_nokey = true;
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index e54150260eba..080905b0c73c 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -19,6 +19,7 @@ struct ceph_fname {
unsigned char *ctext; // binary crypttext (if any)
u32 name_len; // length of name buffer
u32 ctext_len; // length of crypttext
+ bool no_copy;
};

struct ceph_fscrypt_auth {
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index caf2547c3fe1..5ce2a6384e55 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -9,6 +9,7 @@

#include "super.h"
#include "mds_client.h"
+#include "crypto.h"

/*
* Directory operations: readdir, lookup, create, link, unlink,
@@ -241,7 +242,9 @@ static int __dcache_readdir(struct file *file, struct dir_context *ctx,
di = ceph_dentry(dentry);
if (d_unhashed(dentry) ||
d_really_is_negative(dentry) ||
- di->lease_shared_gen != shared_gen) {
+ di->lease_shared_gen != shared_gen ||
+ ((dentry->d_flags & DCACHE_NOKEY_NAME) &&
+ fscrypt_has_encryption_key(dir))) {
spin_unlock(&dentry->d_lock);
dput(dentry);
err = -EAGAIN;
@@ -340,6 +343,10 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
ctx->pos = 2;
}

+ err = fscrypt_prepare_readdir(inode);
+ if (err)
+ return err;
+
spin_lock(&ci->i_ceph_lock);
/* request Fx cap. if have Fx, we don't need to release Fs cap
* for later create/unlink. */
@@ -389,6 +396,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
if (IS_ERR(req))
return PTR_ERR(req);
+
err = ceph_alloc_readdir_reply_buffer(req, inode);
if (err) {
ceph_mdsc_put_request(req);
@@ -402,11 +410,20 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
req->r_inode_drop = CEPH_CAP_FILE_EXCL;
}
if (dfi->last_name) {
- req->r_path2 = kstrdup(dfi->last_name, GFP_KERNEL);
+ struct qstr d_name = { .name = dfi->last_name,
+ .len = strlen(dfi->last_name) };
+
+ req->r_path2 = kzalloc(NAME_MAX + 1, GFP_KERNEL);
if (!req->r_path2) {
ceph_mdsc_put_request(req);
return -ENOMEM;
}
+
+ err = ceph_encode_encrypted_dname(inode, &d_name, req->r_path2);
+ if (err < 0) {
+ ceph_mdsc_put_request(req);
+ return err;
+ }
} else if (is_hash_order(ctx->pos)) {
req->r_args.readdir.offset_hash =
cpu_to_le32(fpos_hash(ctx->pos));
@@ -511,15 +528,20 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
for (; i < rinfo->dir_nr; i++) {
struct ceph_mds_reply_dir_entry *rde = rinfo->dir_entries + i;

- BUG_ON(rde->offset < ctx->pos);
+ if (rde->offset < ctx->pos) {
+ pr_warn("%s: rde->offset 0x%llx ctx->pos 0x%llx\n",
+ __func__, rde->offset, ctx->pos);
+ return -EIO;
+ }
+
+ if (WARN_ON_ONCE(!rde->inode.in))
+ return -EIO;

ctx->pos = rde->offset;
dout("readdir (%d/%d) -> %llx '%.*s' %p\n",
i, rinfo->dir_nr, ctx->pos,
rde->name_len, rde->name, &rde->inode.in);

- BUG_ON(!rde->inode.in);
-
if (!dir_emit(ctx, rde->name, rde->name_len,
ceph_present_ino(inode->i_sb, le64_to_cpu(rde->inode.in->ino)),
le32_to_cpu(rde->inode.in->mode) >> 12)) {
@@ -532,6 +554,8 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
dout("filldir stopping us...\n");
return 0;
}
+
+ /* Reset the lengths to their original allocated vals */
ctx->pos++;
}

@@ -586,7 +610,6 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
dfi->dir_ordered_count);
spin_unlock(&ci->i_ceph_lock);
}
-
dout("readdir %p file %p done.\n", inode, file);
return 0;
}
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 684dfc3f006c..98ac1369b353 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1750,7 +1750,8 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
struct ceph_mds_session *session)
{
struct dentry *parent = req->r_dentry;
- struct ceph_inode_info *ci = ceph_inode(d_inode(parent));
+ struct inode *inode = d_inode(parent);
+ struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
struct qstr dname;
struct dentry *dn;
@@ -1824,9 +1825,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
tvino.snap = le64_to_cpu(rde->inode.in->snapid);

if (rinfo->hash_order) {
- u32 hash = ceph_str_hash(ci->i_dir_layout.dl_dir_hash,
- rde->name, rde->name_len);
- hash = ceph_frag_value(hash);
+ u32 hash = ceph_frag_value(rde->raw_hash);
if (hash != last_hash)
fpos_offset = 2;
last_hash = hash;
@@ -1849,6 +1848,11 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
err = -ENOMEM;
goto out;
}
+ if (rde->is_nokey) {
+ spin_lock(&dn->d_lock);
+ dn->d_flags |= DCACHE_NOKEY_NAME;
+ spin_unlock(&dn->d_lock);
+ }
} else if (d_really_is_positive(dn) &&
(ceph_ino(d_inode(dn)) != tvino.ino ||
ceph_snap(d_inode(dn)) != tvino.snap)) {
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 0a7f18d4df73..50fe77768295 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -439,20 +439,87 @@ static int parse_reply_info_readdir(void **p, void *end,

info->dir_nr = num;
while (num) {
+ struct inode *inode = d_inode(req->r_dentry);
+ struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
+ struct fscrypt_str tname = FSTR_INIT(NULL, 0);
+ struct fscrypt_str oname = FSTR_INIT(NULL, 0);
+ struct ceph_fname fname;
+ u32 altname_len, _name_len;
+ u8 *altname, *_name;
+
/* dentry */
- ceph_decode_32_safe(p, end, rde->name_len, bad);
- ceph_decode_need(p, end, rde->name_len, bad);
- rde->name = *p;
- *p += rde->name_len;
- dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);
+ ceph_decode_32_safe(p, end, _name_len, bad);
+ ceph_decode_need(p, end, _name_len, bad);
+ _name = *p;
+ *p += _name_len;
+ dout("parsed dir dname '%.*s'\n", _name_len, _name);
+
+ if (info->hash_order)
+ rde->raw_hash = ceph_str_hash(ci->i_dir_layout.dl_dir_hash,
+ _name, _name_len);

/* dentry lease */
err = parse_reply_info_lease(p, end, &rde->lease, features,
- &rde->altname_len, &rde->altname);
+ &altname_len, &altname);
if (err)
goto out_bad;

+ /*
+ * Try to dencrypt the dentry names and update them
+ * in the ceph_mds_reply_dir_entry struct.
+ */
+ fname.dir = inode;
+ fname.name = _name;
+ fname.name_len = _name_len;
+ fname.ctext = altname;
+ fname.ctext_len = altname_len;
+ /*
+ * The _name_len maybe larger than altname_len, such as
+ * when the human readable name length is in range of
+ * (CEPH_NOHASH_NAME_MAX, CEPH_NOHASH_NAME_MAX + SHA256_DIGEST_SIZE),
+ * then the copy in ceph_fname_to_usr will corrupt the
+ * data if there has no encryption key.
+ *
+ * Just set the no_copy flag and then if there has no
+ * encryption key the oname.name will be assigned to
+ * _name always.
+ */
+ fname.no_copy = true;
+ if (altname_len == 0) {
+ /*
+ * Set tname to _name, and this will be used
+ * to do the base64_decode in-place. It's
+ * safe because the decoded string should
+ * always be shorter, which is 3/4 of origin
+ * string.
+ */
+ tname.name = _name;
+
+ /*
+ * Set oname to _name too, and this will be
+ * used to do the dencryption in-place.
+ */
+ oname.name = _name;
+ oname.len = _name_len;
+ } else {
+ /*
+ * This will do the decryption only in-place
+ * from altname cryptext directly.
+ */
+ oname.name = altname;
+ oname.len = altname_len;
+ }
+ rde->is_nokey = false;
+ err = ceph_fname_to_usr(&fname, &tname, &oname, &rde->is_nokey);
+ if (err) {
+ pr_err("%s unable to decode %.*s, got %d\n", __func__,
+ _name_len, _name, err);
+ goto out_bad;
+ }
+ rde->name = oname.name;
+ rde->name_len = oname.len;
+
/* inode */
err = parse_reply_info_in(p, end, &rde->inode, features);
if (err < 0)
@@ -3501,7 +3568,7 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
if (err == 0) {
if (result == 0 && (req->r_op == CEPH_MDS_OP_READDIR ||
req->r_op == CEPH_MDS_OP_LSSNAP))
- ceph_readdir_prepopulate(req, req->r_session);
+ err = ceph_readdir_prepopulate(req, req->r_session);
}
current->journal_info = NULL;
mutex_unlock(&req->r_fill_mutex);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index cd719691a86d..046a9368c4a9 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -96,10 +96,10 @@ struct ceph_mds_reply_info_in {
};

struct ceph_mds_reply_dir_entry {
+ bool is_nokey;
char *name;
- u8 *altname;
u32 name_len;
- u32 altname_len;
+ u32 raw_hash;
struct ceph_mds_reply_lease *lease;
struct ceph_mds_reply_info_in inode;
loff_t offset;
--
2.35.1