LinuxLists.cc - [RFC] is ovl_fh->fid really intended to be misaligned?

2019-11-14 15:48:20

Subject: [RFC] is ovl_fh->fid really intended to be misaligned?

AFAICS, this
bytes = (fh->len - offsetof(struct ovl_fh, fid));
real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
bytes >> 2, (int)fh->type,
connected ? ovl_acceptable : NULL, mnt);
in ovl_decode_real_fh() combined with
origin = ovl_decode_real_fh(fh, ofs->lower_layers[i].mnt,
connected);
in ovl_check_origin_fh(),
/* First lookup overlay inode in inode cache by origin fh */
err = ovl_check_origin_fh(ofs, fh, false, NULL, &stack);
in ovl_lower_fh_to_d() and
struct ovl_fh *fh = (struct ovl_fh *) fid;
...
ovl_lower_fh_to_d(sb, fh);
in ovl_fh_to_dentry() leads to the pointer to struct fid passed to
exportfs_decode_fh() being 21 bytes ahead of that passed to
ovl_fh_to_dentry().

However, alignment of struct fid pointers is 32 bits and quite a few
places dealing with those (including ->fh_to_dentry() instances)
do access them directly. Argument of ->fh_to_dentry() is supposed
to be 32bit-aligned, and callers generally guarantee that. Your
code, OTOH, violates the alignment systematically there - what
it passes to layers' ->fh_to_dentry() (by way of exportfs_decode_fh())
always has two lower bits different from what it got itself.

What do we do with that? One solution would be to insert sane padding
in ovl_fh, but the damn thing appears to be stored as-is in xattrs on
disk, so that would require rather unpleasant operations reinserting
the padding on the fly ;-/

Another is to declare struct fid unaligned with explicit use of
__aligned in declaration and let all code normally dealing with
those pay the price. Frankly, I don't like that - it's overlayfs
mess, so penalizing the much older users doesn't sound like a good idea.
Worse, any code that does (like overlayfs) cast the incoming
struct fid * to a pointer to its own struct will still be in
trouble - explicit cast is explicit cast, and it discards all
alignment information (as yours has done).

BTW, your ->encode_fh() appears to report the length greater than
the object it has returned... Granted, the 3 overreported bytes
will be right after the end of 4n+1-byte kmalloc'ed area, so they
won't fall over the page boundary, but the values in those are left
uninitialized. And they are sent in over-the-wire representations;
you ignore those on decode, but they are there.

2019-11-14 17:06:53

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [RFC] is ovl_fh->fid really intended to be misaligned?

2019-11-14 19:56:37

by Miklos Szeredi

[permalink] [raw]

Subject: Re: [RFC] is ovl_fh->fid really intended to be misaligned?

On Thu, Nov 14, 2019 at 03:47:23PM +0000, Al Viro wrote:
> AFAICS, this
> bytes = (fh->len - offsetof(struct ovl_fh, fid));
> real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
> bytes >> 2, (int)fh->type,
> connected ? ovl_acceptable : NULL, mnt);
> in ovl_decode_real_fh() combined with
> origin = ovl_decode_real_fh(fh, ofs->lower_layers[i].mnt,
> connected);
> in ovl_check_origin_fh(),
> /* First lookup overlay inode in inode cache by origin fh */
> err = ovl_check_origin_fh(ofs, fh, false, NULL, &stack);
> in ovl_lower_fh_to_d() and
> struct ovl_fh *fh = (struct ovl_fh *) fid;
> ...
> ovl_lower_fh_to_d(sb, fh);
> in ovl_fh_to_dentry() leads to the pointer to struct fid passed to
> exportfs_decode_fh() being 21 bytes ahead of that passed to
> ovl_fh_to_dentry().
>
> However, alignment of struct fid pointers is 32 bits and quite a few
> places dealing with those (including ->fh_to_dentry() instances)
> do access them directly. Argument of ->fh_to_dentry() is supposed
> to be 32bit-aligned, and callers generally guarantee that. Your
> code, OTOH, violates the alignment systematically there - what
> it passes to layers' ->fh_to_dentry() (by way of exportfs_decode_fh())
> always has two lower bits different from what it got itself.
>
> What do we do with that? One solution would be to insert sane padding
> in ovl_fh, but the damn thing appears to be stored as-is in xattrs on
> disk, so that would require rather unpleasant operations reinserting
> the padding on the fly ;-/

Something like this? Totally untested...

Thanks,
Miklos

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index b801c6353100..60a4ca72cb4e 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -253,7 +253,7 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)

BUILD_BUG_ON(MAX_HANDLE_SZ + offsetof(struct ovl_fh, fid) > 255);
fh_len = offsetof(struct ovl_fh, fid) + buflen;
- fh = kmalloc(fh_len, GFP_KERNEL);
+ fh = kzalloc(fh_len, GFP_KERNEL);
if (!fh) {
fh = ERR_PTR(-ENOMEM);
goto out;
@@ -271,7 +271,7 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
*/
if (is_upper)
fh->flags |= OVL_FH_FLAG_PATH_UPPER;
- fh->len = fh_len;
+ fh->len = fh_len - OVL_FH_WIRE_OFFSET;
fh->uuid = *uuid;
memcpy(fh->fid, buf, buflen);

@@ -300,7 +300,8 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
/*
* Do not fail when upper doesn't support xattrs.
*/
- err = ovl_check_setxattr(dentry, upper, OVL_XATTR_ORIGIN, fh,
+ err = ovl_check_setxattr(dentry, upper, OVL_XATTR_ORIGIN,
+ fh ? OVL_FH_START(fh) : NULL,
fh ? fh->len : 0, 0);
kfree(fh);

@@ -317,7 +318,8 @@ static int ovl_set_upper_fh(struct dentry *upper, struct dentry *index)
if (IS_ERR(fh))
return PTR_ERR(fh);

- err = ovl_do_setxattr(index, OVL_XATTR_UPPER, fh, fh->len, 0);
+ err = ovl_do_setxattr(index, OVL_XATTR_UPPER,
+ OVL_FH_START(fh), fh->len, 0);

kfree(fh);
return err;
diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 73c9775215b3..dedda95c7746 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -234,7 +234,7 @@ static int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
if (fh->len > buflen)
goto fail;

- memcpy(buf, (char *)fh, fh->len);
+ memcpy(buf, OVL_FH_START(fh), fh->len);
err = fh->len;

out:
@@ -260,6 +260,7 @@ static int ovl_dentry_to_fh(struct dentry *dentry, u32 *fid, int *max_len)

/* Round up to dwords */
*max_len = (len + 3) >> 2;
+ memset(fid + len, 0, (*max_len << 2) - len);
return OVL_FILEID;
}

@@ -781,7 +782,7 @@ static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
int fh_len, int fh_type)
{
struct dentry *dentry = NULL;
- struct ovl_fh *fh = (struct ovl_fh *) fid;
+ struct ovl_fh *fh = (void *) fid - OVL_FH_WIRE_OFFSET;
int len = fh_len << 2;
unsigned int flags = 0;
int err;
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index e9717c2f7d45..f22a65359df1 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -86,7 +86,8 @@ static int ovl_acceptable(void *ctx, struct dentry *dentry)
*/
int ovl_check_fh_len(struct ovl_fh *fh, int fh_len)
{
- if (fh_len < sizeof(struct ovl_fh) || fh_len < fh->len)
+ if (fh_len < sizeof(struct ovl_fh) - OVL_FH_WIRE_OFFSET ||
+ fh_len < fh->len)
return -EINVAL;

if (fh->magic != OVL_FH_MAGIC)
@@ -119,11 +120,11 @@ static struct ovl_fh *ovl_get_fh(struct dentry *dentry, const char *name)
if (res == 0)
return NULL;

- fh = kzalloc(res, GFP_KERNEL);
+ fh = kzalloc(res + OVL_FH_WIRE_OFFSET, GFP_KERNEL);
if (!fh)
return ERR_PTR(-ENOMEM);

- res = vfs_getxattr(dentry, name, fh, res);
+ res = vfs_getxattr(dentry, name, fh + OVL_FH_WIRE_OFFSET, res);
if (res < 0)
goto fail;

@@ -161,7 +162,7 @@ struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
if (!uuid_equal(&fh->uuid, &mnt->mnt_sb->s_uuid))
return NULL;

- bytes = (fh->len - offsetof(struct ovl_fh, fid));
+ bytes = (fh->len + OVL_FH_WIRE_OFFSET - offsetof(struct ovl_fh, fid));
real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
bytes >> 2, (int)fh->type,
connected ? ovl_acceptable : NULL, mnt);
@@ -433,7 +434,8 @@ int ovl_verify_set_fh(struct dentry *dentry, const char *name,

err = ovl_verify_fh(dentry, name, fh);
if (set && err == -ENODATA)
- err = ovl_do_setxattr(dentry, name, fh, fh->len, 0);
+ err = ovl_do_setxattr(dentry, name,
+ OVL_FH_START(fh), fh->len, 0);
if (err)
goto fail;

@@ -512,12 +514,12 @@ int ovl_verify_index(struct ovl_fs *ofs, struct dentry *index)

err = -ENOMEM;
len = index->d_name.len / 2;
- fh = kzalloc(len, GFP_KERNEL);
+ fh = kzalloc(len + OVL_FH_WIRE_OFFSET, GFP_KERNEL);
if (!fh)
goto fail;

err = -EINVAL;
- if (hex2bin((u8 *)fh, index->d_name.name, len))
+ if (hex2bin(OVL_FH_START(fh), index->d_name.name, len))
goto fail;

err = ovl_check_fh_len(fh, len);
@@ -603,7 +605,7 @@ static int ovl_get_index_name_fh(struct ovl_fh *fh, struct qstr *name)
if (!n)
return -ENOMEM;

- s = bin2hex(n, fh, fh->len);
+ s = bin2hex(n, OVL_FH_START(fh), fh->len);
*name = (struct qstr) QSTR_INIT(n, s - n);

return 0;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 6934bcf030f0..c62083671a12 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -74,8 +74,13 @@ enum ovl_entry_flag {
/* The type returned by overlay exportfs ops when encoding an ovl_fh handle */
#define OVL_FILEID 0xfb

-/* On-disk and in-memeory format for redirect by file handle */
+#define OVL_FH_WIRE_OFFSET 3
+#define OVL_FH_START(fh) ((void *)(fh) + OVL_FH_WIRE_OFFSET)
struct ovl_fh {
+ /* make sure fid is 32bit aligned */
+ u8 padding[OVL_FH_WIRE_OFFSET];
+
+ /* Wire/xattr encoding begins here*/
u8 version; /* 0 */
u8 magic; /* 0xfb */
u8 len; /* size of this header + size of fid */

2019-11-14 20:08:21

by Amir Goldstein

[permalink] [raw]

Subject: Re: [RFC] is ovl_fh->fid really intended to be misaligned?

On Thu, Nov 14, 2019 at 9:55 PM Miklos Szeredi <[email protected]> wrote:
>
> On Thu, Nov 14, 2019 at 03:47:23PM +0000, Al Viro wrote:
> > AFAICS, this
> > bytes = (fh->len - offsetof(struct ovl_fh, fid));
> > real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
> > bytes >> 2, (int)fh->type,
> > connected ? ovl_acceptable : NULL, mnt);
> > in ovl_decode_real_fh() combined with
> > origin = ovl_decode_real_fh(fh, ofs->lower_layers[i].mnt,
> > connected);
> > in ovl_check_origin_fh(),
> > /* First lookup overlay inode in inode cache by origin fh */
> > err = ovl_check_origin_fh(ofs, fh, false, NULL, &stack);
> > in ovl_lower_fh_to_d() and
> > struct ovl_fh *fh = (struct ovl_fh *) fid;
> > ...
> > ovl_lower_fh_to_d(sb, fh);
> > in ovl_fh_to_dentry() leads to the pointer to struct fid passed to
> > exportfs_decode_fh() being 21 bytes ahead of that passed to
> > ovl_fh_to_dentry().
> >
> > However, alignment of struct fid pointers is 32 bits and quite a few
> > places dealing with those (including ->fh_to_dentry() instances)
> > do access them directly. Argument of ->fh_to_dentry() is supposed
> > to be 32bit-aligned, and callers generally guarantee that. Your
> > code, OTOH, violates the alignment systematically there - what
> > it passes to layers' ->fh_to_dentry() (by way of exportfs_decode_fh())
> > always has two lower bits different from what it got itself.
> >
> > What do we do with that? One solution would be to insert sane padding
> > in ovl_fh, but the damn thing appears to be stored as-is in xattrs on
> > disk, so that would require rather unpleasant operations reinserting
> > the padding on the fly ;-/
>
> Something like this? Totally untested...
>

I was going to suggest something similar using

struct ovl_fhv1 {
u8 pad[3];
struct ovl_fh fhv0;
} __packed;

New overlayfs exported file handles on-wire could be ovl_fhv1,
but we can easily convert old ovl_fhv to ovl_fhv1
on-the-fly on decode (if we care about those few users at all)

xattrs would still be stored and read as ovl_fh v0.

Thanks,
Amir.

>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index b801c6353100..60a4ca72cb4e 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -253,7 +253,7 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
>
> BUILD_BUG_ON(MAX_HANDLE_SZ + offsetof(struct ovl_fh, fid) > 255);
> fh_len = offsetof(struct ovl_fh, fid) + buflen;
> - fh = kmalloc(fh_len, GFP_KERNEL);
> + fh = kzalloc(fh_len, GFP_KERNEL);
> if (!fh) {
> fh = ERR_PTR(-ENOMEM);
> goto out;
> @@ -271,7 +271,7 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
> */
> if (is_upper)
> fh->flags |= OVL_FH_FLAG_PATH_UPPER;
> - fh->len = fh_len;
> + fh->len = fh_len - OVL_FH_WIRE_OFFSET;
> fh->uuid = *uuid;
> memcpy(fh->fid, buf, buflen);
>
> @@ -300,7 +300,8 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
> /*
> * Do not fail when upper doesn't support xattrs.
> */
> - err = ovl_check_setxattr(dentry, upper, OVL_XATTR_ORIGIN, fh,
> + err = ovl_check_setxattr(dentry, upper, OVL_XATTR_ORIGIN,
> + fh ? OVL_FH_START(fh) : NULL,
> fh ? fh->len : 0, 0);
> kfree(fh);
>
> @@ -317,7 +318,8 @@ static int ovl_set_upper_fh(struct dentry *upper, struct dentry *index)
> if (IS_ERR(fh))
> return PTR_ERR(fh);
>
> - err = ovl_do_setxattr(index, OVL_XATTR_UPPER, fh, fh->len, 0);
> + err = ovl_do_setxattr(index, OVL_XATTR_UPPER,
> + OVL_FH_START(fh), fh->len, 0);
>
> kfree(fh);
> return err;
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index 73c9775215b3..dedda95c7746 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -234,7 +234,7 @@ static int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
> if (fh->len > buflen)
> goto fail;
>
> - memcpy(buf, (char *)fh, fh->len);
> + memcpy(buf, OVL_FH_START(fh), fh->len);
> err = fh->len;
>
> out:
> @@ -260,6 +260,7 @@ static int ovl_dentry_to_fh(struct dentry *dentry, u32 *fid, int *max_len)
>
> /* Round up to dwords */
> *max_len = (len + 3) >> 2;
> + memset(fid + len, 0, (*max_len << 2) - len);
> return OVL_FILEID;
> }
>
> @@ -781,7 +782,7 @@ static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
> int fh_len, int fh_type)
> {
> struct dentry *dentry = NULL;
> - struct ovl_fh *fh = (struct ovl_fh *) fid;
> + struct ovl_fh *fh = (void *) fid - OVL_FH_WIRE_OFFSET;
> int len = fh_len << 2;
> unsigned int flags = 0;
> int err;
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index e9717c2f7d45..f22a65359df1 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -86,7 +86,8 @@ static int ovl_acceptable(void *ctx, struct dentry *dentry)
> */
> int ovl_check_fh_len(struct ovl_fh *fh, int fh_len)
> {
> - if (fh_len < sizeof(struct ovl_fh) || fh_len < fh->len)
> + if (fh_len < sizeof(struct ovl_fh) - OVL_FH_WIRE_OFFSET ||
> + fh_len < fh->len)
> return -EINVAL;
>
> if (fh->magic != OVL_FH_MAGIC)
> @@ -119,11 +120,11 @@ static struct ovl_fh *ovl_get_fh(struct dentry *dentry, const char *name)
> if (res == 0)
> return NULL;
>
> - fh = kzalloc(res, GFP_KERNEL);
> + fh = kzalloc(res + OVL_FH_WIRE_OFFSET, GFP_KERNEL);
> if (!fh)
> return ERR_PTR(-ENOMEM);
>
> - res = vfs_getxattr(dentry, name, fh, res);
> + res = vfs_getxattr(dentry, name, fh + OVL_FH_WIRE_OFFSET, res);
> if (res < 0)
> goto fail;
>
> @@ -161,7 +162,7 @@ struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
> if (!uuid_equal(&fh->uuid, &mnt->mnt_sb->s_uuid))
> return NULL;
>
> - bytes = (fh->len - offsetof(struct ovl_fh, fid));
> + bytes = (fh->len + OVL_FH_WIRE_OFFSET - offsetof(struct ovl_fh, fid));
> real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
> bytes >> 2, (int)fh->type,
> connected ? ovl_acceptable : NULL, mnt);
> @@ -433,7 +434,8 @@ int ovl_verify_set_fh(struct dentry *dentry, const char *name,
>
> err = ovl_verify_fh(dentry, name, fh);
> if (set && err == -ENODATA)
> - err = ovl_do_setxattr(dentry, name, fh, fh->len, 0);
> + err = ovl_do_setxattr(dentry, name,
> + OVL_FH_START(fh), fh->len, 0);
> if (err)
> goto fail;
>
> @@ -512,12 +514,12 @@ int ovl_verify_index(struct ovl_fs *ofs, struct dentry *index)
>
> err = -ENOMEM;
> len = index->d_name.len / 2;
> - fh = kzalloc(len, GFP_KERNEL);
> + fh = kzalloc(len + OVL_FH_WIRE_OFFSET, GFP_KERNEL);
> if (!fh)
> goto fail;
>
> err = -EINVAL;
> - if (hex2bin((u8 *)fh, index->d_name.name, len))
> + if (hex2bin(OVL_FH_START(fh), index->d_name.name, len))
> goto fail;
>
> err = ovl_check_fh_len(fh, len);
> @@ -603,7 +605,7 @@ static int ovl_get_index_name_fh(struct ovl_fh *fh, struct qstr *name)
> if (!n)
> return -ENOMEM;
>
> - s = bin2hex(n, fh, fh->len);
> + s = bin2hex(n, OVL_FH_START(fh), fh->len);
> *name = (struct qstr) QSTR_INIT(n, s - n);
>
> return 0;
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 6934bcf030f0..c62083671a12 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -74,8 +74,13 @@ enum ovl_entry_flag {
> /* The type returned by overlay exportfs ops when encoding an ovl_fh handle */
> #define OVL_FILEID 0xfb
>
> -/* On-disk and in-memeory format for redirect by file handle */
> +#define OVL_FH_WIRE_OFFSET 3
> +#define OVL_FH_START(fh) ((void *)(fh) + OVL_FH_WIRE_OFFSET)
> struct ovl_fh {
> + /* make sure fid is 32bit aligned */
> + u8 padding[OVL_FH_WIRE_OFFSET];
> +
> + /* Wire/xattr encoding begins here*/
> u8 version; /* 0 */
> u8 magic; /* 0xfb */
> u8 len; /* size of this header + size of fid */

2019-11-14 20:56:13

by Al Viro

[permalink] [raw]

Subject: Re: [RFC] is ovl_fh->fid really intended to be misaligned?

On Thu, Nov 14, 2019 at 08:55:44PM +0100, Miklos Szeredi wrote:

> BUILD_BUG_ON(MAX_HANDLE_SZ + offsetof(struct ovl_fh, fid) > 255);
> fh_len = offsetof(struct ovl_fh, fid) + buflen;

IOW, 3 bytes longer now. OK...

> - fh = kmalloc(fh_len, GFP_KERNEL);
> + fh = kzalloc(fh_len, GFP_KERNEL);
> if (!fh) {
> fh = ERR_PTR(-ENOMEM);
> goto out;
> @@ -271,7 +271,7 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
> */
> if (is_upper)
> fh->flags |= OVL_FH_FLAG_PATH_UPPER;
> - fh->len = fh_len;
> + fh->len = fh_len - OVL_FH_WIRE_OFFSET;

... same value as before

> fh->uuid = *uuid;
> memcpy(fh->fid, buf, buflen);

Is there any point to two allocations + memcpy here? It's not as if you kept those
ovl_fh instances allocated for a long time - the caller frees them shortly. So
why bother with buf at all? Just allocate your fh max-sized and bloody pass &fh->fid
to exportfs_encode_fh() there. I would suggest this for declaration, if you want
to pad it in front:
struct ovl_fh {
u8 __pad[OVL_FH_WIRE_OFFSET];
u8 version; /* 0 */
u8 magic; /* 0xfb */
u8 len; /* size of this header + size of fid */
u8 flags; /* OVL_FH_FLAG_* */
u8 type; /* fid_type of fid */
uuid_t uuid; /* uuid of filesystem */
union {
struct fid fid; /* file identifier */
u8 storage[];
};
} __packed;
with BUILD_BUG_ON(offsetof(struct ovl_fh, fid) % 4); somewhere in there.
Size calculation would be
fh_len = offsetof(struct ovl_fh, storage[MAX_HANDLE_SIZE]);

And check that resulting fhandle size does *not* exceed 32 words, just to make sure.

> @@ -234,7 +234,7 @@ static int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
> if (fh->len > buflen)
> goto fail;
>
> - memcpy(buf, (char *)fh, fh->len);
> + memcpy(buf, OVL_FH_START(fh), fh->len);
> err = fh->len;

Incidentally, memcpy() doesn't need any casts - it takes any data pointer types
just fine...

> out:
> @@ -260,6 +260,7 @@ static int ovl_dentry_to_fh(struct dentry *dentry, u32 *fid, int *max_len)
>
> /* Round up to dwords */
> *max_len = (len + 3) >> 2;
> + memset(fid + len, 0, (*max_len << 2) - len);

No. This is broken - fid is u32 pointer. What you want here is

memcpy(fid, OVL_FH_START(fh), fh->len);
memset((char *)fid + fh->len, 0, OVL_FH_START(fh), OVL_FH_WIRE_OFFSET);
len = fh->len + OVL_FH_WIRE_OFFSET;

in the d_to_fh part, then just have *max_len = len / 4;

Or just do the max-sized allocation for fh and have ovl_encode_real_fh()
zero everything past fh->len; then this would just be a plain memcpy() and
to hell with separate memset()...

Incidentally, I really don't understand why these three functions separated
from each other. Why not do all of that in ovl_encode_fh()? AFAICS, the
logics gets simpler than way.

> return OVL_FILEID;
> }

> @@ -781,7 +782,7 @@ static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
> int fh_len, int fh_type)
> {
> struct dentry *dentry = NULL;
> - struct ovl_fh *fh = (struct ovl_fh *) fid;
> + struct ovl_fh *fh = (void *) fid - OVL_FH_WIRE_OFFSET;

Not enough, I'm afraid... Look what happens when you get to passing the
payload to ovl_decode_real_fh(). You pass a misaligned pointer (1 mod 4)
in there, then an equally misaligned pointer is passed to exportfs_decode_fh().
You really need to move that sucker here - the underlying fhandle is really
misaligned in what gets passed to you.

> @@ -119,11 +120,11 @@ static struct ovl_fh *ovl_get_fh(struct dentry *dentry, const char *name)
> if (res == 0)
> return NULL;
>
> - fh = kzalloc(res, GFP_KERNEL);
> + fh = kzalloc(res + OVL_FH_WIRE_OFFSET, GFP_KERNEL);
> if (!fh)
> return ERR_PTR(-ENOMEM);
>
> - res = vfs_getxattr(dentry, name, fh, res);
> + res = vfs_getxattr(dentry, name, fh + OVL_FH_WIRE_OFFSET, res);
> if (res < 0)
> goto fail;

BTW, is there any point in precisely-sized allocations here? Again,
the result won't live long, and we know the upper limit on the size,
so why bother with two vfs_getxattr() calls, etc?

2019-11-14 23:14:17

by Amir Goldstein

[permalink] [raw]

Subject: Re: [RFC] is ovl_fh->fid really intended to be misaligned?

On Thu, Nov 14, 2019 at 10:07 PM Amir Goldstein <[email protected]> wrote:
>
> On Thu, Nov 14, 2019 at 9:55 PM Miklos Szeredi <[email protected]> wrote:
> >
> > On Thu, Nov 14, 2019 at 03:47:23PM +0000, Al Viro wrote:
> > > AFAICS, this
> > > bytes = (fh->len - offsetof(struct ovl_fh, fid));
> > > real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
> > > bytes >> 2, (int)fh->type,
> > > connected ? ovl_acceptable : NULL, mnt);
> > > in ovl_decode_real_fh() combined with
> > > origin = ovl_decode_real_fh(fh, ofs->lower_layers[i].mnt,
> > > connected);
> > > in ovl_check_origin_fh(),
> > > /* First lookup overlay inode in inode cache by origin fh */
> > > err = ovl_check_origin_fh(ofs, fh, false, NULL, &stack);
> > > in ovl_lower_fh_to_d() and
> > > struct ovl_fh *fh = (struct ovl_fh *) fid;
> > > ...
> > > ovl_lower_fh_to_d(sb, fh);
> > > in ovl_fh_to_dentry() leads to the pointer to struct fid passed to
> > > exportfs_decode_fh() being 21 bytes ahead of that passed to
> > > ovl_fh_to_dentry().
> > >
> > > However, alignment of struct fid pointers is 32 bits and quite a few
> > > places dealing with those (including ->fh_to_dentry() instances)
> > > do access them directly. Argument of ->fh_to_dentry() is supposed
> > > to be 32bit-aligned, and callers generally guarantee that. Your
> > > code, OTOH, violates the alignment systematically there - what
> > > it passes to layers' ->fh_to_dentry() (by way of exportfs_decode_fh())
> > > always has two lower bits different from what it got itself.
> > >
> > > What do we do with that? One solution would be to insert sane padding
> > > in ovl_fh, but the damn thing appears to be stored as-is in xattrs on
> > > disk, so that would require rather unpleasant operations reinserting
> > > the padding on the fly ;-/
> >
> > Something like this? Totally untested...
> >
>
> I was going to suggest something similar using
>
> struct ovl_fhv1 {
> u8 pad[3];
> struct ovl_fh fhv0;
> } __packed;
>
> New overlayfs exported file handles on-wire could be ovl_fhv1,
> but we can easily convert old ovl_fhv to ovl_fhv1
> on-the-fly on decode (if we care about those few users at all)
>
> xattrs would still be stored and read as ovl_fh v0.
>

See attached.
IMHO it looks much easier to verify that these changes are correct
compared to your open coded offset shifting all over the place.

It even passed the exportfs tests first try.
Only some index tests are failing.

If you like this version, I can fix up the failures and add Al's
suggestions to simplify code with OVL_FH_MAX_SIZE
memory allocations.

Thanks,
Amir.

Attachments:

0001-ovl-make-sure-real-file-handle-is-32bit-aligned-in-m.patch.txt (10.78 kB)

2019-11-14 23:50:06

by Al Viro

[permalink] [raw]

Subject: Re: [RFC] is ovl_fh->fid really intended to be misaligned?

On Fri, Nov 15, 2019 at 01:13:15AM +0200, Amir Goldstein wrote:

> See attached.
> IMHO it looks much easier to verify that these changes are correct
> compared to your open coded offset shifting all over the place.
>
> It even passed the exportfs tests first try.
> Only some index tests are failing.
>
> If you like this version, I can fix up the failures and add Al's
> suggestions to simplify code with OVL_FH_MAX_SIZE
> memory allocations.

Huh? Correct me if I'm wrong, but doesn't that patch make it
reject all old fhandles just on the type check? That includes
anything sent over the wire, or stored in xattrs, or represented
as names in indexdir...

_If_ we can change the fhandle layout at will, everything's
easy - you can add padding in any way you like. But fhandles
are a lot more permanent than this approach would require -
mere rebooting the server into a new kernel must not make the
clients see -ESTALE on everything they'd imported from it!
And then there's implied "you can throw indexdir contents at
any time", which is also not a good thing to do.

Sure, introducing a variant with better layout would be nice,
and using a new type for it is pretty much required, but
you can't just discard old-layout fhandles you'd ever issued.

I'm afraid it's a lot stickier than you think; decisions on
fhandle layout are nearly as permanent as those on the
storage layouts for a filesystem. Especially in case of
overlayfs, where they are literally a part of the storage
layout - the names in indexdir and the contents of
trusted.overlay.upper xattr on subdirectories in there...

2019-11-15 06:09:12

by Amir Goldstein

[permalink] [raw]

Subject: Re: [RFC] is ovl_fh->fid really intended to be misaligned?

On Fri, Nov 15, 2019 at 1:49 AM Al Viro <[email protected]> wrote:
>
> On Fri, Nov 15, 2019 at 01:13:15AM +0200, Amir Goldstein wrote:
>
> > See attached.
> > IMHO it looks much easier to verify that these changes are correct
> > compared to your open coded offset shifting all over the place.
> >
> > It even passed the exportfs tests first try.
> > Only some index tests are failing.
> >
> > If you like this version, I can fix up the failures and add Al's
> > suggestions to simplify code with OVL_FH_MAX_SIZE
> > memory allocations.
>
> Huh? Correct me if I'm wrong, but doesn't that patch make it
> reject all old fhandles just on the type check? That includes
> anything sent over the wire, or stored in xattrs, or represented
> as names in indexdir...

That's not what the patch does.
It does reject OVL_FILEID_V0 in ovl_fh_to_dentry(), but
that can easily be fixed to convert them on-the-fly.
I just left this part out.

The patch does not change on-disk format, not in xattr
and not in index names - or not intentionally anyway, see:

ovl_get_fh():
...
// reading xattr into aligned buffer at offset 3
res = vfs_getxattr(dentry, name, fh->buf, res);
if (res < 0)
goto fail;
...
// checking old v0 on-disk format
err = ovl_check_fb_len(&fh->fb, res);
...

ovl_verify_set_fh():
...
// encode correctly aligned buffer
fh = ovl_encode_real_fh(real, is_upper);
...
// store old format into xattr
err = ovl_do_setxattr(dentry, name, fh->buf, fh->fb.len, 0);

Similarly in ovl_get_index_name_fh():
// here was a bug
n = kcalloc(fh->fb.len + OVL_FH_WIRE_OFFSET, 2, GFP_KERNEL);
...
// hex encode the old format
s = bin2hex(n, fh->buf, fh->fb.len);

>
> _If_ we can change the fhandle layout at will, everything's
> easy - you can add padding in any way you like. But fhandles
> are a lot more permanent than this approach would require -
> mere rebooting the server into a new kernel must not make the
> clients see -ESTALE on everything they'd imported from it!
> And then there's implied "you can throw indexdir contents at
> any time", which is also not a good thing to do.
>

I think I understand the confusion.
There is no requirement that the file handles stored on-disk
will be the same format as those exported on wire.
The file handles coming from wire are rarely compared
with those on-disk and vice versa. IIRC, there is a single
place that uses an fh from wire to lookup on-disk:

ovl_lower_fh_to_d():
...
/* Then lookup indexed upper/whiteout by origin fh */
if (ofs->indexdir) {
index = ovl_get_index_fh(ofs, fh);

But that is the exception.
Most of the time, the on-disk fh are used to make sure that
union inode numbers are persistently unique across the overlay.

> Sure, introducing a variant with better layout would be nice,
> and using a new type for it is pretty much required, but
> you can't just discard old-layout fhandles you'd ever issued.
>

In truth, we have ample evidence that ovl nfs export is not
widely in use. We know that docker and such have disabled
index because it exposed mount leak bugs that they have.
And we got too few nfs export bug reports (single reporter
IIRC).
But still, as I said, it is quite easy to respect OVL_FILEID_V0
file handles that were handed out to nfs clients after upgrade.

Unless I am still missing something...

Thanks,
Amir.