When filling an inode with info from the MDS, i_blkbits is being
initialized using fl_stripe_unit, which contains the stripe unit in
bytes. Unfortunately, this doesn't make sense for directories as they
have fl_stripe_unit set to '0'. This means that i_blkbits will be set
to 0xff, causing a UBSAN undefined behaviour warning in i_blocksize():

  UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12
  shift exponent 255 is too large for 32-bit type 'int'

Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit
is zero.

Signed-off-by: Luis Henriques <[email protected]>
---
fs/ceph/inode.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
Hi Jeff,

To be honest, I'm not sure CEPH_BLOCK_SHIFT is the right value to use
here, but for sure the one currently being used isn't correct if the
inode is a directory. Using stripe units seems to be a bug that has
been there since the beginning, but it definitely became a bigger problem
after commit 69448867abcb ("fs: shave 8 bytes off of struct inode").

This fix could also be moved into the 'switch' statement later in that
function, in the S_IFDIR case, similar to commit 5ba72e607cdb ("ceph:
set special inode's blocksize to page size"). Let me know which version
you would prefer.
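
For readers following the arithmetic, here is a minimal user-space sketch
of how a zero stripe unit ends up as 0xff. It is not kernel code:
demo_fls() is a hypothetical stand-in for the kernel's fls(), the uint8_t
return type assumes i_blkbits is an 8-bit field (as the 0xff value above
suggests), and DEMO_BLOCK_SHIFT is a placeholder whose value of 22 is an
assumption about CEPH_BLOCK_SHIFT.

```c
#include <stdint.h>

/* Placeholder for CEPH_BLOCK_SHIFT; the value 22 is an assumption. */
#define DEMO_BLOCK_SHIFT 22

/*
 * Hypothetical stand-in for the kernel's fls(): returns the 1-based
 * position of the most significant set bit, or 0 when x is 0.
 */
static int demo_fls(uint32_t x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/*
 * Mirrors the pre-patch expression. The uint8_t return type models an
 * 8-bit i_blkbits, so -1 wraps around to 0xff.
 */
static uint8_t demo_blkbits_old(uint32_t stripe_unit)
{
	return (uint8_t)(demo_fls(stripe_unit) - 1);
}

/* Mirrors the patched logic: fall back when the stripe unit is zero. */
static uint8_t demo_blkbits_new(uint32_t stripe_unit)
{
	if (stripe_unit)
		return (uint8_t)(demo_fls(stripe_unit) - 1);
	return DEMO_BLOCK_SHIFT;
}
```

With these definitions, demo_blkbits_old(0) evaluates to 0xff, while
demo_blkbits_new(0) returns DEMO_BLOCK_SHIFT. For a non-zero power-of-two
stripe unit both helpers return the same shift.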
Cheers,
--
Luis
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 791f84a13bb8..0e6d6db848b7 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -800,7 +800,12 @@ static int fill_inode(struct inode *inode, struct page *locked_page,
 
 	/* update inode */
 	inode->i_rdev = le32_to_cpu(info->rdev);
-	inode->i_blkbits = fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1;
+	/* directories have fl_stripe_unit set to zero */
+	if (le32_to_cpu(info->layout.fl_stripe_unit))
+		inode->i_blkbits =
+			fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1;
+	else
+		inode->i_blkbits = CEPH_BLOCK_SHIFT;
 
 	__ceph_update_quota(ci, iinfo->max_bytes, iinfo->max_files);
 
On Tue, 2019-07-23 at 16:50 +0100, Luis Henriques wrote:
> When filling an inode with info from the MDS, i_blkbits is being
> initialized using fl_stripe_unit, which contains the stripe unit in
> bytes. Unfortunately, this doesn't make sense for directories as they
> have fl_stripe_unit set to '0'. This means that i_blkbits will be set
> to 0xff, causing a UBSAN undefined behaviour warning in i_blocksize():
>
> UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12
> shift exponent 255 is too large for 32-bit type 'int'
>
> Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit
> is zero.
>
> Signed-off-by: Luis Henriques <[email protected]>
> ---
> fs/ceph/inode.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> Hi Jeff,
>
> To be honest, I'm not sure CEPH_BLOCK_SHIFT is the right value to use
> here, but for sure the one currently being used isn't correct if the
> inode is a directory. Using stripe units seems to be a bug that has
> been there since the beginning, but it definitely became a bigger problem
> after commit 69448867abcb ("fs: shave 8 bytes off of struct inode").
>
> This fix could also be moved into the 'switch' statement later in that
> function, in the S_IFDIR case, similar to commit 5ba72e607cdb ("ceph:
> set special inode's blocksize to page size"). Let me know which version
> you would prefer.
>
What happens with (e.g.) named pipes or symlinks? Do those inodes also
get this bogus value? Assuming that they do, I'd probably prefer this
patch since it'd fix things for all inode types, not just directories.
--
Jeff Layton <[email protected]>
"Jeff Layton" <[email protected]> writes:
> On Tue, 2019-07-23 at 16:50 +0100, Luis Henriques wrote:
>> When filling an inode with info from the MDS, i_blkbits is being
>> initialized using fl_stripe_unit, which contains the stripe unit in
>> bytes. Unfortunately, this doesn't make sense for directories as they
>> have fl_stripe_unit set to '0'. This means that i_blkbits will be set
>> to 0xff, causing a UBSAN undefined behaviour warning in i_blocksize():
>>
>> UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12
>> shift exponent 255 is too large for 32-bit type 'int'
>>
>> Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit
>> is zero.
>>
>> Signed-off-by: Luis Henriques <[email protected]>
>> ---
>> fs/ceph/inode.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> Hi Jeff,
>>
>> To be honest, I'm not sure CEPH_BLOCK_SHIFT is the right value to use
>> here, but for sure the one currently being used isn't correct if the
>> inode is a directory. Using stripe units seems to be a bug that has
>> been there since the beginning, but it definitely became a bigger problem
>> after commit 69448867abcb ("fs: shave 8 bytes off of struct inode").
>>
>> This fix could also be moved into the 'switch' statement later in that
>> function, in the S_IFDIR case, similar to commit 5ba72e607cdb ("ceph:
>> set special inode's blocksize to page size"). Let me know which version
>> you would prefer.
>>
>
> What happens with (e.g.) named pipes or symlinks? Do those inodes also
> get this bogus value? Assuming that they do, I'd probably prefer this
> patch since it'd fix things for all inode types, not just directories.

I tested symlinks and they seem to be handled correctly (i.e. the stripe
unit seems to be the same as that of the target file). Regarding pipes, I
didn't test them, but from the code i_blkbits should be set to PAGE_SHIFT
(see the above-mentioned commit 5ba72e607cdb).

Anyway, I can change the code to do *all* the i_blkbits initialization
inside the switch statement. Something like:

	switch (inode->i_mode & S_IFMT) {
	case S_IFIFO:
	case S_IFBLK:
	case S_IFCHR:
	case S_IFSOCK:
		inode->i_blkbits = PAGE_SHIFT;
		...
	case S_IFREG:
		inode->i_blkbits = fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1;
		...
	case S_IFLNK:
		inode->i_blkbits = fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1;
		...
	case S_IFDIR:
		inode->i_blkbits = CEPH_BLOCK_SHIFT;
		...
	default:
		pr_err();
		...
	}

This would add some code duplication (the S_IFREG and S_IFLNK cases), but
maybe it's a bit clearer. The other option would obviously be to
leave the initialization outside the switch and only change the
i_blkbits value in the S_IF{IFO,BLK,CHR,SOCK,DIR} cases.
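
As a rough illustration of that second option, the sketch below keeps the
stripe-unit-based default and overrides it only for the special-file and
directory cases. It is a user-space approximation, not the fill_inode()
code: demo_pick_blkbits() and demo_fls() are hypothetical helpers, and the
DEMO_PAGE_SHIFT and DEMO_BLOCK_SHIFT values (12 and 22) are assumptions
standing in for PAGE_SHIFT and CEPH_BLOCK_SHIFT.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* expose S_IFSOCK and friends in strict modes */
#endif
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

#define DEMO_PAGE_SHIFT  12	/* assumed: 4 KiB pages */
#define DEMO_BLOCK_SHIFT 22	/* assumed value of CEPH_BLOCK_SHIFT */

/* Hypothetical stand-in for fls(): 1-based index of the top set bit. */
static int demo_fls(uint32_t x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/*
 * Compute the default from the stripe unit first, then override it for
 * the inode types where the stripe unit is not meaningful.
 */
static uint8_t demo_pick_blkbits(mode_t mode, uint32_t stripe_unit)
{
	uint8_t blkbits = (uint8_t)(demo_fls(stripe_unit) - 1);

	switch (mode & S_IFMT) {
	case S_IFIFO:
	case S_IFBLK:
	case S_IFCHR:
	case S_IFSOCK:
		blkbits = DEMO_PAGE_SHIFT;
		break;
	case S_IFDIR:
		blkbits = DEMO_BLOCK_SHIFT;
		break;
	default:
		/* S_IFREG and S_IFLNK keep the stripe-unit-derived value */
		break;
	}
	return blkbits;
}
```

The trade-off is the one described above: there is no duplicated fls()
expression, but the bogus default is still computed for directories
before being overwritten.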
Cheers,
--
Luis
On Wed, 2019-07-24 at 11:04 +0100, Luis Henriques wrote:
> Luis Henriques <[email protected]> writes:
>
> > "Jeff Layton" <[email protected]> writes:
> >
> > > On Tue, 2019-07-23 at 16:50 +0100, Luis Henriques wrote:
> > > > When filling an inode with info from the MDS, i_blkbits is being
> > > > initialized using fl_stripe_unit, which contains the stripe unit in
> > > > bytes. Unfortunately, this doesn't make sense for directories as they
> > > > have fl_stripe_unit set to '0'. This means that i_blkbits will be set
> > > > to 0xff, causing a UBSAN undefined behaviour warning in i_blocksize():
> > > >
> > > > UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12
> > > > shift exponent 255 is too large for 32-bit type 'int'
> > > >
> > > > Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit
> > > > is zero.
> > > >
> > > > Signed-off-by: Luis Henriques <[email protected]>
> > > > ---
> > > > fs/ceph/inode.c | 7 ++++++-
> > > > 1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > Hi Jeff,
> > > >
> > > > To be honest, I'm not sure CEPH_BLOCK_SHIFT is the right value to use
> > > > here, but for sure the one currently being used isn't correct if the
> > > > inode is a directory. Using stripe units seems to be a bug that has
> > > > been there since the beginning, but it definitely became a bigger problem
> > > > after commit 69448867abcb ("fs: shave 8 bytes off of struct inode").
> > > >
> > > > This fix could also be moved into the 'switch' statement later in that
> > > > function, in the S_IFDIR case, similar to commit 5ba72e607cdb ("ceph:
> > > > set special inode's blocksize to page size"). Let me know which version
> > > > you would prefer.
> > > >
> > >
> > > What happens with (e.g.) named pipes or symlinks? Do those inodes also
> > > get this bogus value? Assuming that they do, I'd probably prefer this
> > > patch since it'd fix things for all inode types, not just directories.
> >
> > I tested symlinks and they seem to be handled correctly (i.e. the stripe
> > unit seems to be the same as that of the target file). Regarding pipes, I
> > didn't test them, but from the code it should be set to PAGE_SHIFT (see
> > the above mentioned commit 5ba72e607cdb).
>
> Ok, after looking closer at the other inode types and running a few
> tests with extra debug code, it all seems to be sane -- only directories
> (root dir is an exception) will cause problems with i_blkbits being set
> to a bogus value. So, I'm sticking with my original RFC patch approach,
> which should be easy to apply to stable kernels.
>
> Cheers,
Sounds good. I'll just plan to merge your RFC patch, after I run some
more tests on it.
Thanks!
--
Jeff Layton <[email protected]>
Luis Henriques <[email protected]> writes:
> "Jeff Layton" <[email protected]> writes:
>
>> On Tue, 2019-07-23 at 16:50 +0100, Luis Henriques wrote:
>>> When filling an inode with info from the MDS, i_blkbits is being
>>> initialized using fl_stripe_unit, which contains the stripe unit in
>>> bytes. Unfortunately, this doesn't make sense for directories as they
>>> have fl_stripe_unit set to '0'. This means that i_blkbits will be set
>>> to 0xff, causing a UBSAN undefined behaviour warning in i_blocksize():
>>>
>>> UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12
>>> shift exponent 255 is too large for 32-bit type 'int'
>>>
>>> Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit
>>> is zero.
>>>
>>> Signed-off-by: Luis Henriques <[email protected]>
>>> ---
>>> fs/ceph/inode.c | 7 ++++++-
>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> Hi Jeff,
>>>
>>> To be honest, I'm not sure CEPH_BLOCK_SHIFT is the right value to use
>>> here, but for sure the one currently being used isn't correct if the
>>> inode is a directory. Using stripe units seems to be a bug that has
>>> been there since the beginning, but it definitely became a bigger problem
>>> after commit 69448867abcb ("fs: shave 8 bytes off of struct inode").
>>>
>>> This fix could also be moved into the 'switch' statement later in that
>>> function, in the S_IFDIR case, similar to commit 5ba72e607cdb ("ceph:
>>> set special inode's blocksize to page size"). Let me know which version
>>> you would prefer.
>>>
>>
>> What happens with (e.g.) named pipes or symlinks? Do those inodes also
>> get this bogus value? Assuming that they do, I'd probably prefer this
>> patch since it'd fix things for all inode types, not just directories.
>
> I tested symlinks and they seem to be handled correctly (i.e. the stripe
> unit seems to be the same as that of the target file). Regarding pipes, I
> didn't test them, but from the code it should be set to PAGE_SHIFT (see
> the above mentioned commit 5ba72e607cdb).
Ok, after looking closer at the other inode types and running a few
tests with extra debug code, it all seems to be sane -- only directories
(root dir is an exception) will cause problems with i_blkbits being set
to a bogus value. So, I'm sticking with my original RFC patch approach,
which should be easy to apply to stable kernels.
Cheers,
--
Luis