2013-03-22 18:26:13

by Zach Brown

[permalink] [raw]
Subject: buggy readdir with inline dirs

I don't remember quite how, but I found myself flipping through the
inline dir code that's in mainline now. It looked pretty fishy so Eric
and I played around with it. It's very buggy in its current form.

ext4_read_inline_dir() doesn't seem to undertand the filldir arguments.
It suggests that offset 0 is the next offset after both the "." and ".."
entries. It needs to have specific offsets for "." and ".." and return them
accordingly. It looks like fixing this will trickle down into the
revalidation loop.

It doesn't understand that it's possible to only return a single "."
entry in getdents and have a subsequent call have f_pos pointing at the
fake ".." entry. With the current code if your getdents buffer only has
room for "." it just spins returning that entry leaving f_pos at 0.

Those are all relatively simple bugs that just need to be fixed.

But the big bug is that it changes the d_off values for entries as it
converts from byte offsets in the inline dir xattr to hashed offsets in
indexed dir leaves. A concurrent readdir could be unlucky enough to get
a bunch of duplicate entries as it reads past the nice low inline byte
offsets into the huge hashed offsets.

I'm not sure how to easily fix that. It feels like it'd want to
maintain the dir entries in the xattr blob with the offsets that they'll
have once converted to full dir blocks. So instead of being a magical
readdir path maybe it wants to be in the path of looking up dir blocks
so existing unindexed and indexed code would operate on the data in the
xattr blob as though it were a block?

Dunno, just wanted to share what we found. Are these all known problems
in prototype code that isn't intended to be used?

- z


2013-03-23 01:02:23

by Dave Chinner

[permalink] [raw]
Subject: Re: buggy readdir with inline dirs

On Fri, Mar 22, 2013 at 11:26:08AM -0700, Zach Brown wrote:
> I don't remember quite how, but I found myself flipping through the
> inline dir code that's in mainline now. It looked pretty fishy so Eric
> and I played around with it. It's very buggy in its current form.
>
> ext4_read_inline_dir() doesn't seem to undertand the filldir arguments.
> It suggests that offset 0 is the next offset after both the "." and ".."
> entries. It needs to have specific offsets for "." and ".." and return them
> accordingly. It looks like fixing this will trickle down into the
> revalidation loop.
>
> It doesn't understand that it's possible to only return a single "."
> entry in getdents and have a subsequent call have f_pos pointing at the
> fake ".." entry. With the current code if your getdents buffer only has
> room for "." it just spins returning that entry leaving f_pos at 0.
>
> Those are all relatively simple bugs that just need to be fixed.
>
> But the big bug is that it changes the d_off values for entries as it
> converts from byte offsets in the inline dir xattr to hashed offsets in
> indexed dir leaves. A concurrent readdir could be unlucky enough to get
> a bunch of duplicate entries as it reads past the nice low inline byte
> offsets into the huge hashed offsets.
>
> I'm not sure how to easily fix that. It feels like it'd want to
> maintain the dir entries in the xattr blob with the offsets that they'll
> have once converted to full dir blocks. So instead of being a magical
> readdir path maybe it wants to be in the path of looking up dir blocks
> so existing unindexed and indexed code would operate on the data in the
> xattr blob as though it were a block?

XFs solves this problem by keeping an "offset" field in the
short-form directory entry. It calculates what the offset would be
if the entry was in block form and adds that to the directory entry
so that when the directory grows to block form and the entries are
moved,they retain the same userspace visible offset. The offsets
returned to getdents/readdir are what is in the offset field, not
the physical offset of then entry in the shortform structure...

A similar (but opposite) process takes place when going from block
form back to short form....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2013-03-23 13:24:56

by Tao Ma

[permalink] [raw]
Subject: Re: buggy readdir with inline dirs

Hi Zach,
On 03/23/2013 02:26 AM, Zach Brown wrote:
> I don't remember quite how, but I found myself flipping through the
> inline dir code that's in mainline now. It looked pretty fishy so Eric
> and I played around with it. It's very buggy in its current form.
Sorry about any inconvenience brought to you.
>
> ext4_read_inline_dir() doesn't seem to undertand the filldir arguments.
> It suggests that offset 0 is the next offset after both the "." and ".."
> entries. It needs to have specific offsets for "." and ".." and return them
> accordingly. It looks like fixing this will trickle down into the
> revalidation loop.
yes, it is my fault, I guess at the very first beginning, I just can't
figured out how to return a proper 'offset' to the user to indicate
'..'. Now we don't save anything about '.', so offset 0 is OK for it,
but maybe we should return some offset like '2' to the user about it.
Anyway it should be fixed.
>
> It doesn't understand that it's possible to only return a single "."
> entry in getdents and have a subsequent call have f_pos pointing at the
> fake ".." entry. With the current code if your getdents buffer only has
> room for "." it just spins returning that entry leaving f_pos at 0.
Sorry.
>
> Those are all relatively simple bugs that just need to be fixed.
>
> But the big bug is that it changes the d_off values for entries as it
> converts from byte offsets in the inline dir xattr to hashed offsets in
> indexed dir leaves. A concurrent readdir could be unlucky enough to get
> a bunch of duplicate entries as it reads past the nice low inline byte
> offsets into the huge hashed offsets.
>
> I'm not sure how to easily fix that. It feels like it'd want to
> maintain the dir entries in the xattr blob with the offsets that they'll
> have once converted to full dir blocks. So instead of being a magical
> readdir path maybe it wants to be in the path of looking up dir blocks
> so existing unindexed and indexed code would operate on the data in the
> xattr blob as though it were a block?
>
> Dunno, just wanted to share what we found. Are these all known problems
> in prototype code that isn't intended to be used?
I will check what xfs does in this case as Dave mentioned in another
reply and come back with a fix about it.

Thanks,
Tao

2013-03-23 17:35:11

by Andreas Dilger

[permalink] [raw]
Subject: Re: buggy readdir with inline dirs

On 2013-03-23, at 6:24, Tao Ma <[email protected]> wrote:
> On 03/23/2013 02:26 AM, Zach Brown wrote:
>> I don't remember quite how, but I found myself flipping through the
>> inline dir code that's in mainline now. It looked pretty fishy so Eric
>> and I played around with it. It's very buggy in its current form.
> Sorry about any inconvenience brought to you.
>>
>> ext4_read_inline_dir() doesn't seem to undertand the filldir arguments.
>> It suggests that offset 0 is the next offset after both the "." and ".."
>> entries. It needs to have specific offsets for "." and ".." and return them
>> accordingly. It looks like fixing this will trickle down into the
>> revalidation loop.
> yes, it is my fault, I guess at the very first beginning, I just can't
> figured out how to return a proper 'offset' to the user to indicate
> '..'. Now we don't save anything about '.', so offset 0 is OK for it,
> but maybe we should return some offset like '2' to the user about it.
> Anyway it should be fixed.

FYI, ZFS (which generates "." and ".." entries on-the-fly also) uses "0" for start of readdir, "1" for ".", and "2" for "..".

Cheers, Andreas

>> It doesn't understand that it's possible to only return a single "."
>> entry in getdents and have a subsequent call have f_pos pointing at the
>> fake ".." entry. With the current code if your getdents buffer only has
>> room for "." it just spins returning that entry leaving f_pos at 0.
> Sorry.
>>
>> Those are all relatively simple bugs that just need to be fixed.
>>
>> But the big bug is that it changes the d_off values for entries as it
>> converts from byte offsets in the inline dir xattr to hashed offsets in
>> indexed dir leaves. A concurrent readdir could be unlucky enough to get
>> a bunch of duplicate entries as it reads past the nice low inline byte
>> offsets into the huge hashed offsets.
>>
>> I'm not sure how to easily fix that. It feels like it'd want to
>> maintain the dir entries in the xattr blob with the offsets that they'll
>> have once converted to full dir blocks. So instead of being a magical
>> readdir path maybe it wants to be in the path of looking up dir blocks
>> so existing unindexed and indexed code would operate on the data in the
>> xattr blob as though it were a block?
>>
>> Dunno, just wanted to share what we found. Are these all known problems
>> in prototype code that isn't intended to be used?
> I will check what xfs does in this case as Dave mentioned in another
> reply and come back with a fix about it.
>
> Thanks,
> Tao
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html