2013-03-28 18:05:41

by Anand Avati

[permalink] [raw]
Subject: Re: regressions due to 64-bit ext4 directory cookies

On Thu, Mar 28, 2013 at 10:52 AM, Zach Brown <[email protected]> wrote:

> On Thu, Mar 28, 2013 at 10:07:44AM -0400, Theodore Ts'o wrote:
> > On Tue, Mar 26, 2013 at 10:48:14AM -0500, Eric Sandeen wrote:
> > > > We don't have reached a conclusion so far, do we? What about the
> > > > ioctl approach, but a bit differently? Would it work to specify the
> > > > allowed upper bits for ext4 (for example 16 additional bit) and the
> > > > remaining part for gluster? One of the mails had the calculation
> > > > formula:
> > >
> > > I did throw together an ioctl patch last week, but I think Anand has a
> new
> > > approach he's trying out which won't require ext4 code changes. I'll
> let
> > > him reply when he has a moment. :)
> >
> > Any update about whether Gluster can address this without needing the
> > ioctl patch? Or should we push the ioctl patch into ext4 for the next
> > merge window?
>
> They're testing a work-around:
>
> http://review.gluster.org/#change,4711
>
> I'm not sure if they've decided that they're going to go with it, or
> not.
>

Jeff reported that the approach did not work in his testing. I haven't had
a chance to look into the failure yet. Independent of the fix, it would
certainly be good have the ioctl() support - Samba could use it too, if it
wanted.

Avati


Attachments:
(No filename) (156.00 B)

2013-03-28 19:44:02

by Jeff Darcy

[permalink] [raw]
Subject: Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies

On 03/28/2013 02:49 PM, Anand Avati wrote:
> Yes, it should, based on the theory of how ext4 was generating the
> 63bits. But Jeff's test finds that the experiment is not matching the
> theory.

FWIW, I was able to re-run my test in between stuff related to That
Other Problem. What seems to be happening is that we read correctly
until just after d_off 0x4000000000000000, then we suddenly wrap around
- not to the very first d_off we saw, but to a pretty early one (e.g.
0x0041b6340689a32e). This is all on a single brick, BTW, so it's pretty
easy to line up the back-end and front-end d_off values which match
perfectly up to this point.

I haven't had a chance to ponder what this all means and debug it
further. Hopefully I'll be able to do so soon, but I figured I'd
mention it in case something about those numbers rang a bell.



2013-03-28 18:31:58

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies

On Thu, Mar 28, 2013 at 11:05:41AM -0700, Anand Avati wrote:
> On Thu, Mar 28, 2013 at 10:52 AM, Zach Brown <[email protected]> wrote:
>
> > On Thu, Mar 28, 2013 at 10:07:44AM -0400, Theodore Ts'o wrote:
> > > On Tue, Mar 26, 2013 at 10:48:14AM -0500, Eric Sandeen wrote:
> > > > > We don't have reached a conclusion so far, do we? What about the
> > > > > ioctl approach, but a bit differently? Would it work to specify the
> > > > > allowed upper bits for ext4 (for example 16 additional bit) and the
> > > > > remaining part for gluster? One of the mails had the calculation
> > > > > formula:
> > > >
> > > > I did throw together an ioctl patch last week, but I think Anand has a
> > new
> > > > approach he's trying out which won't require ext4 code changes. I'll
> > let
> > > > him reply when he has a moment. :)
> > >
> > > Any update about whether Gluster can address this without needing the
> > > ioctl patch? Or should we push the ioctl patch into ext4 for the next
> > > merge window?
> >
> > They're testing a work-around:
> >
> > http://review.gluster.org/#change,4711
> >
> > I'm not sure if they've decided that they're going to go with it, or
> > not.
> >
>
> Jeff reported that the approach did not work in his testing. I haven't had
> a chance to look into the failure yet. Independent of the fix, it would
> certainly be good have the ioctl() support

The one advantage of your scheme is that it keeps more of the hash bits;
the chance of 31-bit cookie collisions is much higher.

> Samba could use it too, if it wanted.

It'd be useful to understand their situation.

--b.

2013-03-28 18:49:36

by Anand Avati

[permalink] [raw]
Subject: Re: regressions due to 64-bit ext4 directory cookies

On Thu, Mar 28, 2013 at 11:31 AM, J. Bruce Fields <[email protected]>wrote:
>
> > Jeff reported that the approach did not work in his testing. I haven't
> had
> > a chance to look into the failure yet. Independent of the fix, it would
> > certainly be good have the ioctl() support
>
> The one advantage of your scheme is that it keeps more of the hash bits;
> the chance of 31-bit cookie collisions is much higher.


Yes, it should, based on the theory of how ext4 was generating the 63bits.
But Jeff's test finds that the experiment is not matching the theory. I
intend to debug this, but currently drowned in a different issue. It would
be good if the ext developers can have a look at
http://review.gluster.org/4711 and see if there are obvious holes in the
approach or code.

Avati


Attachments:
(No filename) (156.00 B)

2013-03-28 22:14:50

by Anand Avati

[permalink] [raw]
Subject: Re: regressions due to 64-bit ext4 directory cookies

On Thu, Mar 28, 2013 at 12:43 PM, Jeff Darcy <[email protected]> wrote:

> On 03/28/2013 02:49 PM, Anand Avati wrote:
> > Yes, it should, based on the theory of how ext4 was generating the
> > 63bits. But Jeff's test finds that the experiment is not matching the
> > theory.
>
> FWIW, I was able to re-run my test in between stuff related to That
> Other Problem. What seems to be happening is that we read correctly
> until just after d_off 0x4000000000000000, then we suddenly wrap around
> - not to the very first d_off we saw, but to a pretty early one (e.g.
> 0x0041b6340689a32e). This is all on a single brick, BTW, so it's pretty
> easy to line up the back-end and front-end d_off values which match
> perfectly up to this point.
>
> I haven't had a chance to ponder what this all means and debug it
> further. Hopefully I'll be able to do so soon, but I figured I'd
> mention it in case something about those numbers rang a bell.
>

Of course, the unit tests (with artificial offsets) were done with brick
count >= 2. You have tested with DHT subvol count=1, which was not tested,
and sure enough, the code isn't handling it well. Just verified with the
unit tests that brick count = 1 condition fails to return the same d_off.

Posting a fixed version. Thanks for the catch!

Avati


Attachments:
(No filename) (156.00 B)