2012-08-13 15:31:00

by Jim Vanns

Subject: Your comments, guidance, advice please :)

Hello NFS hackers. First off, fear not - the attached patch is not
something I wish to submit to the mainline kernel! However, it is
important for me that you pass judgement or comment on it. It is small.

Basically, I've written the patch solely to work around a Bluearc bug
where it duplicates fileids within an fsid and therefore we're not able
to rely on the fsid+fileid to identify distinct files in an NFS
filesystem. Some of our storage indexing and reporting software relies
on this and works happily with other, more RFC-adherent
implementations ;)

The functional change replaces the received fileid with a hash
of the file handle as that, thankfully, is still unique. As with a
fileid I need this hash to remain consistent for the lifetime of a file.
It is used as a unique identifier in a database.
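
A minimal sketch of deriving a 64-bit identifier from file handle bytes. The attached patch is not reproduced in this thread, so the hash used here (64-bit FNV-1a) is a swapped-in stand-in, and `struct demo_fh` and `demo_fh_hash()` are hypothetical names, not the patch's code or kernel API. The 128-byte buffer assumes the NFSv4 maximum handle size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a file handle: an opaque byte string of at
 * most 128 bytes (the NFSv4 maximum; NFSv3 handles are at most 64). */
struct demo_fh {
    unsigned short size;
    unsigned char data[128];
};

/* 64-bit FNV-1a over the handle bytes. This is NOT the hash from the
 * attached patch; it only illustrates the idea. */
uint64_t demo_fh_hash(const struct demo_fh *fh)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325);   /* FNV offset basis */
    size_t i;

    for (i = 0; i < fh->size; i++) {
        h ^= fh->data[i];
        h *= UINT64_C(0x100000001b3);            /* FNV prime */
    }
    return h;
}
```

Because the result depends only on the handle bytes, it stays constant for as long as the server keeps handing out the same handle for a file, which is the property the database key needs.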

I'd really appreciate it if you could let me know of any problems you
see with it - whether it'll break some client-side code, hash table use
or worse still send back bad data to the server.

I've modified what I can see as the least amount of code possible - and
my test VM is working happily as a client with this patch. It is
intended that the patch modifies only client-side code once the Bluearc
RPCs are pulled off the wire. I never want to send back these modified
fileids to the server.

Kind regards and thanks for your help,

Jim Vanns

PS. Sorry, I should mention of course that this patch is a diff from the
2.6.32 kernel as shipped with Fedora 12 - so not a recent kernel.

--
Jim Vanns
Systems Programmer
Framestore


Attachments:
fscfc-nfs-fileid-hash.patch (4.38 kB)

2012-08-13 17:28:33

by J. Bruce Fields

Subject: Re: Your comments, guidance, advice please :)

On Mon, Aug 13, 2012 at 04:40:39PM +0000, Myklebust, Trond wrote:
> On Mon, 2012-08-13 at 15:55 +0100, Jim Vanns wrote:
> > Hello NFS hackers. First off, fear not - the attached patch is not
> > something I wish to submit to the mainline kernel! However, it is
> > important for me that you pass judgement or comment on it. It is small.
> >
> > Basically, I've written the patch solely to workaround a Bluearc bug
> > where it duplicates fileids within an fsid and therefore we're not able
> > to rely on the fsid+fileid to identify distinct files in an NFS
> > filesystem. Some of our storage indexing and reporting software relies
> > on this and works happily with other, more RFC-adherent
> > implementations ;)
> >
> > The functional change is one that modified the received fileid to a hash
> > of the file handle as that, thankfully, is still unique. As with a
> > fileid I need this hash to remain consistent for the lifetime of a file.
> > It is used as a unique identifier in a database.
> >
> > I'd really appreciate it if you could let me know of any problems you
> > see with it - whether it'll break some client-side code, hash table use
> > or worse still send back bad data to the server.
> >
> > I've modified what I can see as the least amount of code possible - and
> > my test VM is working happily as a client with this patch. It is
> > intended that the patch modifies only client-side code once the Bluearc
> > RPCs are pulled off the wire. I never want to send back these modified
> > fileids to the server.
>
> READDIR and READDIRPLUS will continue to return the fileid from the
> server, so the getdents() and readdir() syscalls will be broken. Since
> READDIRPLUS does return the filehandle, you might be able to fix that
> up, but plain READDIR would appear to be unfixable.
>
> Otherwise, your strategy should in principle be OK, but with the caveat
> that a hash does not suffice to completely prevent collisions even if it
> is well chosen.
> IOW: All you are doing is tweaking the probability of a collision.

Also: the v4 rfc's allow two distinct filehandles to point to the same
file, don't they? (See e.g.
http://tools.ietf.org/html/rfc5661#section-10.3.4).

--b.

2012-08-13 16:40:40

by Myklebust, Trond

Subject: Re: Your comments, guidance, advice please :)

On Mon, 2012-08-13 at 15:55 +0100, Jim Vanns wrote:
> Hello NFS hackers. First off, fear not - the attached patch is not
> something I wish to submit to the mainline kernel! However, it is
> important for me that you pass judgement or comment on it. It is small.
>
> Basically, I've written the patch solely to workaround a Bluearc bug
> where it duplicates fileids within an fsid and therefore we're not able
> to rely on the fsid+fileid to identify distinct files in an NFS
> filesystem. Some of our storage indexing and reporting software relies
> on this and works happily with other, more RFC-adherent
> implementations ;)
>
> The functional change is one that modified the received fileid to a hash
> of the file handle as that, thankfully, is still unique. As with a
> fileid I need this hash to remain consistent for the lifetime of a file.
> It is used as a unique identifier in a database.
>
> I'd really appreciate it if you could let me know of any problems you
> see with it - whether it'll break some client-side code, hash table use
> or worse still send back bad data to the server.
>
> I've modified what I can see as the least amount of code possible - and
> my test VM is working happily as a client with this patch. It is
> intended that the patch modifies only client-side code once the Bluearc
> RPCs are pulled off the wire. I never want to send back these modified
> fileids to the server.

READDIR and READDIRPLUS will continue to return the fileid from the
server, so the getdents() and readdir() syscalls will be broken. Since
READDIRPLUS does return the filehandle, you might be able to fix that
up, but plain READDIR would appear to be unfixable.

Otherwise, your strategy should in principle be OK, but with the caveat
that a hash does not suffice to completely prevent collisions even if it
is well chosen.
IOW: All you are doing is tweaking the probability of a collision.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

2012-08-13 16:51:59

by Jim Vanns

Subject: Re: Your comments, guidance, advice please :)

On Mon, 2012-08-13 at 16:40 +0000, Myklebust, Trond wrote:
> On Mon, 2012-08-13 at 15:55 +0100, Jim Vanns wrote:
> > Hello NFS hackers. First off, fear not - the attached patch is not
> > something I wish to submit to the mainline kernel! However, it is
> > important for me that you pass judgement or comment on it. It is small.
> >
> > Basically, I've written the patch solely to workaround a Bluearc bug
> > where it duplicates fileids within an fsid and therefore we're not able
> > to rely on the fsid+fileid to identify distinct files in an NFS
> > filesystem. Some of our storage indexing and reporting software relies
> > on this and works happily with other, more RFC-adherent
> > implementations ;)
> >
> > The functional change is one that modified the received fileid to a hash
> > of the file handle as that, thankfully, is still unique. As with a
> > fileid I need this hash to remain consistent for the lifetime of a file.
> > It is used as a unique identifier in a database.
> >
> > I'd really appreciate it if you could let me know of any problems you
> > see with it - whether it'll break some client-side code, hash table use
> > or worse still send back bad data to the server.
> >
> > I've modified what I can see as the least amount of code possible - and
> > my test VM is working happily as a client with this patch. It is
> > intended that the patch modifies only client-side code once the Bluearc
> > RPCs are pulled off the wire. I never want to send back these modified
> > fileids to the server.
>
> READDIR and READDIRPLUS will continue to return the fileid from the
> server, so the getdents() and readdir() syscalls will be broken. Since
> READDIRPLUS does return the filehandle, you might be able to fix that
> up, but plain READDIR would appear to be unfixable.

Thanks, I'll take a look at that.
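
One way to observe the breakage described above from user space is to compare the inode number that readdir() reports for each entry with the one a stat of the same entry returns. The sketch below is an illustration only: `count_ino_mismatches()` is a hypothetical helper name, and it assumes a POSIX.1-2008 system. Note that a mismatch is also expected for entries that are mount points, so a non-zero count is a hint rather than proof.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/* Count entries in dirpath whose readdir() inode number differs from
 * the one reported by a no-follow stat of the same name.
 * Returns -1 if the directory cannot be opened. */
int count_ino_mismatches(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    struct dirent *de;
    int mismatches = 0;

    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        struct stat st;

        /* Skip "." and ".." so only real entries are compared. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        /* An entry may vanish between readdir() and the stat; ignore it. */
        if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;
        if ((ino_t)de->d_ino != st.st_ino)
            mismatches++;
    }
    closedir(dir);
    return mismatches;
}
```

Running this against a directory on the patched mount, for example `count_ino_mismatches("/mnt/bluearc/some/dir")` (a made-up path), would show whether directory listings still carry the server's fileid while stat returns the hashed one.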

> Otherwise, your strategy should in principle be OK, but with the caveat
> that a hash does not suffice to completely prevent collisions even if it
> is well chosen.
> IOW: All you are doing is tweaking the probability of a collision.

Oh yes, I completely understand that. I've done very little testing but
I'm confident that this at least reduces the number of collisions
considerably.
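
To put a rough number on "tweaking the probability", the expected count of colliding pairs can be estimated from the file count and the hash width. This is a sketch under stated assumptions: the hash is treated as uniformly distributed, and `expected_colliding_pairs()` is a hypothetical helper, not part of the patch.

```c
#include <stdint.h>

/* Expected number of colliding pairs among n handles hashed uniformly
 * into 2^bits values: n*(n-1)/2 pairs, each colliding with probability
 * 2^-bits. When the result is small it approximates, and upper-bounds,
 * the probability of at least one collision. */
double expected_colliding_pairs(uint64_t n, unsigned bits)
{
    double pairs;

    if (n < 2)
        return 0.0;
    pairs = (double)n * (double)(n - 1) / 2.0;
    while (bits-- > 0)
        pairs /= 2.0;   /* divide by 2^bits without needing libm */
    return pairs;
}
```

For an illustrative 100 million files, a 32-bit hash gives on the order of a million expected colliding pairs, while a 64-bit hash gives about 0.0003, so the width of the identifier that finally reaches the database matters a great deal.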

Jim

> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@netapp.com
> http://www.netapp.com

--
Jim Vanns
Systems Programmer
Framestore


2012-08-14 16:32:52

by Jim Vanns

Subject: Re: Your comments, guidance, advice please :)

On Mon, 2012-08-13 at 16:40 +0000, Myklebust, Trond wrote:
> On Mon, 2012-08-13 at 15:55 +0100, Jim Vanns wrote:
> > Hello NFS hackers. First off, fear not - the attached patch is not
> > something I wish to submit to the mainline kernel! However, it is
> > important for me that you pass judgement or comment on it. It is small.
> >
> > Basically, I've written the patch solely to workaround a Bluearc bug
> > where it duplicates fileids within an fsid and therefore we're not able
> > to rely on the fsid+fileid to identify distinct files in an NFS
> > filesystem. Some of our storage indexing and reporting software relies
> > on this and works happily with other, more RFC-adherent
> > implementations ;)
> >
> > The functional change is one that modified the received fileid to a hash
> > of the file handle as that, thankfully, is still unique. As with a
> > fileid I need this hash to remain consistent for the lifetime of a file.
> > It is used as a unique identifier in a database.
> >
> > I'd really appreciate it if you could let me know of any problems you
> > see with it - whether it'll break some client-side code, hash table use
> > or worse still send back bad data to the server.
> >
> > I've modified what I can see as the least amount of code possible - and
> > my test VM is working happily as a client with this patch. It is
> > intended that the patch modifies only client-side code once the Bluearc
> > RPCs are pulled off the wire. I never want to send back these modified
> > fileids to the server.
>
> READDIR and READDIRPLUS will continue to return the fileid from the
> server, so the getdents() and readdir() syscalls will be broken. Since
> READDIRPLUS does return the filehandle, you might be able to fix that
> up, but plain READDIR would appear to be unfixable.

To this end then, I've modified my patch so that within
nfs_refresh_inode() itself I do the following:

fattr->fileid = nfs_fh_hash(NFS_FH(inode));

Before the spin lock is taken. Full patch attached again for context.
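
The ordering described above can be sketched in user space as follows. Everything here is a mock: the `demo_*` types and functions are hypothetical stand-ins rather than kernel API, the `locked` flag merely marks where the spin lock would be held, and the sketch assumes the locked path compares the incoming fileid against the identifier already stored for the inode.

```c
#include <stdint.h>

/* All demo_* names are hypothetical user-space stand-ins. */
struct demo_inode {
    uint64_t fileid;      /* identifier stored when the inode was set up */
    uint64_t fh_hash;     /* hash of the inode's file handle */
    int locked;           /* stand-in for the inode spin lock */
};

struct demo_fattr {
    uint64_t fileid;      /* fileid as decoded from the server's reply */
};

/* Runs with the lock held: reject attributes whose fileid does not
 * match the identifier already stored for this inode (an assumption
 * about what the locked path checks). */
int demo_refresh_locked(const struct demo_inode *inode,
                        const struct demo_fattr *fattr)
{
    return inode->fileid == fattr->fileid ? 0 : -1;
}

int demo_refresh_inode(struct demo_inode *inode, struct demo_fattr *fattr)
{
    int status;

    /* Overwrite the server's fileid before the lock is taken, so the
     * locked comparison sees the hashed value. */
    fattr->fileid = inode->fh_hash;

    inode->locked = 1;
    status = demo_refresh_locked(inode, fattr);
    inode->locked = 0;
    return status;
}
```

Under that assumption, attributes carrying the raw server fileid would be rejected by the locked check, whereas rewriting them first lets the refresh succeed against an inode that was set up with the hashed value.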

Thanks again,

Jim

> Otherwise, your strategy should in principle be OK, but with the caveat
> that a hash does not suffice to completely prevent collisions even if it
> is well chosen.
> IOW: All you are doing is tweaking the probability of a collision.
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@netapp.com
> http://www.netapp.com

--
Jim Vanns
Systems Programmer
Framestore


Attachments:
fscfc-nfs-fileid-hash-2.patch (3.37 kB)