2013-10-16 08:45:32

by Benny Halevy

[permalink] [raw]
Subject: Re: nfs4_file usage

[adding linux-nfs to the Cc]

On 2013-10-16 09:46, Christoph Hellwig wrote:
> On Wed, Oct 16, 2013 at 09:04:12AM +0300, Benny Halevy wrote:
>> On 2013-10-16 00:14, Christoph Hellwig wrote:
>>> it seems like the pnfs tree uses struct nfs4_file for something entirely
>>> different than the current nfsd usage of it - in fact none of the actual
>>> fields except for the refcount, hashing and inode reference seems to be
>>> shared.
>>
>> The dlm patchset adds these members:
>> + struct list_head fi_lo_states;
>>
>> The fi_lo_states (coupled with nfs4_client.cl_lo_states) use case is equivalent
>> to fi_delegations.
>> But that said, it might make sense to move the respective layout state handling
>> to nfs4state.c.
>>
>> + struct nfs4_fsid fi_fsid;
>> fi_fsid is a shorthand for fh_export->ex_fsid and it would be expensive to look
>> it up on demand (return and recall by fsid)
>>
>> Other uses in the full pnfsd patchset add:
>> + struct mutex fi_lo_lock;
>> for inter-locking layout changing ops (layout get/commit/return) across
>> down calls to the file system.
>> I need to see if that can/should be moved to the layout state structure instead.
>>
>> + u32 fi_fhlen;
>> + u8 fi_fhval[NFS4_FHSIZE];
>>
>> Similar to fi_fsid, a shorthand for layout recall by file (common case).
>> I don't see a better place to put it.
>
> Maybe I wasn't clear enough in the first mail. pnfs still needs a
> structure the shadows struct inode with all the fields it adds. I just
> don't think overlyaing it over nfs4_file makes sense, but rather have
> something separate ala:
>
> struct pnfsd_inode {
> atomic_t pi_ref;
> struct hlist_node pi_hash;
> struct list_head pi_lo_states;

I'm reluctant about the layout stateids as it uses common hashing data structs
locking infrastructure, and code with all other nfsd4 stateids.

I hope that the pnfs layer can just carry a pointer to the nfs4_file
to passed to functions living in nfs4state.c that control the layout state.

> struct mutex pi_lo_lock;
> struct nfs4_fsid pi_fsid;
> u32 pi_fhlen;
> u8 pi_fhval[NFS4_FHSIZE];
> };
>

We can have a similar hash table for pnfsd_inode similar to nfs4_file
Note that nfs4_file is also per inode, not per file as might be reflected from its name.
Maybe moving it nfs4state.c also warrants renaming it to something more accurate.
And while there, change file_hashval to hash on i_ino rather than the inode ptr.

Benny


2013-10-16 08:51:18

by Christoph Hellwig

[permalink] [raw]
Subject: Re: nfs4_file usage

On Wed, Oct 16, 2013 at 11:45:28AM +0300, Benny Halevy wrote:
> I'm reluctant about the layout stateids as it uses common hashing data structs
> locking infrastructure, and code with all other nfsd4 stateids.

The layout stateids are another thing that confuses me in the current
pnfs code. Depending on what we find the generic state id might be
embedded into the layout stateid or they might be separate, but there
don't seem to be clear rules on how to deal with freeing things in the
later case. I haven't brought this up because I didn't dig deep into it
enough, but it's for sure more than confusing.

> We can have a similar hash table for pnfsd_inode similar to nfs4_file

Yes, that was the plan.

> Note that nfs4_file is also per inode, not per file as might be reflected from its name.
> Maybe moving it nfs4state.c also warrants renaming it to something more accurate.

Yes, it should have an inode instead of file in the name, and it really
should be nfsd4_ prefix instead of using the client namespace. Given
that it's open/locking specific I'm not even sure that name should be
that generic.

> And while there, change file_hashval to hash on i_ino rather than the inode ptr.

That's a bad idea. i_ino is 32-bit only while many NFS-exported
filesystems have 64-bit inode numbers. Hashing on kernel pointers
genetrally is a fairly good idea, and we do it for the inode in other
places, too.


2013-10-16 12:18:24

by Benny Halevy

[permalink] [raw]
Subject: Re: nfs4_file usage

On 2013-10-16 11:51, Christoph Hellwig wrote:
> On Wed, Oct 16, 2013 at 11:45:28AM +0300, Benny Halevy wrote:
>> I'm reluctant about the layout stateids as it uses common hashing data structs
>> locking infrastructure, and code with all other nfsd4 stateids.
>
> The layout stateids are another thing that confuses me in the current
> pnfs code. Depending on what we find the generic state id might be
> embedded into the layout stateid or they might be separate, but there
> don't seem to be clear rules on how to deal with freeing things in the
> later case. I haven't brought this up because I didn't dig deep into it
> enough, but it's for sure more than confusing.
>

The nfs4.1 layout stateid is like any other stateid (open/lock/deleg)
but it's life time is a bit different as it is not descending from the
open stateid (though the protocol requires the client to use it
to acquire the first layout for the file) and it is released via
either LAYOUTRETURN or CLOSE (in the "return_on_close" case).
The rules here are different enough from open/lock/deleg stateids that
pnfsd needs special code to handle the management of the actual layout state
under the layout stateid and manipulate it accordingly.

>> We can have a similar hash table for pnfsd_inode similar to nfs4_file
>
> Yes, that was the plan.
>
>> Note that nfs4_file is also per inode, not per file as might be reflected from its name.
>> Maybe moving it nfs4state.c also warrants renaming it to something more accurate.
>
> Yes, it should have an inode instead of file in the name, and it really
> should be nfsd4_ prefix instead of using the client namespace. Given
> that it's open/locking specific I'm not even sure that name should be
> that generic.

Yeah, the server naming conventions are confusing and in many cases
conflicting with client names (file names, data types, functions, etc.)

>
>> And while there, change file_hashval to hash on i_ino rather than the inode ptr.
>
> That's a bad idea. i_ino is 32-bit only while many NFS-exported
> filesystems have 64-bit inode numbers. Hashing on kernel pointers
> genetrally is a fairly good idea, and we do it for the inode in other
> places, too.
>

How many bits in the kernel address space are unique?
I am under the impression that they'd differ mostly in their least significant bits anyway.
Regardless, if it works well now then no reason to change it.

2013-10-16 12:33:20

by Christoph Hellwig

[permalink] [raw]
Subject: Re: nfs4_file usage

On Wed, Oct 16, 2013 at 03:18:21PM +0300, Benny Halevy wrote:
> The nfs4.1 layout stateid is like any other stateid (open/lock/deleg)
> but it's life time is a bit different as it is not descending from the
> open stateid (though the protocol requires the client to use it
> to acquire the first layout for the file) and it is released via
> either LAYOUTRETURN or CLOSE (in the "return_on_close" case).
> The rules here are different enough from open/lock/deleg stateids that
> pnfsd needs special code to handle the management of the actual layout state
> under the layout stateid and manipulate it accordingly.

I'm not primarily ocnfused about the protocol even if that isn't all
that clear either. I'm more worried about the code in and around
nfs4_find_create_layout_stateid. I have looked a bit deeper into it
and I think the main problem is that struct nfs4_stid isn't refcounted
by itself, which really confused me expecting nfsd4_lookup_stateid
to return a reference to it.

> How many bits in the kernel address space are unique?
> I am under the impression that they'd differ mostly in their least significant bits anyway.
> Regardless, if it works well now then no reason to change it.

In theory the hash_ptr helper is there to take care exactly of that sort
of issue, in practice I can't see anything in there right now.

Before that becomes an issue we'll have to replace that dumb hash with a
more scalable data structure anyway.

2013-10-16 12:48:59

by Benny Halevy

[permalink] [raw]
Subject: Re: nfs4_file usage

On 2013-10-16 15:33, Christoph Hellwig wrote:
> On Wed, Oct 16, 2013 at 03:18:21PM +0300, Benny Halevy wrote:
>> The nfs4.1 layout stateid is like any other stateid (open/lock/deleg)
>> but it's life time is a bit different as it is not descending from the
>> open stateid (though the protocol requires the client to use it
>> to acquire the first layout for the file) and it is released via
>> either LAYOUTRETURN or CLOSE (in the "return_on_close" case).
>> The rules here are different enough from open/lock/deleg stateids that
>> pnfsd needs special code to handle the management of the actual layout state
>> under the layout stateid and manipulate it accordingly.
>
> I'm not primarily ocnfused about the protocol even if that isn't all
> that clear either. I'm more worried about the code in and around
> nfs4_find_create_layout_stateid. I have looked a bit deeper into it
> and I think the main problem is that struct nfs4_stid isn't refcounted
> by itself, which really confused me expecting nfsd4_lookup_stateid
> to return a reference to it.
>

True. That is likely to change with the state lock elimination project
where a stateid would get refcounted once found and its pointer is returned to
the caller.