Subject: Re: [PATCH v2 7/9] NFSD: Use rhashtable for managing nfs4_file objects

> On Oct 11, 2022, at 7:37 PM, NeilBrown <[email protected]> wrote:
>
> On Tue, 11 Oct 2022, Chuck Lever III wrote:
>>> On Oct 10, 2022, at 8:16 PM, NeilBrown <[email protected]> wrote:
>>>
>>> On Fri, 07 Oct 2022, Chuck Lever wrote:
>>>>
>>>> -static unsigned int file_hashval(struct svc_fh *fh)
>>>> +/*
>>>> + * The returned hash value is based solely on the address of an in-core
>>>> + * inode, a pointer to a slab-allocated object. The entropy in such a
>>>> + * pointer is concentrated in its middle bits.
>>>
>>> I think you need more justification than that for throwing away some of
>>> the entropy, even if you don't think it is much.
>>
>> We might have that justification:
>>
>> https://lore.kernel.org/linux-nfs/YrUFbLJ5uVbWtZbf@ZenIV/
>>
>> Actually I believe we are not discarding /any/ entropy in
>> this function. The bits we discard are invariant.
>>
>
> Ok, I can accept that this:
>
> + k = ptr >> L1_CACHE_SHIFT;

> I searched for ">> *L1_CACHE_SHIFT". Apart from the nfsd
> filecache you mentioned I find two. One in quota and one in reiserfs.
> Both work with traditional hash tables which are more forgiving of
> longer chains.
> Do you have other evidence of this being a common trope?

This approach is based on the hash function in fs/inode.c,
which uses integer division instead of a shift.

    static unsigned long hash(struct super_block *sb, unsigned long hashval)
    {
            unsigned long tmp;

            tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
                            L1_CACHE_BYTES;
            tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> i_hash_shift);
            return tmp & i_hash_mask;
    }

> only discards invariant bits, but how can you justify this:
>
> + k &= 0x00ffffff;
>
> ??

After shifting an address, the top byte generally contains
invariant bits as well.
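
As a rough user-space illustration of the two quoted steps (not the patch
itself), the sketch below applies the shift and the mask to a 64-bit
address. SKETCH_L1_CACHE_SHIFT and sketch_ptr_key() are hypothetical names,
and the 64-byte cache line is an assumption; the kernel's L1_CACHE_SHIFT
depends on the architecture.

```c
#include <stdint.h>

/* Assumption: 64-byte cache lines, i.e. a shift of 6. */
#define SKETCH_L1_CACHE_SHIFT 6

/*
 * Reduce an object address to a 32-bit key using the two steps quoted
 * in this thread: shift out the low bits, then mask to 24 bits.
 */
static uint32_t sketch_ptr_key(uint64_t ptr)
{
	uint64_t k = ptr >> SKETCH_L1_CACHE_SHIFT; /* drop address bits 0..5 */

	k &= 0x00ffffff; /* keep 24 bits: address bits 6..29 */
	return (uint32_t)k;
}
```

Under these assumptions the key is built from address bits 6 through 29.
Addresses that differ only in the low 6 bits, or only at bit 30 and above,
reduce to the same key. Whether those upper bits ever vary for
slab-allocated inodes is what the question about the mask is probing.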

> And given that you pass it all to jhash anyway, why not just pass all of
> it?

I don't think this is a big deal, but these functions are
basically the same as what was recently merged without
complaint. It's not a high priority to revisit those.

It might be worth a clean-up to share this hash function
between the two hash tables... at that point we might
consider removing the extra mask.

>> And, note that this is exactly the same situation we just merged
>> in the filecache overhaul, and is a common trope amongst other
>> hash tables that base their function on the inode's address.
>>
>>
>>> Presumably you think hashing 32 bits is faster than hashing 64 bits.
>>> Maybe it is, but is it worth it?
>>>
>>> rhashtable depends heavily on having a strong hash function. In
>>> particular if any bucket ends up with more than 16 elements it will
>>> choose a new seed and rehash. If you deliberately remove some bits that
>>> it might have used to spread those 16 out, then you are asking for
>>> trouble.
>>>
>>> We know that two different file handles can refer to the same inode
>>> ("rarely"), and you deliberately place them in the same hash bucket.
>>> So if an attacker arranges to access 17 files with the same inode but
>>> different file handles, then the hashtable will be continuously
>>> rehashed.
>>>
>>> The preferred approach when you require things to share a hash chain is
>>> to use an rhl table.
>>
>> Again, this is the same situation for the filecache. Do you
>> believe it is worth reworking that? I'm guessing "yes".
>
> As a matter of principle: yes.
> rhashtable is designed to assume that hash collisions are bad and can be
> changed by choosing a different seed.
> rhl_tables was explicitly added for cases when people wanted multiple
> elements to hash to the same value.
>
> The chance of it causing a problem without an attack is admittedly
> tiny. Attacks are only possible with subtree_check enabled, or if the
> underlying filesystem does something "clever" with file handles, so
> there wouldn't be many situations where an attack would even be
> possible. But if it were possible, it would likely be easy.
> The cost of the attack would be a minor-to-modest performance impact.
>
> So it is hard to argue "this code is dangerous and must be changed", but
> we have different tools for a reason, and I believe that rhl-tables is
> the right tool for this job.

Agreed. I wasn't suggesting it's an emergency situation, but it's
something that should get fixed at some point if there's a problem
with it, even a minor one.

I think I stopped at the non-list variant of rhashtable because
using rhl was more complex, and the non-list variant seemed to
work fine. There's no architectural reason either file_hashtbl
or the filecache must use the non-list variant.

In any event, it's worth taking the trouble now to change the
nfs4_file implementation proposed here as you suggest.
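
To make the difference concrete, here is a toy user-space model. It is not
the kernel's rhashtable or rhltable API, and every name in it is
hypothetical. In "plain" mode every entry lengthens its bucket chain; in
"list" mode entries that share a key hang off a single chain element, so
the chain length counts distinct keys only.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS 8

struct toy_entry {
	uint64_t key;           /* e.g. a key derived from the inode */
	struct toy_entry *next; /* next element in the bucket chain */
	struct toy_entry *dup;  /* list mode: more entries with the same key */
};

struct toy_table {
	struct toy_entry *bucket[TOY_BUCKETS];
	int list_mode;          /* 0: plain table, 1: per-key lists */
};

static unsigned int toy_hash(uint64_t key)
{
	return (unsigned int)(key % TOY_BUCKETS);
}

/* Insert e and return the resulting length of its bucket chain. */
static unsigned int toy_insert(struct toy_table *t, struct toy_entry *e)
{
	struct toy_entry **head = &t->bucket[toy_hash(e->key)];
	struct toy_entry *p = NULL;
	unsigned int len = 0;

	e->next = NULL;
	e->dup = NULL;

	/* In list mode, look for an existing chain element with this key. */
	if (t->list_mode)
		for (p = *head; p && p->key != e->key; p = p->next)
			;

	if (p) {
		/* Same key already present: attach to its per-key list. */
		e->dup = p->dup;
		p->dup = e;
	} else {
		/* New key, or plain mode: push onto the bucket chain. */
		e->next = *head;
		*head = e;
	}

	for (p = *head; p; p = p->next)
		len++;
	return len;
}
```

Inserting 17 entries with the same key gives a chain length of 17 in plain
mode and 1 in list mode. With the 16-element limit Neil describes, only
the plain table would keep crossing the rehash threshold in that scenario.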

--
Chuck Lever

2022-10-12 21:21:27

by NeilBrown

Subject: Re: [PATCH v2 7/9] NFSD: Use rhashtable for managing nfs4_file objects

On Thu, 13 Oct 2022, Chuck Lever III wrote:
>
> I think I stopped at the non-list variant of rhashtable because
> using rhl was more complex, and the non-list variant seemed to
> work fine. There's no architectural reason either file_hashtbl
> or the filecache must use the non-list variant.
>
> In any event, it's worth taking the trouble now to change the
> nfs4_file implementation proposed here as you suggest.

If you like you could leave it as-is for now and I can provide a patch
to convert to rhl-tables later (won't be until late October).
There is one thing I would need to understand though: why are the
nfsd_files per-filehandle instead of per-inode? There is probably a
good reason, but I cannot think of one.

Thanks,
NeilBrown

2022-10-13 14:36:29

by Chuck Lever III

Subject: Re: [PATCH v2 7/9] NFSD: Use rhashtable for managing nfs4_file objects

> On Oct 12, 2022, at 5:18 PM, NeilBrown <[email protected]> wrote:
>
> On Thu, 13 Oct 2022, Chuck Lever III wrote:
>>
>> I think I stopped at the non-list variant of rhashtable because
>> using rhl was more complex, and the non-list variant seemed to
>> work fine. There's no architectural reason either file_hashtbl
>> or the filecache must use the non-list variant.
>>
>> In any event, it's worth taking the trouble now to change the
>> nfs4_file implementation proposed here as you suggest.
>
> If you like you could leave it as-is for now and I can provide a patch
> to convert to rhl-tables later (won't be until late October).
> There is one thing I would need to understand though: why are the
> nfsd_files per-filehandle instead of per-inode? There is probably a
> good reason, but I cannot think of one.

I'm not clear on your offer: do you mean converting the nfsd_file
cache from rhashtable to rhl, or converting the proposed nfs4_file
rework? I had planned to do the latter myself and post a refresh.

The nfsd_file_acquire API is the only place that seems to want a
filehandle, and it's just to look up the underlying inode. Perhaps
I don't understand your question?

--
Chuck Lever

2022-10-13 22:22:48

by NeilBrown

Subject: Re: [PATCH v2 7/9] NFSD: Use rhashtable for managing nfs4_file objects

On Fri, 14 Oct 2022, Chuck Lever III wrote:
>
> > On Oct 12, 2022, at 5:18 PM, NeilBrown <[email protected]> wrote:
> >
> > On Thu, 13 Oct 2022, Chuck Lever III wrote:
> >>
> >> I think I stopped at the non-list variant of rhashtable because
> >> using rhl was more complex, and the non-list variant seemed to
> >> work fine. There's no architectural reason either file_hashtbl
> >> or the filecache must use the non-list variant.
> >>
> >> In any event, it's worth taking the trouble now to change the
> >> nfs4_file implementation proposed here as you suggest.
> >
> > If you like you could leave it as-is for now and I can provide a patch
> > to convert to rhl-tables later (won't be until late October).
> > There is one thing I would need to understand though: why are the
> > nfsd_files per-filehandle instead of per-inode? There is probably a
> > good reason, but I cannot think of one.
>
> I'm not clear on your offer: do you mean converting the nfsd_file
> cache from rhashtable to rhl, or converting the proposed nfs4_file
> rework? I had planned to do the latter myself and post a refresh.

Either/both. Of course if you do the refresh, then I'll just review it.

>
> The nfsd_file_acquire API is the only place that seems to want a
> filehandle, and it's just to look up the underlying inode. Perhaps
> I don't understand your question?

Sorry, I meant nfs4_files, not nfsd_file: find_file() and find_or_add_file().
Why is there one nfs4_file per filehandle?

I see that there can be several nfsd_file per inode - in different
network namespaces, or with different credentials or different access
modes. That really needs to be fixed.

Thanks,
NeilBrown

2022-10-14 12:54:46

by Chuck Lever III

Subject: Re: [PATCH v2 7/9] NFSD: Use rhashtable for managing nfs4_file objects

> On Oct 13, 2022, at 6:14 PM, NeilBrown <[email protected]> wrote:
>
> On Fri, 14 Oct 2022, Chuck Lever III wrote:
>>
>>> On Oct 12, 2022, at 5:18 PM, NeilBrown <[email protected]> wrote:
>>>
>>> On Thu, 13 Oct 2022, Chuck Lever III wrote:
>>>>
>>>> I think I stopped at the non-list variant of rhashtable because
>>>> using rhl was more complex, and the non-list variant seemed to
>>>> work fine. There's no architectural reason either file_hashtbl
>>>> or the filecache must use the non-list variant.
>>>>
>>>> In any event, it's worth taking the trouble now to change the
>>>> nfs4_file implementation proposed here as you suggest.
>>>
>>> If you like you could leave it as-is for now and I can provide a patch
>>> to convert to rhl-tables later (won't be until late October).
>>> There is one thing I would need to understand though: why are the
>>> nfsd_files per-filehandle instead of per-inode? There is probably a
>>> good reason, but I cannot think of one.
>>
>> I'm not clear on your offer: do you mean converting the nfsd_file
>> cache from rhashtable to rhl, or converting the proposed nfs4_file
>> rework? I had planned to do the latter myself and post a refresh.
>
> Either/both. Of course if you do the refresh, then I'll just review it.

Yep, I plan to repost, as part of addressing (your) review comments.

>> The nfsd_file_acquire API is the only place that seems to want a
>> filehandle, and it's just to look up the underlying inode. Perhaps
>> I don't understand your question?
>
> Sorry, I meant nfs4_files, not nfsd_file: find_file() and find_or_add_file().
> Why is there one nfs4_file per filehandle?

I can't answer that (yet), but I suspect there is some semantic
association between the [current] filehandle and a particular
state ID that makes this a sensible arrangement.

> I see that there can be several nfsd_file per inode - in different
> network namespaces, or with different credentials or different access
> modes.

My impression is that is by design. Each namespace and unique
credential needs a distinct nfsd_file. Each nfsd_file acts like
an open file descriptor in user space.
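
One way to picture that is a composite lookup key. The sketch below is a
simplified illustration only: the struct and function names are
hypothetical, the fields are opaque stand-ins, and comparing credentials
by pointer is an assumption made for brevity rather than how the
filecache necessarily matches them.

```c
#include <stdbool.h>

/* Hypothetical key: one cached open file per distinct combination. */
struct sketch_nfsd_file_key {
	const void *inode;   /* underlying inode */
	const void *net;     /* network namespace */
	const void *cred;    /* credential (pointer compare is an assumption) */
	unsigned int access; /* access mode bits */
};

/* Two lookups share a cached file only if every field matches. */
static bool sketch_key_equal(const struct sketch_nfsd_file_key *a,
			     const struct sketch_nfsd_file_key *b)
{
	return a->inode == b->inode &&
	       a->net == b->net &&
	       a->cred == b->cred &&
	       a->access == b->access;
}
```

With a key like this, two requests for the same inode from different
namespaces compare unequal and so get separate objects, much as two
processes opening the same file get separate descriptors.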

> That really needs to be fixed.

I'm not sure I agree, but maybe "that" and "fixed" are doing some
heavy lifting. Jeff might be able to add some insight on design
purpose, and we can write a patch that adds that as a documenting
comment somewhere.

--
Chuck Lever