From: Trond Myklebust <[email protected]>
When reading a very large directory, we want to try to keep the page
cache up to date if doing so is inexpensive. Right now, we will try to
refill the page cache if it is non-empty, irrespective of whether or not
doing so is going to take a long time.
Replace that algorithm with something that looks at how many times we've
refilled the page cache without seeing a cache hit.
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/dir.c | 51 +++++++++++++++++++++---------------------
include/linux/nfs_fs.h | 1 +
2 files changed, 27 insertions(+), 25 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index a7f25fef890a..d0707e63ce45 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -71,19 +71,16 @@ const struct address_space_operations nfs_dir_aops = {
#define NFS_INIT_DTSIZE PAGE_SIZE
-static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir)
+static struct nfs_open_dir_context *
+alloc_nfs_open_dir_context(struct inode *dir)
{
struct nfs_inode *nfsi = NFS_I(dir);
struct nfs_open_dir_context *ctx;
- ctx = kmalloc(sizeof(*ctx), GFP_KERNEL_ACCOUNT);
+
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL_ACCOUNT);
if (ctx != NULL) {
- ctx->duped = 0;
ctx->attr_gencount = nfsi->attr_gencount;
- ctx->dir_cookie = 0;
- ctx->dup_cookie = 0;
- ctx->page_index = 0;
ctx->dtsize = NFS_INIT_DTSIZE;
- ctx->eof = false;
spin_lock(&dir->i_lock);
if (list_empty(&nfsi->open_files) &&
(nfsi->cache_validity & NFS_INO_DATA_INVAL_DEFER))
@@ -170,6 +167,7 @@ struct nfs_readdir_descriptor {
unsigned long timestamp;
unsigned long gencount;
unsigned long attr_gencount;
+ unsigned int page_fill_misses;
unsigned int cache_entry_index;
unsigned int buffer_fills;
unsigned int dtsize;
@@ -925,6 +923,18 @@ nfs_readdir_page_get_cached(struct nfs_readdir_descriptor *desc)
desc->last_cookie);
}
+#define NFS_READDIR_PAGE_FILL_MISS_MAX 5
+/*
+ * If we've already refilled the page cache 5 times without
+ * finding our cookie, then we should stop and fall back
+ * to uncached readdir
+ */
+static bool nfs_readdir_may_fill_pagecache(struct nfs_readdir_descriptor *desc)
+{
+ return desc->dir_cookie == 0 ||
+ desc->page_fill_misses < NFS_READDIR_PAGE_FILL_MISS_MAX;
+}
+
/*
* Returns 0 if desc->dir_cookie was found on page desc->page_index
* and locks the page to prevent removal from the page cache.
@@ -940,6 +950,8 @@ static int find_and_lock_cache_page(struct nfs_readdir_descriptor *desc)
if (!desc->page)
return -ENOMEM;
if (nfs_readdir_page_needs_filling(desc->page)) {
+ if (!nfs_readdir_may_fill_pagecache(desc))
+ return -EBADCOOKIE;
desc->page_index_max = desc->page_index;
res = nfs_readdir_xdr_to_array(desc, nfsi->cookieverf, verf,
&desc->page, 1);
@@ -958,36 +970,22 @@ static int find_and_lock_cache_page(struct nfs_readdir_descriptor *desc)
if (desc->page_index == 0)
memcpy(nfsi->cookieverf, verf,
sizeof(nfsi->cookieverf));
+ desc->page_fill_misses++;
}
res = nfs_readdir_search_array(desc);
- if (res == 0)
+ if (res == 0) {
+ desc->page_fill_misses = 0;
return 0;
+ }
nfs_readdir_page_unlock_and_put_cached(desc);
return res;
}
-static bool nfs_readdir_dont_search_cache(struct nfs_readdir_descriptor *desc)
-{
- struct address_space *mapping = desc->file->f_mapping;
- struct inode *dir = file_inode(desc->file);
- unsigned int dtsize = NFS_SERVER(dir)->dtsize;
- loff_t size = i_size_read(dir);
-
- /*
- * Default to uncached readdir if the page cache is empty, and
- * we're looking for a non-zero cookie in a large directory.
- */
- return desc->dir_cookie != 0 && mapping->nrpages == 0 && size > dtsize;
-}
-
/* Search for desc->dir_cookie from the beginning of the page cache */
static int readdir_search_pagecache(struct nfs_readdir_descriptor *desc)
{
int res;
- if (nfs_readdir_dont_search_cache(desc))
- return -EBADCOOKIE;
-
do {
if (desc->page_index == 0) {
desc->current_index = 0;
@@ -1149,6 +1147,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
page_index = dir_ctx->page_index;
desc->attr_gencount = dir_ctx->attr_gencount;
desc->eof = dir_ctx->eof;
+ desc->page_fill_misses = dir_ctx->page_fill_misses;
nfs_set_dtsize(desc, dir_ctx->dtsize);
memcpy(desc->verf, dir_ctx->verf, sizeof(desc->verf));
spin_unlock(&file->f_lock);
@@ -1204,6 +1203,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
dir_ctx->duped = desc->duped;
dir_ctx->attr_gencount = desc->attr_gencount;
dir_ctx->page_index = desc->page_index;
+ dir_ctx->page_fill_misses = desc->page_fill_misses;
dir_ctx->eof = desc->eof;
dir_ctx->dtsize = desc->dtsize;
memcpy(dir_ctx->verf, desc->verf, sizeof(dir_ctx->verf));
@@ -1247,6 +1247,7 @@ static loff_t nfs_llseek_dir(struct file *filp, loff_t offset, int whence)
dir_ctx->dir_cookie = offset;
else
dir_ctx->dir_cookie = 0;
+ dir_ctx->page_fill_misses = 0;
if (offset == 0)
memset(dir_ctx->verf, 0, sizeof(dir_ctx->verf));
dir_ctx->duped = 0;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index d27f7e788624..3165927048e4 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -106,6 +106,7 @@ struct nfs_open_dir_context {
__u64 dir_cookie;
__u64 dup_cookie;
pgoff_t page_index;
+ unsigned int page_fill_misses;
unsigned int dtsize;
signed char duped;
bool eof;
--
2.35.1
On 21 Feb 2022, at 14:58, Trond Myklebust wrote:
> On Mon, 2022-02-21 at 11:45 -0500, Benjamin Coddington wrote:
>> On 21 Feb 2022, at 11:08, [email protected] wrote:
>>
>>> From: Trond Myklebust <[email protected]>
>>>
>>> When reading a very large directory, we want to try to keep the
>>> page
>>> cache up to date if doing so is inexpensive. Right now, we will try
>>> to
>>> refill the page cache if it is non-empty, irrespective of whether
>>> or not
>>> doing so is going to take a long time.
>>>
>>> Replace that algorithm with something that looks at how many times
>>> we've
>>> refilled the page cache without seeing a cache hit.
>>
>> Hi Trond, I've been following your work here - thanks for it.
>>
>> I'm wondering if there might be a regression on this patch for the
>> case
>> where two or more directory readers are part way through a large
>> directory
>> when the pagecache is truncated. If I'm reading this correctly,
>> those
>> readers will stop caching after 5 fills and finish the remainder of
>> their
>> directory reads in the uncached mode.
>>
>> Isn't there an OP amplification per reader in this case?
>>
>
> Depends... In the old case, we basically stopped doing uncached readdir
> if a third process starts filling the page cache again. In particular,
> this means we were vulnerable to restarting over and over once page
> reclaim starts to kick in for very large directories.
>
> In this new one, we have each process give it a try (5 fills each), and
> then fallback to uncached. Yes, there will be corner cases where this
> will perform less well than the old algorithm, but it should also be
> more deterministic.
>
> I am open to suggestions for better ways to determine when to cut over
> to uncached readdir. This is one way, that I think is better than what
> we have, however I'm sure it can be improved upon.
I still have old patches that allow each page to be "versioned" with the
change attribute, page_index, and cookie. This allows the page cache to be
culled page-by-page, and multiple fillers can continue to fill pages at
"headless" page offsets that match their original cookie and page_index
pair. This change would mean readers don't have to start over filling the
page cache when the cache is dropped, so we wouldn't need to worry about
when to cut over to the uncached mode - it makes the problem go away.
I felt there wasn't much interest in this work, and our most vocal customer
was happy enough with last winter's readdir improvements (thanks!) that I
didn't follow up, but I can refresh those patches and send them along again.
Ben
On 21 Feb 2022, at 11:08, [email protected] wrote:
> From: Trond Myklebust <[email protected]>
>
> When reading a very large directory, we want to try to keep the page
> cache up to date if doing so is inexpensive. Right now, we will try to
> refill the page cache if it is non-empty, irrespective of whether or not
> doing so is going to take a long time.
>
> Replace that algorithm with something that looks at how many times we've
> refilled the page cache without seeing a cache hit.
Hi Trond, I've been following your work here - thanks for it.
I'm wondering if there might be a regression on this patch for the case
where two or more directory readers are part way through a large directory
when the pagecache is truncated. If I'm reading this correctly, those
readers will stop caching after 5 fills and finish the remainder of their
directory reads in the uncached mode.
Isn't there an OP amplification per reader in this case?
Ben
On 21 Feb 2022, at 15:55, Trond Myklebust wrote:
>
> We will always need the ability to cut over to uncached readdir.
Yes.
> If the cookie is no longer returned by the server because one or more
> files were deleted then we need to resolve the situation somehow (IOW:
> the 'rm *' case). The new algorithm _does_ improve performance on those
> situations, because it no longer requires us to read the entire
> directory before switching over: we try 5 times, then fail over.
Yes, using per-page validation doesn't remove the need for uncached readdir.
It does allow a reader to simply resume filling the cache where it left
off. There's no need to try 5 times and fail over. And there's no need to
make a trade-off and make the situation worse in certain scenarios.
I thought I'd point that out and make an offer to re-submit it. Any
interest?
Ben
From: Trond Myklebust <[email protected]>
The heuristic for readdirplus is designed to try to detect 'ls -l' and
similar patterns. It does so by looking for cache hit/miss patterns in
both the attribute cache and in the dcache of the files in a given
directory, and then sets a flag for the readdirplus code to interpret.
The problem with this approach is that a single attribute or dcache miss
can cause the NFS code to force a refresh of the attributes for the
entire set of files contained in the directory.
To be able to make a more nuanced decision, let's sample the number of
hits and misses in the set of open directory descriptors. That allows us
to set thresholds at which we start preferring READDIRPLUS over regular
READDIR, or at which we start to force a re-read of the remaining
readdir cache using READDIRPLUS.
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/dir.c | 83 ++++++++++++++++++++++++++----------------
fs/nfs/inode.c | 4 +-
fs/nfs/internal.h | 4 +-
fs/nfs/nfstrace.h | 1 -
include/linux/nfs_fs.h | 5 ++-
5 files changed, 59 insertions(+), 38 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d0707e63ce45..c4d962bca9ef 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -87,8 +87,7 @@ alloc_nfs_open_dir_context(struct inode *dir)
nfs_set_cache_invalid(dir,
NFS_INO_INVALID_DATA |
NFS_INO_REVAL_FORCED);
- list_add(&ctx->list, &nfsi->open_files);
- clear_bit(NFS_INO_FORCE_READDIR, &nfsi->flags);
+ list_add_tail_rcu(&ctx->list, &nfsi->open_files);
spin_unlock(&dir->i_lock);
return ctx;
}
@@ -98,9 +97,9 @@ alloc_nfs_open_dir_context(struct inode *dir)
static void put_nfs_open_dir_context(struct inode *dir, struct nfs_open_dir_context *ctx)
{
spin_lock(&dir->i_lock);
- list_del(&ctx->list);
+ list_del_rcu(&ctx->list);
spin_unlock(&dir->i_lock);
- kfree(ctx);
+ kfree_rcu(ctx, rcu_head);
}
/*
@@ -567,7 +566,6 @@ static int nfs_readdir_xdr_filler(struct nfs_readdir_descriptor *desc,
/* We requested READDIRPLUS, but the server doesn't grok it */
if (error == -ENOTSUPP && desc->plus) {
NFS_SERVER(inode)->caps &= ~NFS_CAP_READDIRPLUS;
- clear_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(inode)->flags);
desc->plus = arg.plus = false;
goto again;
}
@@ -617,51 +615,61 @@ int nfs_same_file(struct dentry *dentry, struct nfs_entry *entry)
return 1;
}
-static
-bool nfs_use_readdirplus(struct inode *dir, struct dir_context *ctx)
+#define NFS_READDIR_CACHE_USAGE_THRESHOLD (8UL)
+
+static bool nfs_use_readdirplus(struct inode *dir, struct dir_context *ctx,
+ unsigned int cache_hits,
+ unsigned int cache_misses)
{
if (!nfs_server_capable(dir, NFS_CAP_READDIRPLUS))
return false;
- if (test_and_clear_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags))
- return true;
- if (ctx->pos == 0)
+ if (ctx->pos == 0 ||
+ cache_hits + cache_misses > NFS_READDIR_CACHE_USAGE_THRESHOLD)
return true;
return false;
}
/*
- * This function is called by the lookup and getattr code to request the
+ * This function is called by the getattr code to request the
* use of readdirplus to accelerate any future lookups in the same
* directory.
*/
-void nfs_advise_use_readdirplus(struct inode *dir)
+void nfs_readdir_record_entry_cache_hit(struct inode *dir)
{
struct nfs_inode *nfsi = NFS_I(dir);
+ struct nfs_open_dir_context *ctx;
- if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
- !list_empty(&nfsi->open_files))
- set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
+ if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS)) {
+ rcu_read_lock();
+ list_for_each_entry_rcu (ctx, &nfsi->open_files, list)
+ atomic_inc(&ctx->cache_hits);
+ rcu_read_unlock();
+ }
}
/*
* This function is mainly for use by nfs_getattr().
*
* If this is an 'ls -l', we want to force use of readdirplus.
- * Do this by checking if there is an active file descriptor
- * and calling nfs_advise_use_readdirplus, then forcing a
- * cache flush.
*/
-void nfs_force_use_readdirplus(struct inode *dir)
+void nfs_readdir_record_entry_cache_miss(struct inode *dir)
{
struct nfs_inode *nfsi = NFS_I(dir);
+ struct nfs_open_dir_context *ctx;
- if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
- !list_empty(&nfsi->open_files)) {
- set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
- set_bit(NFS_INO_FORCE_READDIR, &nfsi->flags);
+ if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS)) {
+ rcu_read_lock();
+ list_for_each_entry_rcu (ctx, &nfsi->open_files, list)
+ atomic_inc(&ctx->cache_misses);
+ rcu_read_unlock();
}
}
+static void nfs_lookup_advise_force_readdirplus(struct inode *dir)
+{
+ nfs_readdir_record_entry_cache_miss(dir);
+}
+
static
void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
unsigned long dir_verifier)
@@ -1101,6 +1109,20 @@ static int uncached_readdir(struct nfs_readdir_descriptor *desc)
return status;
}
+#define NFS_READDIR_CACHE_MISS_THRESHOLD (16UL)
+
+static void nfs_readdir_handle_cache_misses(struct inode *inode,
+ struct nfs_readdir_descriptor *desc,
+ pgoff_t page_index,
+ unsigned int cache_misses)
+{
+ if (desc->ctx->pos == 0 ||
+ cache_misses <= NFS_READDIR_CACHE_MISS_THRESHOLD ||
+ !nfs_readdir_may_fill_pagecache(desc))
+ return;
+ invalidate_mapping_pages(inode->i_mapping, page_index + 1, -1);
+}
+
/* The file offset position represents the dirent entry number. A
last cookie cache takes care of the common case of reading the
whole directory.
@@ -1112,6 +1134,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
struct nfs_inode *nfsi = NFS_I(inode);
struct nfs_open_dir_context *dir_ctx = file->private_data;
struct nfs_readdir_descriptor *desc;
+ unsigned int cache_hits, cache_misses;
pgoff_t page_index;
int res;
@@ -1137,7 +1160,6 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
goto out;
desc->file = file;
desc->ctx = ctx;
- desc->plus = nfs_use_readdirplus(inode, ctx);
desc->page_index_max = -1;
spin_lock(&file->f_lock);
@@ -1150,6 +1172,8 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
desc->page_fill_misses = dir_ctx->page_fill_misses;
nfs_set_dtsize(desc, dir_ctx->dtsize);
memcpy(desc->verf, dir_ctx->verf, sizeof(desc->verf));
+ cache_hits = atomic_xchg(&dir_ctx->cache_hits, 0);
+ cache_misses = atomic_xchg(&dir_ctx->cache_misses, 0);
spin_unlock(&file->f_lock);
if (desc->eof) {
@@ -1157,9 +1181,8 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
goto out_free;
}
- if (test_and_clear_bit(NFS_INO_FORCE_READDIR, &nfsi->flags) &&
- list_is_singular(&nfsi->open_files))
- invalidate_mapping_pages(inode->i_mapping, page_index + 1, -1);
+ desc->plus = nfs_use_readdirplus(inode, ctx, cache_hits, cache_misses);
+ nfs_readdir_handle_cache_misses(inode, desc, page_index, cache_misses);
do {
res = readdir_search_pagecache(desc);
@@ -1178,7 +1201,6 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
break;
}
if (res == -ETOOSMALL && desc->plus) {
- clear_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
nfs_zap_caches(inode);
desc->page_index = 0;
desc->plus = false;
@@ -1597,7 +1619,7 @@ nfs_lookup_revalidate_dentry(struct inode *dir, struct dentry *dentry,
nfs_set_verifier(dentry, dir_verifier);
/* set a readdirplus hint that we had a cache miss */
- nfs_force_use_readdirplus(dir);
+ nfs_lookup_advise_force_readdirplus(dir);
ret = 1;
out:
nfs_free_fattr(fattr);
@@ -1654,7 +1676,6 @@ nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
nfs_mark_dir_for_revalidate(dir);
goto out_bad;
}
- nfs_advise_use_readdirplus(dir);
goto out_valid;
}
@@ -1859,7 +1880,7 @@ struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, unsigned in
goto out;
/* Notify readdir to use READDIRPLUS */
- nfs_force_use_readdirplus(dir);
+ nfs_lookup_advise_force_readdirplus(dir);
no_entry:
res = d_splice_alias(inode, dentry);
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index f9fc506ebb29..1bef81f5373a 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -789,7 +789,7 @@ static void nfs_readdirplus_parent_cache_miss(struct dentry *dentry)
if (!nfs_server_capable(d_inode(dentry), NFS_CAP_READDIRPLUS))
return;
parent = dget_parent(dentry);
- nfs_force_use_readdirplus(d_inode(parent));
+ nfs_readdir_record_entry_cache_miss(d_inode(parent));
dput(parent);
}
@@ -800,7 +800,7 @@ static void nfs_readdirplus_parent_cache_hit(struct dentry *dentry)
if (!nfs_server_capable(d_inode(dentry), NFS_CAP_READDIRPLUS))
return;
parent = dget_parent(dentry);
- nfs_advise_use_readdirplus(d_inode(parent));
+ nfs_readdir_record_entry_cache_hit(d_inode(parent));
dput(parent);
}
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 2de7c56a1fbe..46dc97b65661 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -366,8 +366,8 @@ extern struct nfs_client *nfs_init_client(struct nfs_client *clp,
const struct nfs_client_initdata *);
/* dir.c */
-extern void nfs_advise_use_readdirplus(struct inode *dir);
-extern void nfs_force_use_readdirplus(struct inode *dir);
+extern void nfs_readdir_record_entry_cache_hit(struct inode *dir);
+extern void nfs_readdir_record_entry_cache_miss(struct inode *dir);
extern unsigned long nfs_access_cache_count(struct shrinker *shrink,
struct shrink_control *sc);
extern unsigned long nfs_access_cache_scan(struct shrinker *shrink,
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 45a310b586ce..3672f6703ee7 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -36,7 +36,6 @@
#define nfs_show_nfsi_flags(v) \
__print_flags(v, "|", \
- { BIT(NFS_INO_ADVISE_RDPLUS), "ADVISE_RDPLUS" }, \
{ BIT(NFS_INO_STALE), "STALE" }, \
{ BIT(NFS_INO_ACL_LRU_SET), "ACL_LRU_SET" }, \
{ BIT(NFS_INO_INVALIDATING), "INVALIDATING" }, \
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 3165927048e4..0a5425a58bbd 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -101,6 +101,8 @@ struct nfs_open_context {
struct nfs_open_dir_context {
struct list_head list;
+ atomic_t cache_hits;
+ atomic_t cache_misses;
unsigned long attr_gencount;
__be32 verf[NFS_DIR_VERIFIER_SIZE];
__u64 dir_cookie;
@@ -110,6 +112,7 @@ struct nfs_open_dir_context {
unsigned int dtsize;
signed char duped;
bool eof;
+ struct rcu_head rcu_head;
};
/*
@@ -274,13 +277,11 @@ struct nfs4_copy_state {
/*
* Bit offsets in flags field
*/
-#define NFS_INO_ADVISE_RDPLUS (0) /* advise readdirplus */
#define NFS_INO_STALE (1) /* possible stale inode */
#define NFS_INO_ACL_LRU_SET (2) /* Inode is on the LRU list */
#define NFS_INO_INVALIDATING (3) /* inode is being invalidated */
#define NFS_INO_PRESERVE_UNLINKED (4) /* preserve file if removed while open */
#define NFS_INO_FSCACHE (5) /* inode can be cached by FS-Cache */
-#define NFS_INO_FORCE_READDIR (7) /* force readdirplus */
#define NFS_INO_LAYOUTCOMMIT (9) /* layoutcommit required */
#define NFS_INO_LAYOUTCOMMITTING (10) /* layoutcommit inflight */
#define NFS_INO_LAYOUTSTATS (11) /* layoutstats inflight */
--
2.35.1
From: Trond Myklebust <[email protected]>
If attribute caching is turned off, then use of readdirplus is not going
to help stat() performance.
Readdirplus also doesn't help if a file is being written to, since we
will have to flush those writes in order to sync the mtime/ctime.
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/inode.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 1bef81f5373a..9d2af9887715 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -782,24 +782,26 @@ void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr,
}
EXPORT_SYMBOL_GPL(nfs_setattr_update_inode);
-static void nfs_readdirplus_parent_cache_miss(struct dentry *dentry)
+/*
+ * Don't request help from readdirplus if the file is being written to,
+ * or if attribute caching is turned off
+ */
+static bool nfs_getattr_readdirplus_enable(const struct inode *inode)
{
- struct dentry *parent;
+ return nfs_server_capable(inode, NFS_CAP_READDIRPLUS) &&
+ !nfs_have_writebacks(inode) && NFS_MAXATTRTIMEO(inode) > 5 * HZ;
+}
- if (!nfs_server_capable(d_inode(dentry), NFS_CAP_READDIRPLUS))
- return;
- parent = dget_parent(dentry);
+static void nfs_readdirplus_parent_cache_miss(struct dentry *dentry)
+{
+ struct dentry *parent = dget_parent(dentry);
nfs_readdir_record_entry_cache_miss(d_inode(parent));
dput(parent);
}
static void nfs_readdirplus_parent_cache_hit(struct dentry *dentry)
{
- struct dentry *parent;
-
- if (!nfs_server_capable(d_inode(dentry), NFS_CAP_READDIRPLUS))
- return;
- parent = dget_parent(dentry);
+ struct dentry *parent = dget_parent(dentry);
nfs_readdir_record_entry_cache_hit(d_inode(parent));
dput(parent);
}
@@ -837,6 +839,7 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
int err = 0;
bool force_sync = query_flags & AT_STATX_FORCE_SYNC;
bool do_update = false;
+ bool readdirplus_enabled = nfs_getattr_readdirplus_enable(inode);
trace_nfs_getattr_enter(inode);
@@ -845,7 +848,8 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
STATX_INO | STATX_SIZE | STATX_BLOCKS;
if ((query_flags & AT_STATX_DONT_SYNC) && !force_sync) {
- nfs_readdirplus_parent_cache_hit(path->dentry);
+ if (readdirplus_enabled)
+ nfs_readdirplus_parent_cache_hit(path->dentry);
goto out_no_revalidate;
}
@@ -895,15 +899,12 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
do_update |= cache_validity & NFS_INO_INVALID_BLOCKS;
if (do_update) {
- /* Update the attribute cache */
- if (!(server->flags & NFS_MOUNT_NOAC))
+ if (readdirplus_enabled)
nfs_readdirplus_parent_cache_miss(path->dentry);
- else
- nfs_readdirplus_parent_cache_hit(path->dentry);
err = __nfs_revalidate_inode(server, inode);
if (err)
goto out;
- } else
+ } else if (readdirplus_enabled)
nfs_readdirplus_parent_cache_hit(path->dentry);
out_no_revalidate:
/* Only return attributes that were revalidated. */
--
2.35.1
On Mon, 2022-02-21 at 16:10 -0500, Benjamin Coddington wrote:
> On 21 Feb 2022, at 15:55, Trond Myklebust wrote:
> >
> > We will always need the ability to cut over to uncached readdir.
>
> Yes.
>
> > If the cookie is no longer returned by the server because one or
> > more
> > files were deleted then we need to resolve the situation somehow
> > (IOW:
> > the 'rm *' case). The new algorithm _does_ improve performance on
> > those
> > situations, because it no longer requires us to read the entire
> > directory before switching over: we try 5 times, then fail over.
>
> Yes, using per-page validation doesn't remove the need for uncached
> readdir.
> It does allow a reader to simply resume filling the cache where it
> left
> off. There's no need to try 5 times and fail over. And there's no
> need to
> make a trade-off and make the situation worse in certain scenarios.
>
> I thought I'd point that out and make an offer to re-submit it. Any
> interest?
>
As I recall, I had concerns about that approach. Can you explain again
how it will work?
A few of the concerns I have revolve around telldir()/seekdir(). If the
platform is 32-bit, then we cannot use cookies as the telldir() output,
and so our only option is to use offsets into the page cache (this is
why this patch carves out an exception if desc->dir_cookie == 0). How
would that work with your scheme?
Even in the 64-bit case where we are able to use cookies for
telldir()/seekdir(), how do we determine an appropriate page index
after a seekdir()?
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On Mon, 2022-02-21 at 11:45 -0500, Benjamin Coddington wrote:
> On 21 Feb 2022, at 11:08, [email protected] wrote:
>
> > From: Trond Myklebust <[email protected]>
> >
> > When reading a very large directory, we want to try to keep the
> > page
> > cache up to date if doing so is inexpensive. Right now, we will try
> > to
> > refill the page cache if it is non-empty, irrespective of whether
> > or not
> > doing so is going to take a long time.
> >
> > Replace that algorithm with something that looks at how many times
> > we've
> > refilled the page cache without seeing a cache hit.
>
> Hi Trond, I've been following your work here - thanks for it.
>
> I'm wondering if there might be a regression on this patch for the
> case
> where two or more directory readers are part way through a large
> directory
> when the pagecache is truncated. If I'm reading this correctly,
> those
> readers will stop caching after 5 fills and finish the remainder of
> their
> directory reads in the uncached mode.
>
> Isn't there an OP amplification per reader in this case?
>
Depends... In the old case, we basically stopped doing uncached readdir
if a third process starts filling the page cache again. In particular,
this means we were vulnerable to restarting over and over once page
reclaim starts to kick in for very large directories.
In this new one, we have each process give it a try (5 fills each), and
then fallback to uncached. Yes, there will be corner cases where this
will perform less well than the old algorithm, but it should also be
more deterministic.
I am open to suggestions for better ways to determine when to cut over
to uncached readdir. This is one way, that I think is better than what
we have, however I'm sure it can be improved upon.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On Mon, 2022-02-21 at 15:22 -0500, Benjamin Coddington wrote:
> On 21 Feb 2022, at 14:58, Trond Myklebust wrote:
>
> > On Mon, 2022-02-21 at 11:45 -0500, Benjamin Coddington wrote:
> > > On 21 Feb 2022, at 11:08, [email protected] wrote:
> > >
> > > > From: Trond Myklebust <[email protected]>
> > > >
> > > > When reading a very large directory, we want to try to keep the
> > > > page
> > > > cache up to date if doing so is inexpensive. Right now, we will
> > > > try
> > > > to
> > > > refill the page cache if it is non-empty, irrespective of
> > > > whether
> > > > or not
> > > > doing so is going to take a long time.
> > > >
> > > > Replace that algorithm with something that looks at how many
> > > > times
> > > > we've
> > > > refilled the page cache without seeing a cache hit.
> > >
> > > Hi Trond, I've been following your work here - thanks for it.
> > >
> > > I'm wondering if there might be a regression on this patch for
> > > the
> > > case
> > > where two or more directory readers are part way through a large
> > > directory
> > > when the pagecache is truncated. If I'm reading this correctly,
> > > those
> > > readers will stop caching after 5 fills and finish the remainder
> > > of
> > > their
> > > directory reads in the uncached mode.
> > >
> > > Isn't there an OP amplification per reader in this case?
> > >
> >
> > Depends... In the old case, we basically stopped doing uncached
> > readdir
> > if a third process starts filling the page cache again. In
> > particular,
> > this means we were vulnerable to restarting over and over once page
> > reclaim starts to kick in for very large directories.
> >
> > In this new one, we have each process give it a try (5 fills each),
> > and
> > then fallback to uncached. Yes, there will be corner cases where
> > this
> > will perform less well than the old algorithm, but it should also
> > be
> > more deterministic.
> >
> > I am open to suggestions for better ways to determine when to cut
> > over
> > to uncached readdir. This is one way, that I think is better than
> > what
> > we have, however I'm sure it can be improved upon.
>
> I still have old patches that allow each page to be "versioned" with
> the
> change attribute, page_index, and cookie. This allows the page cache
> to be
> culled page-by-page, and multiple fillers can continue to fill pages
> at
> "headless" page offsets that match their original cookie and
> page_index
> pair. This change would mean readers don't have to start over
> filling the
> page cache when the cache is dropped, so we wouldn't need to worry
> about
> when to cut over to the uncached mode - it makes the problem go away.
>
> I felt there wasn't much interest in this work, and our most vocal
> customer
> was happy enough with last winter's readdir improvements (thanks!)
> that I
> didn't follow up, but I can refresh those patches and send them along
> again.
>
We will always need the ability to cut over to uncached readdir.
If the cookie is no longer returned by the server because one or more
files were deleted then we need to resolve the situation somehow (IOW:
the 'rm *' case). The new algorithm _does_ improve performance on those
situations, because it no longer requires us to read the entire
directory before switching over: we try 5 times, then fail over.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 21 Feb 2022, at 18:20, Trond Myklebust wrote:
> On Mon, 2022-02-21 at 16:10 -0500, Benjamin Coddington wrote:
>> On 21 Feb 2022, at 15:55, Trond Myklebust wrote:
>>>
>>> We will always need the ability to cut over to uncached readdir.
>>
>> Yes.
>>
>>> If the cookie is no longer returned by the server because one or more
>>> files were deleted then we need to resolve the situation somehow (IOW:
>>> the 'rm *' case). The new algorithm _does_ improve performance on those
>>> situations, because it no longer requires us to read the entire
>>> directory before switching over: we try 5 times, then fail over.
>>
>> Yes, using per-page validation doesn't remove the need for uncached
>> readdir. It does allow a reader to simply resume filling the cache where
>> it left off. There's no need to try 5 times and fail over. And there's
>> no need to make a trade-off and make the situation worse in certain
>> scenarios.
>>
>> I thought I'd point that out and make an offer to re-submit it. Any
>> interest?
>>
>
> As I recall, I had concerns about that approach. Can you explain again
> how it will work?
Every page of readdir results has the directory's change attr stored on the
page. That, along with the page's index and the first cookie is enough
information to determine if the page's data can be used by another process.
Which means that when the pagecache is dropped, fillers don't have to restart
filling the cache at page index 0, they can continue to fill at whatever
index they were at previously. If another process finds a page that doesn't
match its page index, cookie, and the current directory's change attr, the
page is dropped and refilled from that process' indexing.
> A few of the concerns I have revolve around telldir()/seekdir(). If the
> platform is 32-bit, then we cannot use cookies as the telldir() output,
> and so our only option is to use offsets into the page cache (this is
> why this patch carves out an exception if desc->dir_cookie == 0). How
> would that work with your scheme?
For 32-bit seekdir, pages are filled starting at index 0. This is very
unlikely to match other readers (unless they also do the _same_ seekdir).
> Even in the 64-bit case where we are able to use cookies for
> telldir()/seekdir(), how do we determine an appropriate page index
> after a seekdir()?
We don't. Instead we start filling at index 0. Again, the pagecache will
only be useful to other processes that have done the same llseek.
This approach optimizes the pagecache for processes that are doing similar
work, and has the added benefit of scaling well for large directory listings
under memory pressure. Also a number of classes of directory modifications
(such as renames, or insertions/deletions at locations a reader has moved
beyond) are no longer a reason to re-fill the pagecache from scratch.
Ben
On 22 Feb 2022, at 15:11, Trond Myklebust wrote:
> On Tue, 2022-02-22 at 07:50 -0500, Benjamin Coddington wrote:
>> On 21 Feb 2022, at 18:20, Trond Myklebust wrote:
>>
>>> On Mon, 2022-02-21 at 16:10 -0500, Benjamin Coddington wrote:
>>>> On 21 Feb 2022, at 15:55, Trond Myklebust wrote:
>>>>>
>>>>> We will always need the ability to cut over to uncached
>>>>> readdir.
>>>>
>>>> Yes.
>>>>
>>>>> If the cookie is no longer returned by the server because one
>>>>> or more
>>>>> files were deleted then we need to resolve the situation
>>>>> somehow (IOW:
>>>>> the 'rm *' case). The new algorithm _does_ improve performance
>>>>> on those
>>>>> situations, because it no longer requires us to read the entire
>>>>> directory before switching over: we try 5 times, then fail
>>>>> over.
>>>>
>>>> Yes, using per-page validation doesn't remove the need for
>>>> uncached
>>>> readdir. It does allow a reader to simply resume filling the
>>>> cache where
>>>> it left off. There's no need to try 5 times and fail over. And
>>>> there's
>>>> no need to make a trade-off and make the situation worse in
>>>> certain
>>>> scenarios.
>>>>
>>>> I thought I'd point that out and make an offer to re-submit it.
>>>> Any
>>>> interest?
>>>>
>>>
>>> As I recall, I had concerns about that approach. Can you explain
>>> again
>>> how it will work?
>>
>> Every page of readdir results has the directory's change attr stored
>> on the
>> page. That, along with the page's index and the first cookie is
>> enough
>> information to determine if the page's data can be used by another
>> process.
>>
>> Which means that when the pagecache is dropped, fillers don't have to
>> restart
>> filling the cache at page index 0, they can continue to fill at
>> whatever
>> index they were at previously. If another process finds a page that
>> doesn't
>> match its page index, cookie, and the current directory's change
>> attr, the
>> page is dropped and refilled from that process' indexing.
>>
>>> A few of the concerns I have revolve around telldir()/seekdir(). If
>>> the
>>> platform is 32-bit, then we cannot use cookies as the telldir()
>>> output,
>>> and so our only option is to use offsets into the page cache (this
>>> is
>>> why this patch carves out an exception if desc->dir_cookie == 0).
>>> How
>>> would that work with your scheme?
>>
>> For 32-bit seekdir, pages are filled starting at index 0. This is
>> very
>> unlikely to match other readers (unless they also do the _same_
>> seekdir).
>>
>>> Even in the 64-bit case where we are able to use cookies for
>>> telldir()/seekdir(), how do we determine an appropriate page index
>>> after a seekdir()?
>>
>> We don't. Instead we start filling at index 0. Again, the pagecache
>> will
>> only be useful to other processes that have done the same llseek.
>>
>> This approach optimizes the pagecache for processes that are doing
>> similar
>> work, and has the added benefit of scaling well for large directory
>> listings
>> under memory pressure. Also a number of classes of directory
>> modifications
>> (such as renames, or insertions/deletions at locations a reader has
>> moved
>> beyond) are no longer a reason to re-fill the pagecache from scratch.
>>
>
> OK, you've got me more or less sold on it.
>
> I'd still like to figure out how to improve the performance for seekdir
> (since I do have an interest in re-exporting NFS) but I've been playing
> around with a couple of patches that implement your concept and they do
> seem to work well for the common case of a linear read through the
> directory.
Nice. I have another version from the one I originally posted:
https://lore.kernel.org/linux-nfs/[email protected]/
.. but I don't remember exactly the changes and it needs rebasing. Should I
rebase it against your testing branch and send the result?
Ben
On Tue, 2022-02-22 at 15:21 -0500, Benjamin Coddington wrote:
> On 22 Feb 2022, at 15:11, Trond Myklebust wrote:
>
> > On Tue, 2022-02-22 at 07:50 -0500, Benjamin Coddington wrote:
> > > On 21 Feb 2022, at 18:20, Trond Myklebust wrote:
> > >
> > > > On Mon, 2022-02-21 at 16:10 -0500, Benjamin Coddington wrote:
> > > > > On 21 Feb 2022, at 15:55, Trond Myklebust wrote:
> > > > > >
> > > > > > We will always need the ability to cut over to uncached
> > > > > > readdir.
> > > > >
> > > > > Yes.
> > > > >
> > > > > > If the cookie is no longer returned by the server because
> > > > > > one
> > > > > > or more
> > > > > > files were deleted then we need to resolve the situation
> > > > > > somehow (IOW:
> > > > > > the 'rm *' case). The new algorithm _does_ improve
> > > > > > performance
> > > > > > on those
> > > > > > situations, because it no longer requires us to read the
> > > > > > entire
> > > > > > directory before switching over: we try 5 times, then fail
> > > > > > over.
> > > > >
> > > > > Yes, using per-page validation doesn't remove the need for
> > > > > uncached
> > > > > readdir. It does allow a reader to simply resume filling the
> > > > > cache where
> > > > > it left off. There's no need to try 5 times and fail over.
> > > > > And
> > > > > there's
> > > > > no need to make a trade-off and make the situation worse in
> > > > > certain
> > > > > scenarios.
> > > > >
> > > > > I thought I'd point that out and make an offer to re-submit
> > > > > it.
> > > > > Any
> > > > > interest?
> > > > >
> > > >
> > > > As I recall, I had concerns about that approach. Can you
> > > > explain
> > > > again
> > > > how it will work?
> > >
> > > Every page of readdir results has the directory's change attr
> > > stored
> > > on the
> > > page. That, along with the page's index and the first cookie is
> > > enough
> > > information to determine if the page's data can be used by
> > > another
> > > process.
> > >
> > > Which means that when the pagecache is dropped, fillers don't
> > > have to
> > > restart
> > > filling the cache at page index 0, they can continue to fill at
> > > whatever
> > > index they were at previously. If another process finds a page
> > > that
> > > doesn't
> > > match its page index, cookie, and the current directory's change
> > > attr, the
> > > page is dropped and refilled from that process' indexing.
> > >
> > > > A few of the concerns I have revolve around
> > > > telldir()/seekdir(). If
> > > > the
> > > > platform is 32-bit, then we cannot use cookies as the telldir()
> > > > output,
> > > > and so our only option is to use offsets into the page cache
> > > > (this
> > > > is
> > > > why this patch carves out an exception if desc->dir_cookie ==
> > > > 0).
> > > > How
> > > > would that work with your scheme?
> > >
> > > For 32-bit seekdir, pages are filled starting at index 0. This
> > > is
> > > very
> > > unlikely to match other readers (unless they also do the _same_
> > > seekdir).
> > >
> > > > Even in the 64-bit case where we are able to use cookies for
> > > > telldir()/seekdir(), how do we determine an appropriate page
> > > > index
> > > > after a seekdir()?
> > >
> > > We don't. Instead we start filling at index 0. Again, the
> > > pagecache
> > > will
> > > only be useful to other processes that have done the same llseek.
> > >
> > > This approach optimizes the pagecache for processes that are
> > > doing
> > > similar
> > > work, and has the added benefit of scaling well for large
> > > directory
> > > listings
> > > under memory pressure. Also a number of classes of directory
> > > modifications
> > > (such as renames, or insertions/deletions at locations a reader
> > > has
> > > moved
> > > beyond) are no longer a reason to re-fill the pagecache from
> > > scratch.
> > >
> >
> > OK, you've got me more or less sold on it.
> >
> > I'd still like to figure out how to improve the performance for
> > seekdir
> > (since I do have an interest in re-exporting NFS) but I've been
> > playing
> > around with a couple of patches that implement your concept and
> > they do
> > seem to work well for the common case of a linear read through the
> > directory.
>
> Nice. I have another version from the one I originally posted:
> https://lore.kernel.org/linux-nfs/[email protected]/
>
> .. but I don't remember exactly the changes and it needs rebasing.
> Should I
> rebase it against your testing branch and send the result?
My 2 patches did something slightly different to yours, storing the
change attribute in the array header instead of in page_private. That
makes for a slightly smaller change.
Having further slept on the idea, I think I know how to solve the
seekdir() issue, at least for the cases where we can use cookies as
telldir()/seekdir() offsets. If we ditch the linear index, and instead
use a hash of the desc->last_cookie as the index, then we can make
random access to the directories work with your idea.
There are a couple of further problems with this concept, but I think
those are solvable. The issues I have identified so far are:
* invalidate_inode_pages2_range() no longer works to clear out the
page cache in a predictable way for the readdirplus heuristic.
* duplicate cookie detection needs to be changed.
I think those are solvable problems.
--
Trond Myklebust Linux NFS client maintainer, Hammerspace
[email protected]
On Tue, 2022-02-22 at 07:50 -0500, Benjamin Coddington wrote:
> On 21 Feb 2022, at 18:20, Trond Myklebust wrote:
>
> > On Mon, 2022-02-21 at 16:10 -0500, Benjamin Coddington wrote:
> > > On 21 Feb 2022, at 15:55, Trond Myklebust wrote:
> > > >
> > > > We will always need the ability to cut over to uncached
> > > > readdir.
> > >
> > > Yes.
> > >
> > > > If the cookie is no longer returned by the server because one
> > > > or more
> > > > files were deleted then we need to resolve the situation
> > > > somehow (IOW:
> > > > the 'rm *' case). The new algorithm _does_ improve performance
> > > > on those
> > > > situations, because it no longer requires us to read the entire
> > > > directory before switching over: we try 5 times, then fail
> > > > over.
> > >
> > > Yes, using per-page validation doesn't remove the need for
> > > uncached
> > > readdir. It does allow a reader to simply resume filling the
> > > cache where
> > > it left off. There's no need to try 5 times and fail over. And
> > > there's
> > > no need to make a trade-off and make the situation worse in
> > > certain
> > > scenarios.
> > >
> > > I thought I'd point that out and make an offer to re-submit it.
> > > Any
> > > interest?
> > >
> >
> > As I recall, I had concerns about that approach. Can you explain
> > again
> > how it will work?
>
> Every page of readdir results has the directory's change attr stored
> on the
> page. That, along with the page's index and the first cookie is
> enough
> information to determine if the page's data can be used by another
> process.
>
> Which means that when the pagecache is dropped, fillers don't have to
> restart
> filling the cache at page index 0, they can continue to fill at
> whatever
> index they were at previously. If another process finds a page that
> doesn't
> match its page index, cookie, and the current directory's change
> attr, the
> page is dropped and refilled from that process' indexing.
>
> > A few of the concerns I have revolve around telldir()/seekdir(). If
> > the
> > platform is 32-bit, then we cannot use cookies as the telldir()
> > output,
> > and so our only option is to use offsets into the page cache (this
> > is
> > why this patch carves out an exception if desc->dir_cookie == 0).
> > How
> > would that work with your scheme?
>
> For 32-bit seekdir, pages are filled starting at index 0. This is
> very
> unlikely to match other readers (unless they also do the _same_
> seekdir).
>
> > Even in the 64-bit case where we are able to use cookies for
> > telldir()/seekdir(), how do we determine an appropriate page index
> > after a seekdir()?
>
> We don't. Instead we start filling at index 0. Again, the pagecache
> will
> only be useful to other processes that have done the same llseek.
>
> This approach optimizes the pagecache for processes that are doing
> similar
> work, and has the added benefit of scaling well for large directory
> listings
> under memory pressure. Also a number of classes of directory
> modifications
> (such as renames, or insertions/deletions at locations a reader has
> moved
> beyond) are no longer a reason to re-fill the pagecache from scratch.
>
OK, you've got me more or less sold on it.
I'd still like to figure out how to improve the performance for seekdir
(since I do have an interest in re-exporting NFS) but I've been playing
around with a couple of patches that implement your concept and they do
seem to work well for the common case of a linear read through the
directory.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On Wed, 2022-02-23 at 08:34 -0500, Benjamin Coddington wrote:
> On 23 Feb 2022, at 7:17, Trond Myklebust wrote:
>
> > On Tue, 2022-02-22 at 15:21 -0500, Benjamin Coddington wrote:
> > > On 22 Feb 2022, at 15:11, Trond Myklebust wrote:
> > >
> > > > On Tue, 2022-02-22 at 07:50 -0500, Benjamin Coddington wrote:
> > > > > On 21 Feb 2022, at 18:20, Trond Myklebust wrote:
> > > > >
> > > > > > On Mon, 2022-02-21 at 16:10 -0500, Benjamin Coddington
> > > > > > wrote:
> > > > > > > On 21 Feb 2022, at 15:55, Trond Myklebust wrote:
> > > > > > > >
> > > > > > > > We will always need the ability to cut over to uncached
> > > > > > > > readdir.
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > > If the cookie is no longer returned by the server
> > > > > > > > because
> > > > > > > > one
> > > > > > > > or more
> > > > > > > > files were deleted then we need to resolve the
> > > > > > > > situation
> > > > > > > > somehow (IOW:
> > > > > > > > the 'rm *' case). The new algorithm _does_ improve
> > > > > > > > performance
> > > > > > > > on those
> > > > > > > > situations, because it no longer requires us to read
> > > > > > > > the
> > > > > > > > entire
> > > > > > > > directory before switching over: we try 5 times, then
> > > > > > > > fail
> > > > > > > > over.
> > > > > > >
> > > > > > > Yes, using per-page validation doesn't remove the need
> > > > > > > for
> > > > > > > uncached
> > > > > > > readdir. It does allow a reader to simply resume filling
> > > > > > > the
> > > > > > > cache where
> > > > > > > it left off. There's no need to try 5 times and fail
> > > > > > > over.
> > > > > > > And
> > > > > > > there's
> > > > > > > no need to make a trade-off and make the situation worse
> > > > > > > in
> > > > > > > certain
> > > > > > > scenarios.
> > > > > > >
> > > > > > > I thought I'd point that out and make an offer to re-
> > > > > > > submit
> > > > > > > it.
> > > > > > > Any
> > > > > > > interest?
> > > > > > >
> > > > > >
> > > > > > As I recall, I had concerns about that approach. Can you
> > > > > > explain
> > > > > > again
> > > > > > how it will work?
> > > > >
> > > > > Every page of readdir results has the directory's change attr
> > > > > stored
> > > > > on the
> > > > > page. That, along with the page's index and the first cookie
> > > > > is
> > > > > enough
> > > > > information to determine if the page's data can be used by
> > > > > another
> > > > > process.
> > > > >
> > > > > Which means that when the pagecache is dropped, fillers don't
> > > > > have to
> > > > > restart
> > > > > filling the cache at page index 0, they can continue to fill
> > > > > at
> > > > > whatever
> > > > > index they were at previously. If another process finds a
> > > > > page
> > > > > that
> > > > > doesn't
> > > > > match its page index, cookie, and the current directory's
> > > > > change
> > > > > attr, the
> > > > > page is dropped and refilled from that process' indexing.
> > > > >
> > > > > > A few of the concerns I have revolve around
> > > > > > telldir()/seekdir(). If
> > > > > > the
> > > > > > platform is 32-bit, then we cannot use cookies as the
> > > > > > telldir()
> > > > > > output,
> > > > > > and so our only option is to use offsets into the page
> > > > > > cache
> > > > > > (this
> > > > > > is
> > > > > > why this patch carves out an exception if desc->dir_cookie
> > > > > > ==
> > > > > > 0).
> > > > > > How
> > > > > > would that work with your scheme?
> > > > >
> > > > > For 32-bit seekdir, pages are filled starting at index 0.
> > > > > This
> > > > > is
> > > > > very
> > > > > unlikely to match other readers (unless they also do the
> > > > > _same_
> > > > > seekdir).
> > > > >
> > > > > > Even in the 64-bit case where we are able to use cookies for
> > > > > > telldir()/seekdir(), how do we determine an appropriate
> > > > > > page
> > > > > > index
> > > > > > after a seekdir()?
> > > > >
> > > > > We don't. Instead we start filling at index 0. Again, the
> > > > > pagecache
> > > > > will
> > > > > only be useful to other processes that have done the same
> > > > > llseek.
> > > > >
> > > > > This approach optimizes the pagecache for processes that are
> > > > > doing
> > > > > similar
> > > > > work, and has the added benefit of scaling well for large
> > > > > directory
> > > > > listings
> > > > > under memory pressure. Also a number of classes of directory
> > > > > modifications
> > > > > (such as renames, or insertions/deletions at locations a
> > > > > reader
> > > > > has
> > > > > moved
> > > > > beyond) are no longer a reason to re-fill the pagecache from
> > > > > scratch.
> > > > >
> > > >
> > > > OK, you've got me more or less sold on it.
> > > >
> > > > I'd still like to figure out how to improve the performance for
> > > > seekdir
> > > > (since I do have an interest in re-exporting NFS) but I've been
> > > > playing
> > > > around with a couple of patches that implement your concept and
> > > > they do
> > > > seem to work well for the common case of a linear read through
> > > > the
> > > > directory.
> > >
> > > Nice. I have another version from the one I originally posted:
> > > https://lore.kernel.org/linux-nfs/[email protected]/
> > >
> > > .. but I don't remember exactly the changes and it needs
> > > rebasing.
> > > Should I
> > > rebase it against your testing branch and send the result?
> >
> > My 2 patches did something slightly different to yours, storing the
> > change attribute in the array header instead of in page_private.
> > That
> > makes for a slightly smaller change.
>
> I worried that increasing the size of the array header wouldn't allow
> us
> to
> store as many entries per page.
The struct nfs_cache_array header is 24 bytes long with the change
attribute (as opposed to 16 bytes without it). This size is independent
of the architecture, assuming that 'unsigned int' is 32-bits and
unsigned char is 8-bits (as is always the case on Linux).
On a 64-bit system, a single struct nfs_cache_array_entry ends up being
32 bytes long. So in a standard 4k page with a 24-byte or 16-byte
header you will be able to cache exactly 127 cache array entries.
On a 32-bit system, the cache entry is 28 bytes long (the difference
being the pointer length), and you can pack 145 entries in the 4k page.
IOW: The change in header size makes no difference to the number of
entries you can cache, because in both cases, the header remains
smaller than a single entry.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 23 Feb 2022, at 7:17, Trond Myklebust wrote:
> On Tue, 2022-02-22 at 15:21 -0500, Benjamin Coddington wrote:
>> On 22 Feb 2022, at 15:11, Trond Myklebust wrote:
>>
>>> On Tue, 2022-02-22 at 07:50 -0500, Benjamin Coddington wrote:
>>>> On 21 Feb 2022, at 18:20, Trond Myklebust wrote:
>>>>
>>>>> On Mon, 2022-02-21 at 16:10 -0500, Benjamin Coddington wrote:
>>>>>> On 21 Feb 2022, at 15:55, Trond Myklebust wrote:
>>>>>>>
>>>>>>> We will always need the ability to cut over to uncached
>>>>>>> readdir.
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> If the cookie is no longer returned by the server because
>>>>>>> one
>>>>>>> or more
>>>>>>> files were deleted then we need to resolve the situation
>>>>>>> somehow (IOW:
>>>>>>> the 'rm *' case). The new algorithm _does_ improve
>>>>>>> performance
>>>>>>> on those
>>>>>>> situations, because it no longer requires us to read the
>>>>>>> entire
>>>>>>> directory before switching over: we try 5 times, then fail
>>>>>>> over.
>>>>>>
>>>>>> Yes, using per-page validation doesn't remove the need for
>>>>>> uncached
>>>>>> readdir. It does allow a reader to simply resume filling the
>>>>>> cache where
>>>>>> it left off. There's no need to try 5 times and fail over.
>>>>>> And
>>>>>> there's
>>>>>> no need to make a trade-off and make the situation worse in
>>>>>> certain
>>>>>> scenarios.
>>>>>>
>>>>>> I thought I'd point that out and make an offer to re-submit
>>>>>> it.
>>>>>> Any
>>>>>> interest?
>>>>>>
>>>>>
>>>>> As I recall, I had concerns about that approach. Can you
>>>>> explain
>>>>> again
>>>>> how it will work?
>>>>
>>>> Every page of readdir results has the directory's change attr
>>>> stored
>>>> on the
>>>> page. That, along with the page's index and the first cookie is
>>>> enough
>>>> information to determine if the page's data can be used by
>>>> another
>>>> process.
>>>>
>>>> Which means that when the pagecache is dropped, fillers don't
>>>> have to
>>>> restart
>>>> filling the cache at page index 0, they can continue to fill at
>>>> whatever
>>>> index they were at previously. If another process finds a page
>>>> that
>>>> doesn't
>>>> match its page index, cookie, and the current directory's change
>>>> attr, the
>>>> page is dropped and refilled from that process' indexing.
>>>>
>>>>> A few of the concerns I have revolve around
>>>>> telldir()/seekdir(). If
>>>>> the
>>>>> platform is 32-bit, then we cannot use cookies as the telldir()
>>>>> output,
>>>>> and so our only option is to use offsets into the page cache
>>>>> (this
>>>>> is
>>>>> why this patch carves out an exception if desc->dir_cookie ==
>>>>> 0).
>>>>> How
>>>>> would that work with your scheme?
>>>>
>>>> For 32-bit seekdir, pages are filled starting at index 0. This
>>>> is
>>>> very
>>>> unlikely to match other readers (unless they also do the _same_
>>>> seekdir).
>>>>
>>>>> Even in the 64-bit case where we are able to use cookies for
>>>>> telldir()/seekdir(), how do we determine an appropriate page
>>>>> index
>>>>> after a seekdir()?
>>>>
>>>> We don't. Instead we start filling at index 0. Again, the
>>>> pagecache
>>>> will
>>>> only be useful to other processes that have done the same llseek.
>>>>
>>>> This approach optimizes the pagecache for processes that are
>>>> doing
>>>> similar
>>>> work, and has the added benefit of scaling well for large
>>>> directory
>>>> listings
>>>> under memory pressure. Also a number of classes of directory
>>>> modifications
>>>> (such as renames, or insertions/deletions at locations a reader
>>>> has
>>>> moved
>>>> beyond) are no longer a reason to re-fill the pagecache from
>>>> scratch.
>>>>
>>>
>>> OK, you've got me more or less sold on it.
>>>
>>> I'd still like to figure out how to improve the performance for
>>> seekdir
>>> (since I do have an interest in re-exporting NFS) but I've been
>>> playing
>>> around with a couple of patches that implement your concept and
>>> they do
>>> seem to work well for the common case of a linear read through the
>>> directory.
>>
>> Nice. I have another version from the one I originally posted:
>> https://lore.kernel.org/linux-nfs/[email protected]/
>>
>> .. but I don't remember exactly the changes and it needs rebasing.
>> Should I
>> rebase it against your testing branch and send the result?
>
> My 2 patches did something slightly different to yours, storing the
> change attribute in the array header instead of in page_private. That
> makes for a slightly smaller change.
I worried that increasing the size of the array header wouldn't allow us
to
store as many entries per page.
> Having further slept on the idea, I think I know how to solve the
> seekdir() issue, at least for the cases where we can use cookies as
> telldir()/seekdir() offsets. If we ditch the linear index, and instead
> use a hash of the desc->last_cookie as the index, then we can make
> random access to the directories work with your idea.
Yes! Nice!
> There are a couple of further problems with this concept, but I think
> those are solvable. The issues I have identified so far are:
>
> * invalidate_inode_pages2_range() no longer works to clear out the
> page cache in a predictable way for the readdirplus heuristic.
> * duplicate cookie detection needs to be changed.
I don't understand those problems, yet. I have a few other things I
have to
do, but after those I will come back and give them some attention.
FWIW, I rebased that old series of page-validation work on your
e85f86150b32 NFS: Trace effects of the readdirplus heuristic
I'll send it as a reply to [ v6 13/13] in this posting, but I've only
compile-tested it so far.
Ben