Hi,
So I have a bit of a newbie question (apologies) that came to me while
debugging some code that was spamming our NFS servers with lookups for
nonexistent files.
If we can cache directory entries (readdir) and even all their
attributes (readdirplus) for some specified period of time (actimeo,
nocto) on a client, then why can't we use that data to serve negative
lookups for files in that directory too (if we so choose)?
There are probably very good reasons you always need to do a
(negative) file lookup, like being able to read files recently created
on another client (despite your local cache for that directory), but
I'm just curious what the official reasons are. If we could choose to
serve negative lookups using the directory entries cache for a
read-only or unchanging filesystem, would that still be bad? We
already choose to use nocto for some workloads...
In our case we see these kinds of heavy negative lookup workloads for
network installed software (100 entries in PYTHONPATH is bad) and in
buggy software (randomly generated filename lookups are really bad!).
Of course, this overhead gets really bad as you add latency between
the client and server.
Daire
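As a rough illustration of why a long PYTHONPATH multiplies negative lookups, the sketch below counts misses when a module is searched for entry by entry. It is a simplified, hypothetical model: the three candidate names probed per entry and the example paths are assumptions, and CPython's real import machinery differs in detail.

```python
from __future__ import annotations


def count_lookups(
    search_path: list[str],
    modules: list[str],
    location: dict[str, str],
    candidates_per_entry: int = 3,
) -> tuple[int, int]:
    """Return (negative, positive) lookup counts for importing `modules`.

    Simplified model: entries in `search_path` are probed in order. In every
    entry that does not hold the module, `candidates_per_entry` names (say a
    package dir, a .py file and an extension module) are looked up and fail.
    `location` maps a module to the entry that holds it; modules missing from
    `location` are probed against every entry and never found.
    """
    negative = positive = 0
    for module in modules:
        home = location.get(module)
        for entry in search_path:
            if entry == home:
                positive += 1
                break
            negative += candidates_per_entry
    return negative, positive


# 100 PYTHONPATH entries, module living in the last one: 99 * 3 misses.
path = [f"/sw/pkg{i}/lib/python" for i in range(100)]
print(count_lookups(path, ["mymod"], {"mymod": path[-1]}))  # (297, 1)
```

Each miss is a name the client has never seen, so under this model it turns into a LOOKUP on the wire unless something already cached the negative result.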
On Thu, 2022-09-01 at 14:32 +0100, Daire Byrne wrote:
> Hi,
>
> So I have a bit of a newbie question (apologies) that came to me while
> debugging some code that was spamming our NFS servers with lookups for
> nonexistent files.
>
> If we can cache directory entries (readdir) and even all their
> attributes (readdirplus) for some specified period of time (actimeo,
> nocto) on a client, then why can't we use that data to serve negative
> lookups for files in that directory too (if we so choose)?
>
> There are probably very good reasons you always need to do a
> (negative) file lookup, like being able to read files recently created
> on another client (despite your local cache for that directory), but
> I'm just curious what the official reasons are. If we could choose to
> serve negative lookups using the directory entries cache for a
> read-only or unchanging filesystem, would that still be bad? We
> already choose to use nocto for some workloads...
>
> In our case we see these kinds of heavy negative lookup workloads for
> network installed software (100 entries in PYTHONPATH is bad) and in
> buggy software (randomly generated filename lookups are really bad!).
> Of course, this overhead gets really bad as you add latency between
> the client and server.
>
> Daire
man 5 nfs
Look for the section on the 'lookupcache=mode' mount option.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
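To make the distinction concrete, here is a toy Python model of the lookupcache= modes as nfs(5) describes them: with the default "all", negative entries are trusted until the parent directory's cached attributes expire; "pos"/"positive" always revalidates negative entries; "none" sends every lookup to the server. The ToyNfsClient class, the integer clock and the per-entry expiry standing in for the parent's attribute timeout are hypothetical simplifications, not the kernel's logic. The model also shows that, even with lookupcache=all, the first lookup of a missing name still costs a round trip.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyNfsClient:
    """Toy model of the lookupcache= modes described in nfs(5).

    Hypothetical simplifications: the server is a set of names in a single
    directory, time is an integer clock, and each cached entry expires
    dir_attr_timeout seconds after it was filled, standing in for "until the
    parent directory's cached attributes expire".
    """

    server_names: set[str]
    mode: str = "all"  # "all" (default), "pos"/"positive", or "none"
    dir_attr_timeout: int = 60
    round_trips: int = 0
    _cache: dict[str, tuple[bool, int]] = field(
        default_factory=dict, init=False, repr=False
    )

    def lookup(self, name: str, now: int) -> bool:
        cached = self._cache.get(name)
        if cached is not None and self.mode != "none":
            exists, filled_at = cached
            fresh = now - filled_at < self.dir_attr_timeout
            # Only "all" trusts cached negative entries; "pos"/"positive"
            # always revalidates them with the server.
            if fresh and (exists or self.mode == "all"):
                return exists
        self.round_trips += 1  # one LOOKUP sent over the wire
        exists = name in self.server_names
        if self.mode != "none":
            self._cache[name] = (exists, now)
        return exists


client = ToyNfsClient({"numpy"})
client.lookup("nosuchmod", now=0)  # first lookup of a missing name: round trip
client.lookup("nosuchmod", now=1)  # answered from the cached negative entry
print(client.round_trips)  # 1
```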
On Thu, 1 Sept 2022 at 14:55, Trond Myklebust <[email protected]> wrote:
> man 5 nfs
>
> Look for the section on the 'lookupcache=mode' mount option.
So I get that the client caches negative lookups once we've made them
(the default lookupcache=all), but what I'm wondering is if we have
already cached the entire directory contents before the (negative)
lookup, can we not reply that it doesn't exist using that information
without having to go across the wire at all (even the first time)?
Or is there no concept of "cached directory contents"? I thought that
maybe readdir/readdirplus knew about the "full contents" of a
directory?
My thinking was that if we did a readdir/readdirplus first, we could
then do lookups for any random non-existent filename without having to
send anything across the wire. Like I said, a newbie question with
limited understanding of the actual internals :)
Daire
On Thu, 2022-09-01 at 16:27 +0100, Daire Byrne wrote:
> On Thu, 1 Sept 2022 at 14:55, Trond Myklebust
> <[email protected]> wrote:
> > man 5 nfs
> >
> > Look for the section on the 'lookupcache=mode' mount option.
>
> So I get that the client caches negative lookups once we've made them
> (the default lookupcache=all), but what I'm wondering is if we have
> already cached the entire directory contents before the (negative)
> lookup, can we not reply that it doesn't exist using that information
> without having to go across the wire at all (even the first time)?
>
> Or is there no concept of "cached directory contents"? I thought that
> maybe readdir/readdirplus knew about the "full contents" of a
> directory?
>
> My thinking was that if we did a readdir/readdirplus first, we could
> then do lookups for any random non-existent filename without having to
> send anything across the wire. Like I said, a newbie question with
> limited understanding of the actual internals :)
>
> Daire
There is no concept of a 'fully cached directory'. The VFS and the
memory management code are free to kick out any unused cached entries
from the dcache at any time and for any reason. So the absence of an
entry is not the same as a negative entry.
Furthermore, certain features like case-insensitive filesystems on
servers make it hard for the NFS client to know whether or not a
specific name will or won't match an entry returned by readdir. In
those circumstances, even if you think you have cached the entire
directory, you are not guaranteed to know whether the lookup will fail
or succeed.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
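A minimal sketch of the case-insensitivity problem described above, assuming a hypothetical server whose LOOKUP folds case while READDIR returns names as stored. A client that treated "not in the cached listing" as ENOENT would give the wrong answer:

```python
from __future__ import annotations


def shortcut_says_enoent(listing: list[str], name: str) -> bool:
    """Hypothetical shortcut: answer ENOENT whenever `name` is absent from
    a cached READDIR listing, without sending a LOOKUP."""
    return name not in listing


class CaseInsensitiveServer:
    """Toy server: READDIR returns names as stored, LOOKUP ignores case."""

    def __init__(self, names: list[str]) -> None:
        self._names = list(names)

    def readdir(self) -> list[str]:
        return list(self._names)

    def lookup(self, name: str) -> bool:
        folded = name.casefold()
        return any(stored.casefold() == folded for stored in self._names)


server = CaseInsensitiveServer(["Makefile", "README.txt"])
listing = server.readdir()
# The shortcut would report ENOENT, yet the real LOOKUP succeeds.
print(shortcut_says_enoent(listing, "makefile"), server.lookup("makefile"))  # True True
```

The listing is complete in the sense that it names every stored entry; the mismatch comes from the server's matching rules, which the client cannot see.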
Yea, got it now. That all makes sense. Thanks!
Apologies for the noise. Now I just have to go and fix a bunch of our
users' code so I can forget about negative lookups again...
Daire
On Thu, 1 Sept 2022 at 16:43, Trond Myklebust <[email protected]> wrote:
> There is no concept of a 'fully cached directory'. The VFS and the
> memory management code are free to kick out any unused cached entries
> from the dcache at any time and for any reason. So the absence of an
> entry is not the same as a negative entry.
Apologies for dragging up an old thread, but I've had to tackle
wayward negative lookup storms again and I have obviously half
forgotten what I learned in this thread last time (even after
re-reading it!).
Can I just ask if I understand correctly and that there was an
intention a long time ago to be able to serve negative dentries from a
"complete" READDIRPLUS result?
https://www.cs.helsinki.fi/linux/linux-kernel/2002-30/0108.html
So if we did a readdirplus on a directory then immediately fired
random nonexistent lookups at the directory, it could be served from
the readdirplus result? i.e. not in readdir result, then return ENOENT
without needing to ask server?
But that is not the case today because you can't track the
"completeness" of a READDIRPLUS result for a directory over time (in
page cache)? Or is it all due to needing to deal with case insensitive
filesystems (which I would think affects positive lookups too)?
I did try to decipher the v6.6 fs/nfs/dir.c READDIR bits but I quickly
got lost...
Cheers,
Daire
On Thu, 1 Sept 2022 at 16:49, Daire Byrne <[email protected]> wrote:
> Yea, got it now. That all makes sense. Thanks!
On Fri, 2024-04-05 at 15:47 +0100, Daire Byrne wrote:
> Apologies for dragging up an old thread, but I've had to tackle
> wayward negative lookup storms again and I have obviously half
> forgotten what I learned in this thread last time (even after
> re-reading it!).
>
> Can I just ask if I understand correctly and that there was an
> intention a long time ago to be able to serve negative dentries from a
> "complete" READDIRPLUS result?
>
> https://www.cs.helsinki.fi/linux/linux-kernel/2002-30/0108.html
>
> So if we did a readdirplus on a directory then immediately fired
> random nonexistent lookups at the directory, it could be served from
> the readdirplus result? i.e. not in readdir result, then return ENOENT
> without needing to ask server?
>
> But that is not the case today because you can't track the
> "completeness" of a READDIRPLUS result for a directory over time (in
> page cache)? Or is it all due to needing to deal with case insensitive
> filesystems (which I would think affects positive lookups too)?
>
> I did try to decipher the v6.6 fs/nfs/dir.c READDIR bits but I quickly
> got lost...
>
> Cheers,
>
> Daire
If the question is whether the client trusts that a READDIR call to the
server returns all the names that can be successfully looked up, then
the answer is "no".
It's not even a question of case sensitivity. There are plenty of
servers out there that will allow you to look up names that won't ever
appear in the results of a READDIR (or READDIRPLUS) call. Having a
hidden ".snapshot" directory is, for instance, a popular way to present
snapshots.
So no, we're not ever going to implement any negative dentry cache
scheme that relies on READDIR/READDIRPLUS.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
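The hidden-name case gives the same failure in a simpler form. Below is a toy server (hypothetical, not any particular vendor's implementation) that resolves ".snapshot" on LOOKUP but never returns it from READDIR. Whatever a client caches from the listing, it cannot safely infer that an unlisted name does not exist.

```python
from __future__ import annotations


class SnapshotServer:
    """Toy server that answers LOOKUP for a hidden '.snapshot' directory but
    never lists it in READDIR/READDIRPLUS results."""

    HIDDEN = frozenset({".snapshot"})

    def __init__(self, names: list[str]) -> None:
        self._visible = set(names)

    def readdir(self) -> list[str]:
        return sorted(self._visible)

    def lookup(self, name: str) -> bool:
        return name in self._visible or name in self.HIDDEN


server = SnapshotServer(["bin", "lib"])
listing = server.readdir()
print(listing)                     # ['bin', 'lib']
print(".snapshot" in listing)      # False: a listing-based cache would say ENOENT
print(server.lookup(".snapshot"))  # True: the server resolves it anyway
```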
Thanks for the clarity Trond - I promise not to forget this time and
ask the same question again in 2 years!
It just keeps coming up here at DNEG due to accessing software over
NFS and crazy PYTHONPATH usage by some of our developers. In some
cases, there are 57,000 negative lookups but only 5000 positive
lookups (and opens)!
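For a sense of scale, a back-of-the-envelope sketch under a deliberately pessimistic assumption: none of these lookups hit a cache, each costs one full round trip, and they are issued strictly one after another at the ~100ms latency described for the re-export setup. Real runs overlap and cache some of this, so treat the result as an upper bound on waiting, not a measurement.

```python
def serial_wait_seconds(round_trips: int, rtt_s: float) -> float:
    """Time spent waiting if every lookup is a separate, uncached,
    strictly sequential round trip (a worst-case assumption)."""
    return round_trips * rtt_s


negative, positive = 57_000, 5_000
rtt_s = 0.100  # ~100 ms via the re-export path
total = serial_wait_seconds(negative + positive, rtt_s)
neg_only = serial_wait_seconds(negative, rtt_s)
print(f"total ~{total:.0f} s (~{total / 60:.0f} min), negative lookups alone ~{neg_only:.0f} s")
# prints: total ~6200 s (~103 min), negative lookups alone ~5700 s
```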
Getting devs to optimise their code is my cross to bear I guess.
But this is also a well known and common problem for large batch farms
and there are some novel workarounds out there:
https://guix.gnu.org/en/blog/2021/taming-the-stat-storm-with-a-loader-cache
https://computing.llnl.gov/projects/spindle
https://cernvm.cern.ch/fs/
Coupled with our propensity for high latency (~100ms) NFS via
re-export servers (for "cloud rendering"), these inefficient path
lookups quickly become a killer: the application takes longer to
look up nonexistent files and open files than it does to execute to
completion. We use aggressive caching (actimeo=3600,nocto,vers=3) and
"preload" metadata ops (ls -l, open) on a regular basis to try and
keep things in the (re-export) client cache, which certainly helps. It's
hard to keep known (expensive) metadata worksets in memory.
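A minimal sketch of that kind of periodic "preload" pass: walk a tree, call lstat() on every entry (roughly what ls -l does) and optionally open regular files. The function name is hypothetical; how long the results stay useful depends on actimeo and on memory pressure, since, as Trond noted earlier, the VFS may evict unused dentries at any time.

```python
from __future__ import annotations

import os
import stat


def preload_metadata(root: str, open_files: bool = False) -> tuple[int, int]:
    """Walk `root`, lstat() every entry and optionally open each regular
    file, to pull directory entries and attributes into the client caches.

    Returns (entries_statted, files_opened). Entries that vanish or cannot
    be accessed are skipped.
    """
    statted = opened = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            statted += 1
            if open_files and stat.S_ISREG(st.st_mode):
                try:
                    with open(path, "rb"):
                        opened += 1
                except OSError:
                    pass
    return statted, opened
```

For example, preload_metadata("/sw/apps", open_files=True) could run from a cron job on each re-export server; the path is a placeholder.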
I've also been looking at using an overlay and hand-crafting whiteout
files in the upper layers to essentially block known negative lookups
from hitting the lower NFS share; again, only useful and correct for
read-only software shares.
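A sketch of generating those whiteouts. It assumes the standard overlayfs convention that a whiteout is a character device with device number 0/0, and a hypothetical list of known-missing paths (for example, names gathered from traced failed lookups). Creating device nodes normally needs privilege (typically CAP_MKNOD), so the default here is a dry run. The overlayfs documentation only permits changes to the layers while the overlay is not mounted, so the whiteouts would be created before mounting.

```python
from __future__ import annotations

import os
import stat


def whiteout_paths(upper_dir: str, missing: list[str]) -> list[str]:
    """Map known-missing paths (relative to the share root) to the paths
    their whiteouts would occupy in the overlay's upper layer."""
    paths = []
    for rel in missing:
        if os.path.isabs(rel):
            raise ValueError(f"expected a relative path, got {rel!r}")
        paths.append(os.path.join(upper_dir, rel))
    return paths


def create_whiteouts(
    upper_dir: str, missing: list[str], dry_run: bool = True
) -> list[str]:
    """Create overlayfs whiteouts (character devices 0/0) for each
    known-missing path; with dry_run=True, only return the target paths."""
    targets = whiteout_paths(upper_dir, missing)
    if not dry_run:
        for path in targets:
            os.makedirs(os.path.dirname(path), exist_ok=True)
            os.mknod(path, stat.S_IFCHR | 0o000, os.makedev(0, 0))
    return targets
```

Only names that are genuinely absent from the lower share should be listed: a whiteout over a name that does exist below would hide it.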
I wonder if Jeff Layton's directory delegations will help for
(read-only) metadata heavy lookups over the WAN?
Daire
On Fri, 5 Apr 2024 at 16:03, Trond Myklebust <[email protected]> wrote:
> So no, we're not ever going to implement any negative dentry cache
> scheme that relies on READDIR/READDIRPLUS.
On Fri, 2024-04-12 at 10:11 +0100, Daire Byrne wrote:
> I wonder if Jeff Layton's directory delegations will help for
> (read-only) metadata heavy lookups over the WAN?
>
Probably not. In order to optimize away lookups of negative dentries
that aren't in cache, you need to know all of the positive dentries in
the directory. As Trond pointed out earlier in the discussion, NFS
doesn't have a concept of directory "completeness", so we can't
reasonably do this.
FWIW, CephFS does have such a concept and can satisfy readdir requests
and negative lookups out of the cache when it has complete directory
info.
--
Jeff Layton <[email protected]>
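The precondition Jeff describes can be sketched as a per-directory "complete" flag. This is a hypothetical Python model in the spirit of the CephFS behavior he mentions, not CephFS code: ENOENT may be answered locally only while the cache holds an authoritative listing, and evicting any entry or learning that the directory may have changed clears the flag. The fill step assumes exactly the server guarantee that, per Trond, NFS does not provide.

```python
from __future__ import annotations

from typing import Iterable


class CompleteDirCache:
    """Hypothetical cache for one directory with a 'complete' flag.

    A negative answer for an uncached name is allowed only while the
    flag is set.
    """

    def __init__(self) -> None:
        self.entries: set[str] = set()
        self.complete = False

    def fill_from_authoritative_listing(self, names: Iterable[str]) -> None:
        # Assumes the server promises the listing holds every name LOOKUP
        # could resolve, and that it will signal changes (see invalidate).
        self.entries = set(names)
        self.complete = True

    def add_positive(self, name: str) -> None:
        self.entries.add(name)  # one positive lookup never proves completeness

    def evict(self, name: str) -> None:
        # Dropping any entry (e.g. under memory pressure) breaks completeness.
        self.entries.discard(name)
        self.complete = False

    def invalidate(self) -> None:
        # E.g. the server signals that the directory may have changed.
        self.complete = False

    def can_answer_enoent(self, name: str) -> bool:
        return self.complete and name not in self.entries
```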
On Fri, 12 Apr 2024 at 11:21, Jeff Layton <[email protected]> wrote:
>
> On Fri, 2024-04-12 at 10:11 +0100, Daire Byrne wrote:
> > Thanks for the clarity Trond - I promise not to forget this time and
> > ask the same question again in 2 years!
> >
> > It just keeps coming up here at DNEG due to accessing software over
> > NFS and crazy PYTHONPATH usage by some of our developers. In some
> > cases, there are 57,000 negative lookups but only 5000 positive
> > lookups (and opens)!
> >
> > Getting devs to optimise their code is my cross to bear I guess.
> >
> > But this is also a well known and common problem for large batch farms
> > and there are some novel workarounds out there:
> >
> > https://guix.gnu.org/en/blog/2021/taming-the-stat-storm-with-a-loader-cache
> > https://computing.llnl.gov/projects/spindle
> > https://cernvm.cern.ch/fs/
> >
> > Coupled with our propensity for high latency (~100ms) NFS via
> > re-export servers (for "cloud rendering"), these inefficient path
> > lookups quickly become a killer - the application takes longer to
> > lookup non-existent files and open files, than it does to execute to
> > completion. We use aggressive caching (actimeo=3600,nocto,vers=3) and
> > "preload" metadata ops (ls -l, open) on a regular basis to try and
> > keep things in (re-export) client cache which certainly helps. It's
> > hard to keep known (expensive) metadata worksets in memory.
> >
> > I've also been looking at using an overlay and hand crafting whiteout
> > files in the upper layers to essentially block known negative lookups
> > from hitting the lower NFS share - again, only useful and correct for
> > read-only software shares.
> >
> > I wonder if Jeff Layton's directory delegations will help for
> > (read-only) metadata heavy lookups over the WAN?
> >
>
> Probably not. In order to optimize away lookups of negative dentries
> that aren't in cache, you need to know all of the positive dentries in
> the directory. As Trond pointed out earlier in the discussion, NFS
> doesn't have a concept of directory "completeness", so we can't
> reasonably do this.
>
> FWIW, CephFS does have such a concept and can satisfy readdir requests
> and negative lookups out of the cache when it has complete directory
> info.
Out of interest, do directory delegations help with positive lookups
or repeat opens? They may be less numerous in our badly behaved
workloads, but they are still nice to optimise for latency.
Can you disable "cto" for example if you have a directory delegation
and repeatedly open the same file for reading without a network hop?
I also noticed that "nocto" can completely stop any subsequent network
hops for opens (with a long actimeo) for NFSv3, but on NFSv4 it only
cuts a single GETATTR before still doing an OPEN DH over the network
each time.
I'm probably wandering off into "disconnected clients" and AFS style
territory now...
Daire
On Fri, 2024-04-12 at 12:43 +0100, Daire Byrne wrote:
> Out of interest, do directory delegations help with positive lookups
> or repeat opens? They may be less numerous in our badly behaved
> workloads, but they are still nice to optimise for latency.
>
> Can you disable "cto" for example if you have a directory delegation
> and repeatedly open the same file for reading without a network hop?
Maybe? Dir delegations don't really help with CTO, since that's all
about the file itself, not its parent directory. It might help avoid
having to revalidate the parent directory for the lookup however.
FWIW, basic, recallable directory delegations with no notifications are
pretty useless in my testing. You optimize away a few GETATTRs on the
parent directories, but those are pretty infrequent anyway: by default,
roughly one every 60s on directories that aren't changing much.
That's close to "why bother" territory, but maybe there is a case to be
made for that on high-latency links (like you mention).
Mixing in notifications may change things though:
Consider 2 clients that are both working with files in the same
directory and both hold directory delegations. client1 creates a file or
another directory in the dir. The server then pushes out a notification to
client2. client2 goes to look up the new dentry later, and finds that
it's already in cache.
That's a potential optimization, but it's pretty specific to workloads
where multiple clients are operating on the same files in a
directory that is frequently changing.
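A toy simulation of that notification scenario, using hypothetical ToyServer/ToyClient classes (not the NFSv4.1 protocol operations) and a made-up file name: client2 holds a directory delegation with notifications and finds client1's new entry already cached, while client3, without one, needs a LOOKUP round trip.

```python
from __future__ import annotations


class ToyServer:
    """Toy server that pushes 'entry added' notifications to clients holding
    a directory delegation with notifications (hypothetical model)."""

    def __init__(self) -> None:
        self.names: set[str] = set()
        self.notify_holders: list[ToyClient] = []

    def create(self, origin: ToyClient, name: str) -> None:
        self.names.add(name)
        for client in self.notify_holders:
            if client is not origin:
                client.on_entry_added(name)


class ToyClient:
    def __init__(self, server: ToyServer, dir_deleg_with_notify: bool) -> None:
        self.server = server
        self.dcache: set[str] = set()
        self.round_trips = 0
        if dir_deleg_with_notify:
            server.notify_holders.append(self)

    def on_entry_added(self, name: str) -> None:
        self.dcache.add(name)  # filled by the server push, no LOOKUP needed

    def create(self, name: str) -> None:
        self.round_trips += 1
        self.server.create(self, name)
        self.dcache.add(name)

    def lookup(self, name: str) -> bool:
        if name in self.dcache:
            return True
        self.round_trips += 1  # cache miss: LOOKUP over the wire
        found = name in self.server.names
        if found:
            self.dcache.add(name)
        return found


server = ToyServer()
client1 = ToyClient(server, dir_deleg_with_notify=True)
client2 = ToyClient(server, dir_deleg_with_notify=True)
client3 = ToyClient(server, dir_deleg_with_notify=False)
client1.create("shot010.exr")
print(client2.lookup("shot010.exr"), client2.round_trips)  # True 0
print(client3.lookup("shot010.exr"), client3.round_trips)  # True 1
```

Note that this only helps names that exist; a lookup for a name nobody created still misses the cache and goes to the server.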
>
> I also noticed that "nocto" can completely stop any subsequent network
> hops for opens (with a long actimeo) for NFSv3, but on NFSv4 it only
> cuts a single GETATTR before still doing an OPEN DH over the network
> each time.
>
File delegations can allow you to do an open w/o having to cross the
network. If I hold the right sort of deleg on a file, I should be able
to open it without talking to the server.
Dir delegations could help optimize away some round trips for the
lookups leading up to the open however.
> I'm probably wandering off into "disconnected clients" and AFS style
> territory now...
--
Jeff Layton <[email protected]>