2013-04-04 14:47:15

by Mkrtchyan, Tigran

Subject: readdir vs. getattr

Hi,

here is the story:

we have a directory with 50K files in it.
The user does an 'ls' and I can see READDIR4. To
get the complete listing the client needs to send ~380 requests.
Now the user does yet another 'ls' in the same directory.
The client sends a GETATTR on the directory's FH
(actually two GETATTRs - why?!) and discovers that the
directory didn't change and re-uses the existing listing, BUT
for each file in the directory it sends a GETATTR to discover
whether the file's attributes have changed. For 50K files that's 50K requests.

I believe there is a way to make the client smarter and in some
situations prefer READDIR over GETATTR.

Tigran.
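
(For context: an "ls -l" style listing is essentially the loop below -- a
minimal sketch, not code from this thread. With a cold attribute cache,
every stat() in the loop turns into one GETATTR on the wire, which is where
the ~50K requests come from.)

#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Minimal sketch of an "ls -l" style listing: one readdir() pass,
 * then one stat() per entry.  On NFS, every stat() whose answer is
 * not in the attribute cache costs a GETATTR round trip. */
int list_long(const char *path)
{
	DIR *d = opendir(path);
	struct dirent *de;
	struct stat st;
	char name[4096];

	if (d == NULL)
		return -1;
	while ((de = readdir(d)) != NULL) {
		snprintf(name, sizeof(name), "%s/%s", path, de->d_name);
		if (stat(name, &st) == 0)
			printf("%8lld %s\n", (long long)st.st_size, de->d_name);
	}
	closedir(d);
	return 0;
}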


2013-04-04 16:15:19

by Myklebust, Trond

Subject: Re: readdir vs. getattr

On Thu, 2013-04-04 at 12:01 -0400, Jim Rees wrote:
> Chuck Lever wrote:
>
> > So is this an "ls -l"? Because for "ls" it shouldn't stat all the files.
>
> Default these days is ls --color
>
> No, it's not, at least not on Linux. From the man page:
>
> "Using color to distinguish file types is disabled both by default and with
> --color=never."

Yes, but a number of Linux distros override that in /etc/profile. For
instance, all Red Hat/Fedora distros alias ls to 'ls --color=auto'.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2013-04-04 15:38:02

by Mkrtchyan, Tigran

Subject: Re: readdir vs. getattr

On Thu, Apr 4, 2013 at 5:15 PM, Jim Rees <[email protected]> wrote:
> Tigran Mkrtchyan wrote:
>
> we have a directory with 50K files in it.
> The user does an 'ls' and I can see READDIR4. To
> get the complete listing the client needs to send ~380 requests.
> Now the user does yet another 'ls' in the same directory.
> The client sends a GETATTR on the directory's FH
> (actually two GETATTRs - why?!) and discovers that the
> directory didn't change and re-uses the existing listing, BUT
> for each file in the directory it sends a GETATTR to discover
> whether the file's attributes have changed. For 50K files that's 50K requests.
>
> So is this an "ls -l"? Because for "ls" it shouldn't stat all the files.

I believe it's 'ls -l'. Well, you probably want to say that it's ls
calling stat on each file. Nevertheless, the client should still re-use
cached information.

2013-04-04 15:31:58

by Chuck Lever

Subject: Re: readdir vs. getattr



On Apr 4, 2013, at 8:15 AM, Jim Rees <[email protected]> wrote:

> Tigran Mkrtchyan wrote:
>
> we have a directory with 50K files in it.
> The user does an 'ls' and I can see READDIR4. To
> get the complete listing the client needs to send ~380 requests.
> Now the user does yet another 'ls' in the same directory.
> The client sends a GETATTR on the directory's FH
> (actually two GETATTRs - why?!) and discovers that the
> directory didn't change and re-uses the existing listing, BUT
> for each file in the directory it sends a GETATTR to discover
> whether the file's attributes have changed. For 50K files that's 50K requests.
>
> So is this an "ls -l"? Because for "ls" it shouldn't stat all the files.

Default these days is ls --color



2013-04-04 18:28:46

by Mkrtchyan, Tigran

Subject: Re: readdir vs. getattr

OK, by tweaking the values of acregmin and acregmax I got the expected
behaviour.

What about the double GETATTR on the directory itself?

Tigran.

On Thu, Apr 4, 2013 at 6:35 PM, Jim Rees <[email protected]> wrote:
> Myklebust, Trond wrote:
>
> On Thu, 2013-04-04 at 12:01 -0400, Jim Rees wrote:
> > Chuck Lever wrote:
> >
> > > So is this an "ls -l"? Because for "ls" it shouldn't stat all the files.
> >
> > Default these days is ls --color
> >
> > No, it's not, at least not on Linux. From the man page:
> >
> > "Using color to distinguish file types is disabled both by default and with
> > --color=never."
>
> Yes, but a number of Linux distros override that in /etc/profile. For
> instance, all Red Hat/Fedora distros alias ls to 'ls --color=auto'.
>
> I was actually just trying to figure out whether the application (ls) was
> doing all those file stats, or whether readdir was somehow doing them by
> mistake. Sounds like it's the application.

2013-04-04 16:35:33

by Jim Rees

Subject: Re: readdir vs. getattr

Myklebust, Trond wrote:

On Thu, 2013-04-04 at 12:01 -0400, Jim Rees wrote:
> Chuck Lever wrote:
>
> > So is this an "ls -l"? Because for "ls" it shouldn't stat all the files.
>
> Default these days is ls --color
>
> No, it's not, at least not on Linux. From the man page:
>
> "Using color to distinguish file types is disabled both by default and with
> --color=never."

Yes, but a number of Linux distros override that in /etc/profile. For
instance, all Red Hat/Fedora distros alias ls to 'ls --color=auto'.

I was actually just trying to figure out whether the application (ls) was
doing all those file stats, or whether readdir was somehow doing them by
mistake. Sounds like it's the application.

2013-04-04 15:15:12

by Jim Rees

Subject: Re: readdir vs. getattr

Tigran Mkrtchyan wrote:

we have a directory with 50K files in it.
The user does an 'ls' and I can see READDIR4. To
get the complete listing the client needs to send ~380 requests.
Now the user does yet another 'ls' in the same directory.
The client sends a GETATTR on the directory's FH
(actually two GETATTRs - why?!) and discovers that the
directory didn't change and re-uses the existing listing, BUT
for each file in the directory it sends a GETATTR to discover
whether the file's attributes have changed. For 50K files that's 50K requests.

So is this an "ls -l"? Because for "ls" it shouldn't stat all the files.

2013-04-04 15:48:04

by Myklebust, Trond

Subject: Re: readdir vs. getattr

On Thu, 2013-04-04 at 17:38 +0200, Tigran Mkrtchyan wrote:
> On Thu, Apr 4, 2013 at 5:15 PM, Jim Rees <[email protected]> wrote:
> > Tigran Mkrtchyan wrote:
> >
> > we have a directory with 50K files in it.
> > The user does an 'ls' and I can see READDIR4. To
> > get the complete listing the client needs to send ~380 requests.
> > Now the user does yet another 'ls' in the same directory.
> > The client sends a GETATTR on the directory's FH
> > (actually two GETATTRs - why?!) and discovers that the
> > directory didn't change and re-uses the existing listing, BUT
> > for each file in the directory it sends a GETATTR to discover
> > whether the file's attributes have changed. For 50K files that's 50K requests.
> >
> > So is this an "ls -l"? Because for "ls" it shouldn't stat all the files.
>
> I believe it's 'ls -l'. Well, you probably want to say that it's ls
> calling stat on each file. Nevertheless, the client should still re-use
> cached information.

What makes you think that it isn't using cached information? I'm
guessing you just need to adjust the values of acregmin and acregmax
upwards.

That said, we might be able to be a little more intelligent about how we
use the NFS_INO_ADVISE_RDPLUS hint, and have it blow out the readdir
cache when we find ourselves doing lots of lookup revalidates.
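
(For context on the acregmin/acregmax suggestion: the client trusts cached
attributes for a per-inode timeout that starts at acregmin and, on every
revalidation that finds nothing changed, grows toward acregmax. The fragment
below is a paraphrased sketch of that back-off, not the actual fs/nfs code.)

/* Paraphrased sketch, not actual kernel code: cached attributes are
 * trusted for attrtimeo; each clean revalidation doubles attrtimeo,
 * clamped to [acregmin, acregmax].  Raising acregmin/acregmax
 * therefore stretches the interval between per-file GETATTRs. */
static unsigned long next_attrtimeo(unsigned long attrtimeo,
				    unsigned long acregmin,
				    unsigned long acregmax)
{
	unsigned long t = attrtimeo << 1;

	if (t < acregmin)
		t = acregmin;
	if (t > acregmax)
		t = acregmax;
	return t;
}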

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2013-04-04 15:52:53

by Mkrtchyan, Tigran

Subject: Re: readdir vs. getattr

On Thu, Apr 4, 2013 at 5:48 PM, Myklebust, Trond
<[email protected]> wrote:
> On Thu, 2013-04-04 at 17:38 +0200, Tigran Mkrtchyan wrote:
>> On Thu, Apr 4, 2013 at 5:15 PM, Jim Rees <[email protected]> wrote:
>> > Tigran Mkrtchyan wrote:
>> >
>> > we have a directory with 50K files in it.
>> > The user does an 'ls' and I can see READDIR4. To
>> > get the complete listing the client needs to send ~380 requests.
>> > Now the user does yet another 'ls' in the same directory.
>> > The client sends a GETATTR on the directory's FH
>> > (actually two GETATTRs - why?!) and discovers that the
>> > directory didn't change and re-uses the existing listing, BUT
>> > for each file in the directory it sends a GETATTR to discover
>> > whether the file's attributes have changed. For 50K files that's 50K requests.
>> >
>> > So is this an "ls -l"? Because for "ls" it shouldn't stat all the files.
>>
>> I believe it's 'ls -l'. Well, you probably want to say that it's ls
>> calling stat on each file. Nevertheless, the client should still re-use
>> cached information.
>
> What makes you think that it isn't using cached information? I'm
Well, wireshark tells me that. Either the timeout is too short or the cache
size is too small.
It's quite strange behaviour when the first 'ls' is much faster than the second one.
I will play with the timeouts and let you know.

Tigran.

> guessing you just need to adjust the values of acregmin and acregmax
> upwards.
>
> That said, we might be able to be a little more intelligent about how we
> use the NFS_INO_ADVISE_RDPLUS hint, and have it blow out the readdir
> cache when we find ourselves doing lots of lookup revalidates.
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com

2013-04-04 16:01:50

by Jim Rees

Subject: Re: readdir vs. getattr

Chuck Lever wrote:

> So is this an "ls -l"? Because for "ls" it shouldn't stat all the files.

Default these days is ls --color

No, it's not, at least not on Linux. From the man page:

"Using color to distinguish file types is disabled both by default and with
--color=never."

2014-01-29 07:25:44

by NeilBrown

Subject: Re: readdir vs. getattr


(resent with correct address for Trond)

On Thu, 4 Apr 2013 15:48:01 +0000 "Myklebust, Trond"
<[email protected]> wrote:

> On Thu, 2013-04-04 at 17:38 +0200, Tigran Mkrtchyan wrote:
> > On Thu, Apr 4, 2013 at 5:15 PM, Jim Rees <[email protected]> wrote:
> > > Tigran Mkrtchyan wrote:
> > >
> > > we have a directory with 50K files in it.
> > > The user does an 'ls' and I can see READDIR4. To
> > > get the complete listing the client needs to send ~380 requests.
> > > Now the user does yet another 'ls' in the same directory.
> > > The client sends a GETATTR on the directory's FH
> > > (actually two GETATTRs - why?!) and discovers that the
> > > directory didn't change and re-uses the existing listing, BUT
> > > for each file in the directory it sends a GETATTR to discover
> > > whether the file's attributes have changed. For 50K files that's 50K requests.
> > >
> > > So is this an "ls -l"? Because for "ls" it shouldn't stat all the files.
> >
> > I believe it's 'ls -l'. Well, you probably want to say that it's ls
> > calling stat on each file. Nevertheless, the client should still re-use
> > cached information.
>
> What makes you think that it isn't using cached information? I'm
> guessing you just need to adjust the values of acregmin and acregmax
> upwards.
>
> That said, we might be able to be a little more intelligent about how we
> use the NFS_INO_ADVISE_RDPLUS hint, and have it blow out the readdir
> cache when we find ourselves doing lots of lookup revalidates.
>

Pop.

I recently had a customer raise exactly this issue with me, so I've been
looking into it.

I don't think it can really be fixed by adjusting acregmin/acregmax.

Once you have done READDIRPLUS, you have the directory contents in the
page-cache and will continue to use those contents until they drop out of the
cache, or until the directory changes in some way.

Meanwhile the stat information from the READDIRPLUS was used to create/update
info in the inode table and that will eventually become stale. As soon as it
becomes stale you get a GETATTR storm on the next "ls -l" instead of a few
READDIRPLUS calls. By increasing acregmin you can delay that storm, but you
can't put it off forever.

Fixing this is tricky. We really want to know on the first nfs_readdir() call
whether it will be followed by lookups or not. If it won't, then using the
cached data is fine. If it will, then we really want a READDIRPLUS.

The only way I can see to address this is for nfs_advise_use_readdirplus (or
code near where that is called) to notice that a readdir is currently active
on the same directory and is using cached data, and to re-use that 'open' of
the directory to do a readdirplus. This would update the stat info for the
current inode and all the other inodes for the directory.

This is fairly horrible. The 'struct file' used by the readdir would need to
be stored somewhere so that nfs_lookup_revalidate can use it (if process
permissions allow). If multiple processes were doing a readdir at the same
time .... I would certainly get confused.

However I cannot think of anything else that would even come close to being a
real solution.

Any solution that just modified nfs_readdir() could only avoid the GETATTR
storm by largely ignoring the cached information and (almost) always calling
READDIRPLUS.

Does anyone have any other ideas? Or do you think it is worth trying to
implement the above "horrible" idea? I did start working on it, but only got
far enough to understand the full extent of what is required.

Thanks,
NeilBrown



2014-01-29 12:18:45

by Jeff Layton

Subject: Re: readdir vs. getattr

On Wed, 29 Jan 2014 10:21:43 +0100 (CET)
"Mkrtchyan, Tigran" <[email protected]> wrote:

> (without knowing kernel details)
>
> What about a stat counter that triggers readdirplus instead of the next getattr if
> some threshold is reached? Let's say the directory has more than 8000 entries and
> getattr was called on 10% of the files. Of course, you have to watch this per
> process.
>
> Tigran.
>

That might be the only possible solution, but as always with these
sorts of heuristics, it's bound to help some workloads and hurt others.

I'm not sure you'd need to watch that per-process. The pagecache data is
shared, after all. The problem, though, is that you'd need to keep track
of which (or how many different) entries in the dir have gotten a
GETATTR since the last READDIRPLUS.

IOW, if someone is repeatedly hammering a single entry in a directory
with stat() calls that need the atime, you probably don't want to force
a READDIRPLUS on the next readdir() call. If, however, they're stat'ing
each entry in the directory in turn for an "ls -l" type workload, then
you probably do want to dump the cache and just go for a READDIRPLUS.
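
(A sketch of the heuristic being discussed, with invented names and the
thresholds Tigran suggested -- none of this is existing kernel code: count
how many distinct entries have been revalidated with GETATTR since the last
READDIRPLUS, and once that crosses some fraction of the directory size, drop
the readdir cache so the next pass goes to the wire with READDIRPLUS.)

/* Hypothetical per-directory state; all names here are invented
 * for illustration. */
struct dir_heuristic {
	unsigned long nentries;		/* entries in the directory */
	unsigned long getattrs_seen;	/* distinct entries GETATTR'd since
					 * the last READDIRPLUS */
};

/* Decide whether the next readdir() should invalidate the cached
 * pages and issue READDIRPLUS instead of letting per-entry GETATTRs
 * continue. */
static int should_force_readdirplus(const struct dir_heuristic *h)
{
	if (h->nentries < 8000)		/* small dir: per-entry GETATTR is cheap */
		return 0;
	return h->getattrs_seen * 10 >= h->nentries;	/* >= 10% of entries */
}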

The real way to fix this is probably for someone to pick up and drive
the xstat()/readdirplus() work. If 'ls -l' used a readdirplus() syscall
to do its bidding then you'd have a pretty good idea of what you'd need
to do. That won't help in the short term, though...

> ----- Original Message -----
> > From: "NeilBrown" <[email protected]>
> > To: "Trond Myklebust" <[email protected]>
> > Cc: "tigran mkrtchyan" <[email protected]>, "Jim Rees" <[email protected]>, "linux-nfs"
> > <[email protected]>
> > Sent: Wednesday, January 29, 2014 8:25:32 AM
> > Subject: Re: readdir vs. getattr
> >
> >
> > (resent with correct address for Trond)
> >
> > On Thu, 4 Apr 2013 15:48:01 +0000 "Myklebust, Trond"
> > <[email protected]> wrote:
> >
> > > On Thu, 2013-04-04 at 17:38 +0200, Tigran Mkrtchyan wrote:
> > > > On Thu, Apr 4, 2013 at 5:15 PM, Jim Rees <[email protected]> wrote:
> > > > > Tigran Mkrtchyan wrote:
> > > > >
> > > > > we have a directory with 50K files in it.
> > > > > The user does an 'ls' and I can see READDIR4. To
> > > > > get the complete listing the client needs to send ~380 requests.
> > > > > Now the user does yet another 'ls' in the same directory.
> > > > > The client sends a GETATTR on the directory's FH
> > > > > (actually two GETATTRs - why?!) and discovers that the
> > > > > directory didn't change and re-uses the existing listing, BUT
> > > > > for each file in the directory it sends a GETATTR to discover
> > > > > whether the file's attributes have changed. For 50K files that's 50K
> > > > > requests.
> > > > >
> > > > > So is this an "ls -l"? Because for "ls" it shouldn't stat all the files.
> > > >
> > > > I believe it's 'ls -l'. Well, you probably want to say that it's ls
> > > > calling stat on each file. Nevertheless, the client should still re-use
> > > > cached information.
> > >
> > > What makes you think that it isn't using cached information? I'm
> > > guessing you just need to adjust the values of acregmin and acregmax
> > > upwards.
> > >
> > > That said, we might be able to be a little more intelligent about how we
> > > use the NFS_INO_ADVISE_RDPLUS hint, and have it blow out the readdir
> > > cache when we find ourselves doing lots of lookup revalidates.
> > >
> >
> > Pop.
> >
> > I recently had a customer raise exactly this issue with me, so I've been
> > looking into it.
> >
> > I don't think it can really be fixed by adjusting acregmin/acregmax.
> >
> > Once you have done READDIRPLUS, you have the directory contents in the
> > page-cache and will continue to use those contents until they drop out of the
> > cache, or until the directory changes in some way.
> >
> > Meanwhile the stat information from the READDIRPLUS was used to create/update
> > info in the inode table and that will eventually become stale. As soon as it
> > becomes stale you get a GETATTR storm on the next "ls -l" instead of a few
> > READDIRPLUS calls. By increasing acregmin you can delay that storm, but you
> > can't put it off forever.
> >
> > Fixing this is tricky. We really want to know on the first nfs_readdir()
> > call
> > whether it will be followed by lookups or not. If it won't, then using the
> > cached data is fine. If it will, then we really want a READDIRPLUS.
> >
> > The only way I can see to address this is for nfs_advise_use_readdirplus (or
> > code near where that is called) to notice that a readdir is currently active
> > on the same directory and is using cached data, and to re-use that 'open' of
> > the directory to do a readdirplus. This would update the stat info for the
> > current inode and all the other inodes for the directory.
> >
> > This is fairly horrible. The 'struct file' used by the readdir would need to
> > be stored somewhere so that nfs_lookup_revalidate can use it (if process
> > permissions allow). If multiple processes were doing a readdir at the same
> > time .... I would certainly get confused.
> >
> > However I cannot think of anything else that would even come close to being a
> > real solution.
> >
> > Any solution that just modified nfs_readdir() could only avoid the GETATTR
> > storm by largely ignoring the cached information and (almost) always calling
> > READDIRPLUS.
> >
> > Does anyone have any other ideas? Or do you think it is worth trying to
> > implement the above "horrible" idea? I did start working on it, but only got
> > far enough to understand the full extent of what is required.
> >
> > Thanks,
> > NeilBrown
> >


--
Jeff Layton <[email protected]>

2014-01-29 09:21:45

by Mkrtchyan, Tigran

Subject: Re: readdir vs. getattr

(without knowing kernel details)

What about a stat counter that triggers readdirplus instead of the next getattr if
some threshold is reached? Let's say the directory has more than 8000 entries and
getattr was called on 10% of the files. Of course, you have to watch this per
process.

Tigran.

----- Original Message -----
> From: "NeilBrown" <[email protected]>
> To: "Trond Myklebust" <[email protected]>
> Cc: "tigran mkrtchyan" <[email protected]>, "Jim Rees" <[email protected]>, "linux-nfs"
> <[email protected]>
> Sent: Wednesday, January 29, 2014 8:25:32 AM
> Subject: Re: readdir vs. getattr
>
>
> (resent with correct address for Trond)
>
> On Thu, 4 Apr 2013 15:48:01 +0000 "Myklebust, Trond"
> <[email protected]> wrote:
>
> > On Thu, 2013-04-04 at 17:38 +0200, Tigran Mkrtchyan wrote:
> > > On Thu, Apr 4, 2013 at 5:15 PM, Jim Rees <[email protected]> wrote:
> > > > Tigran Mkrtchyan wrote:
> > > >
> > > > we have a directory with 50K files in it.
> > > > The user does an 'ls' and I can see READDIR4. To
> > > > get the complete listing the client needs to send ~380 requests.
> > > > Now the user does yet another 'ls' in the same directory.
> > > > The client sends a GETATTR on the directory's FH
> > > > (actually two GETATTRs - why?!) and discovers that the
> > > > directory didn't change and re-uses the existing listing, BUT
> > > > for each file in the directory it sends a GETATTR to discover
> > > > whether the file's attributes have changed. For 50K files that's 50K
> > > > requests.
> > > >
> > > > So is this an "ls -l"? Because for "ls" it shouldn't stat all the files.
> > >
> > > I believe it's 'ls -l'. Well, you probably want to say that it's ls
> > > calling stat on each file. Nevertheless, the client should still re-use
> > > cached information.
> >
> > What makes you think that it isn't using cached information? I'm
> > guessing you just need to adjust the values of acregmin and acregmax
> > upwards.
> >
> > That said, we might be able to be a little more intelligent about how we
> > use the NFS_INO_ADVISE_RDPLUS hint, and have it blow out the readdir
> > cache when we find ourselves doing lots of lookup revalidates.
> >
>
> Pop.
>
> I recently had a customer raise exactly this issue with me, so I've been
> looking into it.
>
> I don't think it can really be fixed by adjusting acregmin/acregmax.
>
> Once you have done READDIRPLUS, you have the directory contents in the
> page-cache and will continue to use those contents until they drop out of the
> cache, or until the directory changes in some way.
>
> Meanwhile the stat information from the READDIRPLUS was used to create/update
> info in the inode table and that will eventually become stale. As soon as it
> becomes stale you get a GETATTR storm on the next "ls -l" instead of a few
> READDIRPLUS calls. By increasing acregmin you can delay that storm, but you
> can't put it off forever.
>
> Fixing this is tricky. We really want to know on the first nfs_readdir()
> call
> whether it will be followed by lookups or not. If it won't, then using the
> cached data is fine. If it will, then we really want a READDIRPLUS.
>
> The only way I can see to address this is for nfs_advise_use_readdirplus (or
> code near where that is called) to notice that a readdir is currently active
> on the same directory and is using cached data, and to re-use that 'open' of
> the directory to do a readdirplus. This would update the stat info for the
> current inode and all the other inodes for the directory.
>
> This is fairly horrible. The 'struct file' used by the readdir would need to
> be stored somewhere so that nfs_lookup_revalidate can use it (if process
> permissions allow). If multiple processes were doing a readdir at the same
> time .... I would certainly get confused.
>
> However I cannot think of anything else that would even come close to being a
> real solution.
>
> Any solution that just modified nfs_readdir() could only avoid the GETATTR
> storm by largely ignoring the cached information and (almost) always calling
> READDIRPLUS.
>
> Does anyone have any other ideas? Or do you think it is worth trying to
> implement the above "horrible" idea? I did start working on it, but only got
> far enough to understand the full extent of what is required.
>
> Thanks,
> NeilBrown
>

2014-01-29 07:10:21

by NeilBrown

Subject: Re: readdir vs. getattr

On Thu, 4 Apr 2013 15:48:01 +0000 "Myklebust, Trond"
<[email protected]> wrote:

> On Thu, 2013-04-04 at 17:38 +0200, Tigran Mkrtchyan wrote:
> > On Thu, Apr 4, 2013 at 5:15 PM, Jim Rees <[email protected]> wrote:
> > > Tigran Mkrtchyan wrote:
> > >
> > > we have a directory with 50K files in it.
> > > The user does an 'ls' and I can see READDIR4. To
> > > get the complete listing the client needs to send ~380 requests.
> > > Now the user does yet another 'ls' in the same directory.
> > > The client sends a GETATTR on the directory's FH
> > > (actually two GETATTRs - why?!) and discovers that the
> > > directory didn't change and re-uses the existing listing, BUT
> > > for each file in the directory it sends a GETATTR to discover
> > > whether the file's attributes have changed. For 50K files that's 50K requests.
> > >
> > > So is this an "ls -l"? Because for "ls" it shouldn't stat all the files.
> >
> > I believe it's 'ls -l'. Well, you probably want to say that it's ls
> > calling stat on each file. Nevertheless, the client should still re-use
> > cached information.
>
> What makes you think that it isn't using cached information? I'm
> guessing you just need to adjust the values of acregmin and acregmax
> upwards.
>
> That said, we might be able to be a little more intelligent about how we
> use the NFS_INO_ADVISE_RDPLUS hint, and have it blow out the readdir
> cache when we find ourselves doing lots of lookup revalidates.
>

Pop.

I recently had a customer raise exactly this issue with me, so I've been
looking into it.

I don't think it can really be fixed by adjusting acregmin/acregmax.

Once you have done READDIRPLUS, you have the directory contents in the
page-cache and will continue to use those contents until they drop out of the
cache, or until the directory changes in some way.

Meanwhile the stat information from the READDIRPLUS was used to create/update
info in the inode table and that will eventually become stale. As soon as it
becomes stale you get a GETATTR storm on the next "ls -l" instead of a few
READDIRPLUS calls. By increasing acregmin you can delay that storm, but you
can't put it off forever.

Fixing this is tricky. We really want to know on the first nfs_readdir() call
whether it will be followed by lookups or not. If it won't, then using the
cached data is fine. If it will, then we really want a READDIRPLUS.

The only way I can see to address this is for nfs_advise_use_readdirplus (or
code near where that is called) to notice that a readdir is currently active
on the same directory and is using cached data, and to re-use that 'open' of
the directory to do a readdirplus. This would update the stat info for the
current inode and all the other inodes for the directory.

This is fairly horrible. The 'struct file' used by the readdir would need to
be stored somewhere so that nfs_lookup_revalidate can use it (if process
permissions allow). If multiple processes were doing a readdir at the same
time .... I would certainly get confused.

However I cannot think of anything else that would even come close to being a
real solution.

Any solution that just modified nfs_readdir() could only avoid the GETATTR
storm by largely ignoring the cached information and (almost) always calling
READDIRPLUS.

Does anyone have any other ideas? Or do you think it is worth trying to
implement the above "horrible" idea? I did start working on it, but only got
far enough to understand the full extent of what is required.

Thanks,
NeilBrown



2014-02-10 00:16:34

by NeilBrown

Subject: Re: readdir vs. getattr

On Fri, 07 Feb 2014 17:08:39 -0500 Trond Myklebust
<[email protected]> wrote:

> On Fri, 2014-02-07 at 14:47 -0500, Trond Myklebust wrote:
> > On Feb 6, 2014, at 23:30, NeilBrown <[email protected]> wrote:
> > > In my test case there is a large directory on the server which doesn't change.
> > > The client sends lots of requests to see if it has changed, but it hasn't.
> > > (Tested with the labeled-NFS-regression fix applied)
> > >
> > > Your patch only changes behaviour if nfs_force_lookup_revalidate is called,
> > > and that is only called when NFS_ATTR_FATTR_CHANGE is set, or when an NFSv4
> > > client modifies the directory. Neither of those is happening.
> >
> > Right, but I really do not know how to fix that case.
> >
> > I read your patch as saying: "If we're revalidating a dentry and we find that the directory has not changed but that the dentry's inode needs revalidation, then invalidate that directory's attribute, acl and readdir cached data. Then drop the dentry". I have a hard time seeing how that can be beneficial.
> >
> > If we were to modify it to only invalidate the readdir cached data, then that might improve matters, but it is still hard for me to see how a single inode needing revalidation can justify a full re-read of the parent directory in the general case. You may end up rereading thousands of readdir entries from the server just because someone did a path lookup.
>
> OK. How about something like the following then? The heuristic
> specifically looks for the 'ls -l' fingerprint, and uses that to clear
> the readdir cache if and when the server supports readdirplus.
>
> 8<--------------------------------------------------------------------
> >From e2fdc035d223b6a5bffa55583818ac0aa22b996c Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <[email protected]>
> Date: Fri, 7 Feb 2014 17:02:08 -0500
> Subject: [PATCH] NFS: Be more aggressive in using readdirplus for 'ls -l'
> situations
>
> Try to detect 'ls -l' by having nfs_getattr() look at whether or not
> there is an opendir() file descriptor for the parent directory.
> If so, then assume that we want to force use of readdirplus in order
> to avoid the multiple GETATTR calls over the wire.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> fs/nfs/dir.c | 42 ++++++++++++++++++++++++++++++++++++++----
> fs/nfs/inode.c | 34 +++++++++++++++++++++++++++-------
> fs/nfs/internal.h | 1 +
> include/linux/nfs_fs.h | 1 +
> 4 files changed, 67 insertions(+), 11 deletions(-)
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index be38b573495a..c8e48c26418b 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -69,21 +69,28 @@ const struct address_space_operations nfs_dir_aops = {
>
> static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir, struct rpc_cred *cred)
> {
> + struct nfs_inode *nfsi = NFS_I(dir);
> struct nfs_open_dir_context *ctx;
> ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
> if (ctx != NULL) {
> ctx->duped = 0;
> - ctx->attr_gencount = NFS_I(dir)->attr_gencount;
> + ctx->attr_gencount = nfsi->attr_gencount;
> ctx->dir_cookie = 0;
> ctx->dup_cookie = 0;
> ctx->cred = get_rpccred(cred);
> + spin_lock(&dir->i_lock);
> + list_add(&ctx->list, &nfsi->open_files);
> + spin_unlock(&dir->i_lock);
> return ctx;
> }
> return ERR_PTR(-ENOMEM);
> }
>
> -static void put_nfs_open_dir_context(struct nfs_open_dir_context *ctx)
> +static void put_nfs_open_dir_context(struct inode *dir, struct nfs_open_dir_context *ctx)
> {
> + spin_lock(&dir->i_lock);
> + list_del(&ctx->list);
> + spin_unlock(&dir->i_lock);
> put_rpccred(ctx->cred);
> kfree(ctx);
> }
> @@ -126,7 +133,7 @@ out:
> static int
> nfs_closedir(struct inode *inode, struct file *filp)
> {
> - put_nfs_open_dir_context(filp->private_data);
> + put_nfs_open_dir_context(filp->f_path.dentry->d_inode, filp->private_data);
> return 0;
> }
>
> @@ -437,6 +444,22 @@ void nfs_advise_use_readdirplus(struct inode *dir)
> set_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags);
> }
>
> +/*
> + * This function is mainly for use by nfs_getattr().
> + *
> + * If this is an 'ls -l', we want to force use of readdirplus.
> + * Do this by checking if there is an active file descriptor
> + * and calling nfs_advise_use_readdirplus, then forcing a
> + * cache flush.
> + */
> +void nfs_force_use_readdirplus(struct inode *dir)
> +{
> + if (!list_empty(&NFS_I(dir)->open_files)) {
> + nfs_advise_use_readdirplus(dir);
> + nfs_zap_mapping(dir, dir->i_mapping);
> + }
> +}
> +
> static
> void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
> {
> @@ -815,6 +838,17 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc)
> goto out;
> }
>
> +static bool nfs_dir_mapping_need_revalidate(struct inode *dir)
> +{
> + struct nfs_inode *nfsi = NFS_I(dir);
> +
> + if (nfs_attribute_cache_expired(dir))
> + return true;
> + if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
> + return true;
> + return false;
> +}
> +
> /* The file offset position represents the dirent entry number. A
> last cookie cache takes care of the common case of reading the
> whole directory.
> @@ -847,7 +881,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
> desc->plus = nfs_use_readdirplus(inode, ctx) ? 1 : 0;
>
> nfs_block_sillyrename(dentry);
> - if (ctx->pos == 0 || nfs_attribute_cache_expired(inode))
> + if (ctx->pos == 0 || nfs_dir_mapping_need_revalidate(inode))
> res = nfs_revalidate_mapping(inode, file->f_mapping);
> if (res < 0)
> goto out;
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 4636b828e957..d1407380338a 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -594,6 +594,25 @@ void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr)
> }
> EXPORT_SYMBOL_GPL(nfs_setattr_update_inode);
>
> +static void nfs_request_parent_use_readdirplus(struct dentry *dentry)
> +{
> + struct dentry *parent;
> +
> + parent = dget_parent(dentry);
> + nfs_force_use_readdirplus(parent->d_inode);
> + dput(parent);
> +}
> +
> +static bool nfs_need_revalidate_inode(struct inode *inode)
> +{
> + if (NFS_I(inode)->cache_validity &
> + (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_LABEL))
> + return true;
> + if (nfs_attribute_cache_expired(inode))
> + return true;
> + return false;
> +}
> +
> int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
> {
> struct inode *inode = dentry->d_inode;
> @@ -622,10 +641,13 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
> ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode)))
> need_atime = 0;
>
> - if (need_atime)
> - err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
> - else
> - err = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> + if (need_atime || nfs_need_revalidate_inode(inode)) {
> + struct nfs_server *server = NFS_SERVER(inode);
> +
> + if (server->caps & NFS_CAP_READDIRPLUS)
> + nfs_request_parent_use_readdirplus(dentry);
> + err = __nfs_revalidate_inode(server, inode);
> + }
> if (!err) {
> generic_fillattr(inode, stat);
> stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
> @@ -967,9 +989,7 @@ int nfs_attribute_cache_expired(struct inode *inode)
> */
> int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
> {
> - if (!(NFS_I(inode)->cache_validity &
> - (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_LABEL))
> - && !nfs_attribute_cache_expired(inode))
> + if (!nfs_need_revalidate_inode(inode))
> return NFS_STALE(inode) ? -ESTALE : 0;
> return __nfs_revalidate_inode(server, inode);
> }
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index fafdddac8271..7f7c476d0c2c 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -300,6 +300,7 @@ extern struct nfs_client *nfs_init_client(struct nfs_client *clp,
> const char *ip_addr);
>
> /* dir.c */
> +extern void nfs_force_use_readdirplus(struct inode *dir);
> extern unsigned long nfs_access_cache_count(struct shrinker *shrink,
> struct shrink_control *sc);
> extern unsigned long nfs_access_cache_scan(struct shrinker *shrink,
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 9123ce3fc6a5..2c22722b406f 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -92,6 +92,7 @@ struct nfs_open_context {
> };
>
> struct nfs_open_dir_context {
> + struct list_head list;
> struct rpc_cred *cred;
> unsigned long attr_gencount;
> __u64 dir_cookie;


Thanks Trond!
Yes, that seems to encode exactly what I was aiming for, but does a better
job of it. In my testing it does exactly what I hoped for.

I played around a bit and added:

@@ -754,6 +777,10 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc)
 		*desc->dir_cookie = array->last_cookie;
 		if (ctx->duped != 0)
 			ctx->duped = 1;
+		if (desc->ctx->pos == 3) {
+			desc->eof = 1;
+			break;
+		}
 	}
 	if (array->eof_index >= 0)
 		desc->eof = 1;


This makes the first getdents call return only one entry (other than '.' and
'..'), so we only do one GETATTR before we start making READDIR calls.
I suspect this is a bad idea in general, but thought I would mention it. It
results in the second "ls -l" taking just as long as the first. However,
it could have other negative effects.

Tested-by: NeilBrown <[email protected]>

Thanks,
NeilBrown



2014-02-07 22:08:42

by Trond Myklebust

Subject: Re: readdir vs. getattr

On Fri, 2014-02-07 at 14:47 -0500, Trond Myklebust wrote:
> On Feb 6, 2014, at 23:30, NeilBrown <[email protected]> wrote:
> > In my test case there is a large directory on the server which doesn't change.
> > The client sends lots of requests to see if it has changed, but it hasn't.
> > (Tested with the labeled-NFS-regression fix applied)
> >
> > Your patch only changes behaviour if nfs_force_lookup_revalidate is called,
> > and that is only called when NFS_ATTR_FATTR_CHANGE is set, or when an NFSv4
> > client modifies the directory. Neither of those is happening.
>
> Right, but I really do not know how to fix that case.
>
> I read your patch as saying: "If we're revalidating a dentry and we find that the directory has not changed but that the dentry's inode needs revalidation, then invalidate that directory's attribute, acl and readdir cached data. Then drop the dentry". I have a hard time seeing how that can be beneficial.
>
> If we were to modify it to only invalidate the readdir cached data, then that might improve matters, but it is still hard for me to see how a single inode needing revalidation can justify a full re-read of the parent directory in the general case. You may end up rereading thousands of readdir entries from the server just because someone did a path lookup.

OK. How about something like the following then? The heuristic
specifically looks for the 'ls -l' fingerprint, and uses that to clear
the readdir cache if and when the server supports readdirplus.

8<--------------------------------------------------------------------
>From e2fdc035d223b6a5bffa55583818ac0aa22b996c Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Fri, 7 Feb 2014 17:02:08 -0500
Subject: [PATCH] NFS: Be more aggressive in using readdirplus for 'ls -l'
situations

Try to detect 'ls -l' by having nfs_getattr() look at whether or not
there is an opendir() file descriptor for the parent directory.
If so, then assume that we want to force use of readdirplus in order
to avoid the multiple GETATTR calls over the wire.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/dir.c | 42 ++++++++++++++++++++++++++++++++++++++----
fs/nfs/inode.c | 34 +++++++++++++++++++++++++++-------
fs/nfs/internal.h | 1 +
include/linux/nfs_fs.h | 1 +
4 files changed, 67 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index be38b573495a..c8e48c26418b 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -69,21 +69,28 @@ const struct address_space_operations nfs_dir_aops = {

static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir, struct rpc_cred *cred)
{
+ struct nfs_inode *nfsi = NFS_I(dir);
struct nfs_open_dir_context *ctx;
ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
if (ctx != NULL) {
ctx->duped = 0;
- ctx->attr_gencount = NFS_I(dir)->attr_gencount;
+ ctx->attr_gencount = nfsi->attr_gencount;
ctx->dir_cookie = 0;
ctx->dup_cookie = 0;
ctx->cred = get_rpccred(cred);
+ spin_lock(&dir->i_lock);
+ list_add(&ctx->list, &nfsi->open_files);
+ spin_unlock(&dir->i_lock);
return ctx;
}
return ERR_PTR(-ENOMEM);
}

-static void put_nfs_open_dir_context(struct nfs_open_dir_context *ctx)
+static void put_nfs_open_dir_context(struct inode *dir, struct nfs_open_dir_context *ctx)
{
+ spin_lock(&dir->i_lock);
+ list_del(&ctx->list);
+ spin_unlock(&dir->i_lock);
put_rpccred(ctx->cred);
kfree(ctx);
}
@@ -126,7 +133,7 @@ out:
static int
nfs_closedir(struct inode *inode, struct file *filp)
{
- put_nfs_open_dir_context(filp->private_data);
+ put_nfs_open_dir_context(filp->f_path.dentry->d_inode, filp->private_data);
return 0;
}

@@ -437,6 +444,22 @@ void nfs_advise_use_readdirplus(struct inode *dir)
set_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags);
}

+/*
+ * This function is mainly for use by nfs_getattr().
+ *
+ * If this is an 'ls -l', we want to force use of readdirplus.
+ * Do this by checking if there is an active file descriptor
+ * and calling nfs_advise_use_readdirplus, then forcing a
+ * cache flush.
+ */
+void nfs_force_use_readdirplus(struct inode *dir)
+{
+ if (!list_empty(&NFS_I(dir)->open_files)) {
+ nfs_advise_use_readdirplus(dir);
+ nfs_zap_mapping(dir, dir->i_mapping);
+ }
+}
+
static
void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
{
@@ -815,6 +838,17 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc)
goto out;
}

+static bool nfs_dir_mapping_need_revalidate(struct inode *dir)
+{
+ struct nfs_inode *nfsi = NFS_I(dir);
+
+ if (nfs_attribute_cache_expired(dir))
+ return true;
+ if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
+ return true;
+ return false;
+}
+
/* The file offset position represents the dirent entry number. A
last cookie cache takes care of the common case of reading the
whole directory.
@@ -847,7 +881,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
desc->plus = nfs_use_readdirplus(inode, ctx) ? 1 : 0;

nfs_block_sillyrename(dentry);
- if (ctx->pos == 0 || nfs_attribute_cache_expired(inode))
+ if (ctx->pos == 0 || nfs_dir_mapping_need_revalidate(inode))
res = nfs_revalidate_mapping(inode, file->f_mapping);
if (res < 0)
goto out;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 4636b828e957..d1407380338a 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -594,6 +594,25 @@ void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr)
}
EXPORT_SYMBOL_GPL(nfs_setattr_update_inode);

+static void nfs_request_parent_use_readdirplus(struct dentry *dentry)
+{
+ struct dentry *parent;
+
+ parent = dget_parent(dentry);
+ nfs_force_use_readdirplus(parent->d_inode);
+ dput(parent);
+}
+
+static bool nfs_need_revalidate_inode(struct inode *inode)
+{
+ if (NFS_I(inode)->cache_validity &
+ (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_LABEL))
+ return true;
+ if (nfs_attribute_cache_expired(inode))
+ return true;
+ return false;
+}
+
int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
{
struct inode *inode = dentry->d_inode;
@@ -622,10 +641,13 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode)))
need_atime = 0;

- if (need_atime)
- err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
- else
- err = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+ if (need_atime || nfs_need_revalidate_inode(inode)) {
+ struct nfs_server *server = NFS_SERVER(inode);
+
+ if (server->caps & NFS_CAP_READDIRPLUS)
+ nfs_request_parent_use_readdirplus(dentry);
+ err = __nfs_revalidate_inode(server, inode);
+ }
if (!err) {
generic_fillattr(inode, stat);
stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
@@ -967,9 +989,7 @@ int nfs_attribute_cache_expired(struct inode *inode)
*/
int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
{
- if (!(NFS_I(inode)->cache_validity &
- (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_LABEL))
- && !nfs_attribute_cache_expired(inode))
+ if (!nfs_need_revalidate_inode(inode))
return NFS_STALE(inode) ? -ESTALE : 0;
return __nfs_revalidate_inode(server, inode);
}
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index fafdddac8271..7f7c476d0c2c 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -300,6 +300,7 @@ extern struct nfs_client *nfs_init_client(struct nfs_client *clp,
const char *ip_addr);

/* dir.c */
+extern void nfs_force_use_readdirplus(struct inode *dir);
extern unsigned long nfs_access_cache_count(struct shrinker *shrink,
struct shrink_control *sc);
extern unsigned long nfs_access_cache_scan(struct shrinker *shrink,
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 9123ce3fc6a5..2c22722b406f 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -92,6 +92,7 @@ struct nfs_open_context {
};

struct nfs_open_dir_context {
+ struct list_head list;
struct rpc_cred *cred;
unsigned long attr_gencount;
__u64 dir_cookie;
--
1.8.5.3


--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]



2014-02-07 04:30:43

by NeilBrown

Subject: Re: readdir vs. getattr

On Thu, 06 Feb 2014 17:12:59 -0500 Trond Myklebust
<[email protected]> wrote:

> On Thu, 2014-02-06 at 13:51 +1100, NeilBrown wrote:
> > On Thu, 6 Feb 2014 13:45:39 +1100 NeilBrown <[email protected]> wrote:
> >
> >
> > > The change to nfs_update_inode fixes an issue that had me stumped for a
> > > while. It was still sending lots of GETATTR requests even after it had
> > > switched to READDIRPLUS instead of using cached info. So that might be a
> > > genuine bug that should be fixed independently of this patch.
> >
> > I managed to post the wrong version of the patch, which didn't have this
> > change. Sorry.
> >
> > Here is the real one.
> >
> > NeilBrown
>
> Hi Neil,
>
> Is there any reason why this patch couldn't just be replaced with the
> following?

Well.... the following doesn't make any change in my test case. :-(

In my test case there is a large directory on the server which doesn't change.
The client sends lots of requests to see if it has changed, but it hasn't.
(Tested with the labeled-NFS-regression fix applied)

Your patch only changes behaviour if nfs_force_lookup_revalidate is called,
and that is only called when NFS_ATTR_FATTR_CHANGE is set, or when an NFSv4
client modifies the directory. Neither of those is happening.

Thanks,
NeilBrown


> (Please note the separate fix for the labeled NFS regression that you
> pointed out.)
>
> Cheers
> Trond
>
> 8<-------------------------------------------------------------------
> >From fda6d0fb7988fd7d541ef6d1023bca92b0e10333 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <[email protected]>
> Date: Thu, 6 Feb 2014 16:45:21 -0500
> Subject: [PATCH] NFS: Use readdirplus in order to efficiently revalidate whole
> directories
>
> When we call nfs_force_lookup_revalidate() in order to force a full
> lookup of all child dentries of a directory, it makes sense to use
> readdirplus to perform that revalidation as efficiently as possible.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> fs/nfs/dir.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index be38b573495a..2abcae330ad0 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -437,6 +437,24 @@ void nfs_advise_use_readdirplus(struct inode *dir)
> set_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags);
> }
>
> +/*
> + * Force use of readdirplus to ensure efficient revalidation of
> + * the directory child dentries when we know that we will need
> + * to revalidate the whole directory.
> + *
> + * Caller must hold dir->i_lock.
> + */
> +static
> +void nfs_force_use_readdirplus(struct inode *dir)
> +{
> + if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS)) {
> + struct nfs_inode *nfsi = NFS_I(dir);
> +
> + set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
> + nfsi->cache_validity |= NFS_INO_INVALID_DATA;
> + }
> +}
> +
> static
> void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
> {
> @@ -942,12 +960,14 @@ static int nfs_fsync_dir(struct file *filp, loff_t start, loff_t end,
> * This forces the revalidation code in nfs_lookup_revalidate() to do a
> * full lookup on all child dentries of 'dir' whenever a change occurs
> * on the server that might have invalidated our dcache.
> + * Call nfs_force_use_readdirplus for efficiency.
> *
> * The caller should be holding dir->i_lock
> */
> void nfs_force_lookup_revalidate(struct inode *dir)
> {
> NFS_I(dir)->cache_change_attribute++;
> + nfs_force_use_readdirplus(dir);
> }
> EXPORT_SYMBOL_GPL(nfs_force_lookup_revalidate);
>



2014-02-06 22:13:02

by Trond Myklebust

Subject: Re: readdir vs. getattr

On Thu, 2014-02-06 at 13:51 +1100, NeilBrown wrote:
> On Thu, 6 Feb 2014 13:45:39 +1100 NeilBrown <[email protected]> wrote:
>
>
> > The change to nfs_update_inode fixes an issue that had me stumped for a
> > while. It was still sending lots of GETATTR requests even after it had
> > switched to READDIRPLUS instead of using cached info. So that might be a
> > genuine bug that should be fixed independently of this patch.
>
> I managed to post the wrong version of the patch, which didn't have this
> change. Sorry.
>
> Here is the real one.
>
> NeilBrown

Hi Neil,

Is there any reason why this patch couldn't just be replaced with the
following?
(Please note the separate fix for the labeled NFS regression that you
pointed out.)

Cheers
Trond

8<-------------------------------------------------------------------
>From fda6d0fb7988fd7d541ef6d1023bca92b0e10333 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Thu, 6 Feb 2014 16:45:21 -0500
Subject: [PATCH] NFS: Use readdirplus in order to efficiently revalidate whole
directories

When we call nfs_force_lookup_revalidate() in order to force a full
lookup of all child dentries of a directory, it makes sense to use
readdirplus to perform that revalidation as efficiently as possible.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/dir.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index be38b573495a..2abcae330ad0 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -437,6 +437,24 @@ void nfs_advise_use_readdirplus(struct inode *dir)
set_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags);
}

+/*
+ * Force use of readdirplus to ensure efficient revalidation of
+ * the directory child dentries when we know that we will need
+ * to revalidate the whole directory.
+ *
+ * Caller must hold dir->i_lock.
+ */
+static
+void nfs_force_use_readdirplus(struct inode *dir)
+{
+ if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS)) {
+ struct nfs_inode *nfsi = NFS_I(dir);
+
+ set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
+ nfsi->cache_validity |= NFS_INO_INVALID_DATA;
+ }
+}
+
static
void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
{
@@ -942,12 +960,14 @@ static int nfs_fsync_dir(struct file *filp, loff_t start, loff_t end,
* This forces the revalidation code in nfs_lookup_revalidate() to do a
* full lookup on all child dentries of 'dir' whenever a change occurs
* on the server that might have invalidated our dcache.
+ * Call nfs_force_use_readdirplus for efficiency.
*
* The caller should be holding dir->i_lock
*/
void nfs_force_lookup_revalidate(struct inode *dir)
{
NFS_I(dir)->cache_change_attribute++;
+ nfs_force_use_readdirplus(dir);
}
EXPORT_SYMBOL_GPL(nfs_force_lookup_revalidate);

--
1.8.5.3


--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]



2014-02-07 19:47:27

by Trond Myklebust

Subject: Re: readdir vs. getattr


On Feb 6, 2014, at 23:30, NeilBrown <[email protected]> wrote:

> On Thu, 06 Feb 2014 17:12:59 -0500 Trond Myklebust
> <[email protected]> wrote:
>
>> On Thu, 2014-02-06 at 13:51 +1100, NeilBrown wrote:
>>> On Thu, 6 Feb 2014 13:45:39 +1100 NeilBrown <[email protected]> wrote:
>>>
>>>
>>>> The change to nfs_update_inode fixes an issue that had me stumped for a
>>>> while. It was still sending lots of GETATTR requests even after it had
>>>> switched to READDIRPLUS instead of using cached info. So that might be a
>>>> genuine bug that should be fixed independently of this patch.
>>>
>>> I managed to post the wrong version of the patch, which didn't have this
>>> change. Sorry.
>>>
>>> Here is the real one.
>>>
>>> NeilBrown
>>
>> Hi Neil,
>>
>> Is there any reason why this patch couldn't just be replaced with the
>> following?
>
> Well.... the following doesn't make any change in my test case. :-(
>
> In my test case there is a large directory on the server which doesn't change.
> The client sends lots of requests to see if it has changed, but it hasn't.
> (Tested with the labeled-NFS-regression fix applied)
>
> Your patch only changes behaviour if nfs_force_lookup_revalidate is called,
> and that is only called when NFS_ATTR_FATTR_CHANGE is set, or when an NFSv4
> client modifies the directory. Neither of those is happening.

Right, but I really do not know how to fix that case.

I read your patch as saying: "If we're revalidating a dentry and we find that the directory has not changed but that the dentry's inode needs revalidation, then invalidate that directory's attribute, acl and readdir cached data. Then drop the dentry". I have a hard time seeing how that can be beneficial.

If we were to modify it to only invalidate the readdir cached data, then that might improve matters, but it is still hard for me to see how a single inode needing revalidation can justify a full re-read of the parent directory in the general case. You may end up rereading thousands of readdir entries from the server just because someone did a path lookup.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-02-07 04:21:16

by NeilBrown

Subject: Re: [PATCH] NFS: Do not set NFS_INO_INVALID_LABEL unless server supports labeled NFS

On Thu, 6 Feb 2014 14:53:38 -0500 Trond Myklebust
<[email protected]> wrote:

> Commit aa9c2669626c (NFS: Client implementation of Labeled-NFS) introduces
> a performance regression. When nfs_zap_caches_locked is called, it sets
> the NFS_INO_INVALID_LABEL flag irrespectively of whether or not the
> NFS server supports security labels. Since that flag is never cleared,
> it means that all calls to nfs_revalidate_inode() will now trigger
> an on-the-wire GETATTR call.
>
> This patch ensures that we never set the NFS_INO_INVALID_LABEL unless the
> server advertises support for labeled NFS.
> It also causes nfs_setsecurity() to clear NFS_INO_INVALID_LABEL when it
> has successfully set the security label for the inode.
> Finally it gets rid of the NFS_INO_INVALID_LABEL cruft from nfs_update_inode,
> which has nothing to do with labeled NFS.
>
> Reported-by: Neil Brown <[email protected]>
> Cc: [email protected] # 3.11+
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>
> Hi Neil,
> Does this fix the GETATTR regression that you reported?

Yes, thanks.
With this patch I don't get the streams of GETATTR requests after the READDIR
request that I previously fixed with that one-line change in inode.c.

Tested-by: NeilBrown <[email protected]>

Thanks,
NeilBrown


>
> Cheers
> Trond
>
> fs/nfs/inode.c | 14 ++++++++++----
> fs/nfs/internal.h | 9 +++++++++
> 2 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 29cb93653b3c..4636b828e957 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -169,7 +169,6 @@ static void nfs_zap_caches_locked(struct inode *inode)
> if (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)) {
> nfs_fscache_invalidate(inode);
> nfsi->cache_validity |= NFS_INO_INVALID_ATTR
> - | NFS_INO_INVALID_LABEL
> | NFS_INO_INVALID_DATA
> | NFS_INO_INVALID_ACCESS
> | NFS_INO_INVALID_ACL
> @@ -177,10 +176,10 @@ static void nfs_zap_caches_locked(struct inode *inode)
> nfs_zap_readdir_cookie(nfsi);
> } else
> nfsi->cache_validity |= NFS_INO_INVALID_ATTR
> - | NFS_INO_INVALID_LABEL
> | NFS_INO_INVALID_ACCESS
> | NFS_INO_INVALID_ACL
> | NFS_INO_REVAL_PAGECACHE;
> + nfs_zap_label_cache_locked(nfsi);
> }
>
> void nfs_zap_caches(struct inode *inode)
> @@ -272,6 +271,13 @@ nfs_init_locked(struct inode *inode, void *opaque)
> }
>
> #ifdef CONFIG_NFS_V4_SECURITY_LABEL
> +static void nfs_clear_label_invalid(struct inode *inode)
> +{
> + spin_lock(&inode->i_lock);
> + NFS_I(inode)->cache_validity &= ~NFS_INO_INVALID_LABEL;
> + spin_unlock(&inode->i_lock);
> +}
> +
> void nfs_setsecurity(struct inode *inode, struct nfs_fattr *fattr,
> struct nfs4_label *label)
> {
> @@ -289,6 +295,7 @@ void nfs_setsecurity(struct inode *inode, struct nfs_fattr *fattr,
> __func__,
> (char *)label->label,
> label->len, error);
> + nfs_clear_label_invalid(inode);
> }
> }
>
> @@ -1654,7 +1661,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> inode->i_blocks = fattr->du.nfs2.blocks;
>
> /* Update attrtimeo value if we're out of the unstable period */
> - if (invalid & (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_LABEL)) {
> + if (invalid & NFS_INO_INVALID_ATTR) {
> nfs_inc_stats(inode, NFSIOS_ATTRINVALIDATE);
> nfsi->attrtimeo = NFS_MINATTRTIMEO(inode);
> nfsi->attrtimeo_timestamp = now;
> @@ -1667,7 +1674,6 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> }
> }
> invalid &= ~NFS_INO_INVALID_ATTR;
> - invalid &= ~NFS_INO_INVALID_LABEL;
> /* Don't invalidate the data if we were to blame */
> if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)
> || S_ISLNK(inode->i_mode)))
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 8b5cc04a8611..fafdddac8271 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -279,9 +279,18 @@ static inline void nfs4_label_free(struct nfs4_label *label)
> }
> return;
> }
> +
> +static inline void nfs_zap_label_cache_locked(struct nfs_inode *nfsi)
> +{
> + if (nfs_server_capable(&nfsi->vfs_inode, NFS_CAP_SECURITY_LABEL))
> + nfsi->cache_validity |= NFS_INO_INVALID_LABEL;
> +}
> #else
> static inline struct nfs4_label *nfs4_label_alloc(struct nfs_server *server, gfp_t flags) { return NULL; }
> static inline void nfs4_label_free(void *label) {}
> +static inline void nfs_zap_label_cache_locked(struct nfs_inode *nfsi)
> +{
> +}
> #endif /* CONFIG_NFS_V4_SECURITY_LABEL */
>
> /* proc.c */



2014-02-06 02:51:16

by NeilBrown

[permalink] [raw]
Subject: Re: readdir vs. getattr


On Thu, 6 Feb 2014 13:45:39 +1100 NeilBrown <[email protected]> wrote:


> The change to nfs_update_inode fixes an issue that had me stumped for a
> while. It was still sending lots of GETATTR requests even after it had
> switched to READDIRPLUS instead of using cached info. So that might be a
> genuine bug that should be fixed independently of this patch.

I managed to post the wrong version of the patch, which didn't have this
change. Sorry.

Here is the real one.

NeilBrown

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index be38b573495a..b237fd7f2e0e 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -845,9 +845,12 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
desc->dir_cookie = &dir_ctx->dir_cookie;
desc->decode = NFS_PROTO(inode)->decode_dirent;
desc->plus = nfs_use_readdirplus(inode, ctx) ? 1 : 0;
+ if (desc->plus && ctx->pos == 0)
+ clear_bit(NFS_INO_DID_FLUSH, &NFS_I(inode)->flags);

nfs_block_sillyrename(dentry);
- if (ctx->pos == 0 || nfs_attribute_cache_expired(inode))
+ if (ctx->pos == 0 || nfs_attribute_cache_expired(inode) ||
+ (NFS_I(inode)->cache_validity & NFS_INO_INVALID_DATA))
res = nfs_revalidate_mapping(inode, file->f_mapping);
if (res < 0)
goto out;
@@ -1080,6 +1083,16 @@ static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags)

/* Force a full look up iff the parent directory has changed */
if (!nfs_is_exclusive_create(dir, flags) && nfs_check_verifier(dir, dentry)) {
+ if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS)
+ && ((NFS_I(inode)->cache_validity &
+ (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_LABEL))
+ || nfs_attribute_cache_expired(inode))
+ && !test_and_set_bit(NFS_INO_DID_FLUSH, &NFS_I(dir)->flags)
+ ) {
+ nfs_advise_use_readdirplus(dir);
+ goto out_zap_parent;
+ }
+
if (nfs_lookup_verify_inode(inode, flags))
goto out_zap_parent;
goto out_valid;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 28a0a3cbd3b7..3996e6c7a728 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1526,6 +1526,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)

save_cache_validity = nfsi->cache_validity;
nfsi->cache_validity &= ~(NFS_INO_INVALID_ATTR
+ | NFS_INO_INVALID_LABEL
| NFS_INO_INVALID_ATIME
| NFS_INO_REVAL_FORCED
| NFS_INO_REVAL_PAGECACHE);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 0ae5807480f4..c282ce3e5349 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -218,6 +218,7 @@ struct nfs_inode {
#define NFS_INO_COMMIT (7) /* inode is committing unstable writes */
#define NFS_INO_LAYOUTCOMMIT (9) /* layoutcommit required */
#define NFS_INO_LAYOUTCOMMITTING (10) /* layoutcommit inflight */
+#define NFS_INO_DID_FLUSH (11)

static inline struct nfs_inode *NFS_I(const struct inode *inode)
{



2014-02-06 19:53:41

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH] NFS: Do not set NFS_INO_INVALID_LABEL unless server supports labeled NFS

Commit aa9c2669626c (NFS: Client implementation of Labeled-NFS) introduces
a performance regression. When nfs_zap_caches_locked is called, it sets
the NFS_INO_INVALID_LABEL flag irrespective of whether the
NFS server supports security labels. Since that flag is never cleared,
it means that all calls to nfs_revalidate_inode() will now trigger
an on-the-wire GETATTR call.

This patch ensures that we never set NFS_INO_INVALID_LABEL unless the
server advertises support for labeled NFS.
It also causes nfs_setsecurity() to clear NFS_INO_INVALID_LABEL when it
has successfully set the security label for the inode.
Finally it gets rid of the NFS_INO_INVALID_LABEL cruft from nfs_update_inode,
which has nothing to do with labeled NFS.

Reported-by: Neil Brown <[email protected]>
Cc: [email protected] # 3.11+
Signed-off-by: Trond Myklebust <[email protected]>
---

Hi Neil,
Does this fix the GETATTR regression that you reported?

Cheers
Trond

fs/nfs/inode.c | 14 ++++++++++----
fs/nfs/internal.h | 9 +++++++++
2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 29cb93653b3c..4636b828e957 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -169,7 +169,6 @@ static void nfs_zap_caches_locked(struct inode *inode)
if (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)) {
nfs_fscache_invalidate(inode);
nfsi->cache_validity |= NFS_INO_INVALID_ATTR
- | NFS_INO_INVALID_LABEL
| NFS_INO_INVALID_DATA
| NFS_INO_INVALID_ACCESS
| NFS_INO_INVALID_ACL
@@ -177,10 +176,10 @@ static void nfs_zap_caches_locked(struct inode *inode)
nfs_zap_readdir_cookie(nfsi);
} else
nfsi->cache_validity |= NFS_INO_INVALID_ATTR
- | NFS_INO_INVALID_LABEL
| NFS_INO_INVALID_ACCESS
| NFS_INO_INVALID_ACL
| NFS_INO_REVAL_PAGECACHE;
+ nfs_zap_label_cache_locked(nfsi);
}

void nfs_zap_caches(struct inode *inode)
@@ -272,6 +271,13 @@ nfs_init_locked(struct inode *inode, void *opaque)
}

#ifdef CONFIG_NFS_V4_SECURITY_LABEL
+static void nfs_clear_label_invalid(struct inode *inode)
+{
+ spin_lock(&inode->i_lock);
+ NFS_I(inode)->cache_validity &= ~NFS_INO_INVALID_LABEL;
+ spin_unlock(&inode->i_lock);
+}
+
void nfs_setsecurity(struct inode *inode, struct nfs_fattr *fattr,
struct nfs4_label *label)
{
@@ -289,6 +295,7 @@ void nfs_setsecurity(struct inode *inode, struct nfs_fattr *fattr,
__func__,
(char *)label->label,
label->len, error);
+ nfs_clear_label_invalid(inode);
}
}

@@ -1654,7 +1661,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
inode->i_blocks = fattr->du.nfs2.blocks;

/* Update attrtimeo value if we're out of the unstable period */
- if (invalid & (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_LABEL)) {
+ if (invalid & NFS_INO_INVALID_ATTR) {
nfs_inc_stats(inode, NFSIOS_ATTRINVALIDATE);
nfsi->attrtimeo = NFS_MINATTRTIMEO(inode);
nfsi->attrtimeo_timestamp = now;
@@ -1667,7 +1674,6 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
}
}
invalid &= ~NFS_INO_INVALID_ATTR;
- invalid &= ~NFS_INO_INVALID_LABEL;
/* Don't invalidate the data if we were to blame */
if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)
|| S_ISLNK(inode->i_mode)))
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 8b5cc04a8611..fafdddac8271 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -279,9 +279,18 @@ static inline void nfs4_label_free(struct nfs4_label *label)
}
return;
}
+
+static inline void nfs_zap_label_cache_locked(struct nfs_inode *nfsi)
+{
+ if (nfs_server_capable(&nfsi->vfs_inode, NFS_CAP_SECURITY_LABEL))
+ nfsi->cache_validity |= NFS_INO_INVALID_LABEL;
+}
#else
static inline struct nfs4_label *nfs4_label_alloc(struct nfs_server *server, gfp_t flags) { return NULL; }
static inline void nfs4_label_free(void *label) {}
+static inline void nfs_zap_label_cache_locked(struct nfs_inode *nfsi)
+{
+}
#endif /* CONFIG_NFS_V4_SECURITY_LABEL */

/* proc.c */
--
1.8.5.3


2014-02-06 02:45:52

by NeilBrown

[permalink] [raw]
Subject: Re: readdir vs. getattr

On Wed, 29 Jan 2014 07:18:41 -0500 Jeff Layton <[email protected]>
wrote:

> On Wed, 29 Jan 2014 10:21:43 +0100 (CET)
> "Mkrtchyan, Tigran" <[email protected]> wrote:
>
> > (without knowing kernel details)
> >
> > What about keeping a stat counter and triggering readdirplus instead of the
> > next getattr if some threshold is reached? Say the directory has more than
> > 8000 entries and getattr was called on 10% of the files. Of course you would
> > have to watch this per process.
> >
> > Tigran.
> >
>
> That might be the only possible solution, but as always with these
> sorts of heuristics, it's bound to help some workloads and hurt others.
>
> I'm not sure you'd need to watch that per-process. The pagecache data is
> shared, after all. The problem, though, is that you'd need to keep track
> of which (or how many different) entries in the dir have gotten a
> GETATTR since the last READDIRPLUS.
>
> IOW, if someone is repeatedly hammering a single entry in a directory
> with stat() calls that need the atime, you probably don't want to force
> a READDIRPLUS on the next readdir() call. If, however, they're stat'ing
> each entry in the directory in turn for a "ls -l" type workload then
> you probably do want to dump the cache and just go for a READDIRPLUS.
>
> The real way to fix this is probably for someone to pick up and drive
> the xstat()/readdirplus() work. If 'ls -l' used a readdirplus() syscall
> to do its bidding then you'd have a pretty good idea of what you'd need
> to do. That won't help in the short term though...
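
To make the quoted threshold idea concrete, here is a hypothetical sketch
(the structure, its fields and the cut-off numbers are all invented for
illustration; nothing like this exists in the client today):

/* per-directory state, reset on each READDIRPLUS */
struct nfs_dir_stat_counter {
        atomic_t getattrs_since_rdplus;
        unsigned long nentries;
};

static bool should_force_readdirplus(const struct nfs_dir_stat_counter *c)
{
        /* force READDIRPLUS once the directory is large and more than
         * 10% of its entries have been individually GETATTRed */
        return c->nentries > 8000 &&
               (unsigned long)atomic_read(&c->getattrs_since_rdplus) * 10 >
               c->nentries;
}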

I agree that improved syscalls are probably the only way to get a perfect
solution. So I tried thinking of an imperfect solution that is still a net
positive.

The customer issue that started me looking at this involved a directory with
1 million files. I don't know if this is a genuine use case or just making
sure that it all scales nicely, but it is a case where a partial fix could
make a big difference.

"ls -l" first doest a "getdents" with a 32K buffer, then stats all those
files, then does another getdents.
With 2000 file, that is only one getdents call. With 1,000,000 it would be
hundreds.
It is not possible to know that the first getdents call should force a
readdirplus. It is quite easy to know that the second and later ones should.
This approach will make no difference to the 2000 file case, but will restore
better than 99% of performance in the 1,000,000 file case.
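
For reference, the access pattern described above boils down to roughly this
userspace sketch (error handling omitted; struct linux_dirent64 is declared
by hand because glibc does not export it):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

struct linux_dirent64 {
        unsigned long long d_ino;
        long long          d_off;
        unsigned short     d_reclen;
        unsigned char      d_type;
        char               d_name[];
};

static void ls_l_pattern(const char *path)
{
        char buf[32768];        /* the 32K getdents buffer */
        int fd = open(path, O_RDONLY | O_DIRECTORY);
        long n;

        /* one getdents batch, then a stat per entry, then the next batch */
        while ((n = syscall(SYS_getdents64, fd, buf, sizeof(buf))) > 0) {
                for (long off = 0; off < n;) {
                        struct linux_dirent64 *d =
                                (struct linux_dirent64 *)(buf + off);
                        struct stat st;

                        fstatat(fd, d->d_name, &st, AT_SYMLINK_NOFOLLOW);
                        off += d->d_reclen;
                }
        }
        close(fd);
}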

So I'm suggesting the following patch. I don't fully understand the cache
rules, so it might be more complicated than needed, and it might even be
wrong. But it seems to work in some simple testing.

The change to nfs_update_inode fixes an issue that had me stumped for a
while. It was still sending lots of GETATTR requests even after it had
switched to READDIRPLUS instead of using cached info. So that might be a
genuine bug that should be fixed independently of this patch.

The idea is that the first time we look up a file in a directory and find
that we need to do a GETATTR, we flush the directory's cached data so that
READDIRPLUS will be used in future.

Comments very welcome.

Thanks,
NeilBrown


diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index be38b573495a..b237fd7f2e0e 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -845,9 +845,12 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
desc->dir_cookie = &dir_ctx->dir_cookie;
desc->decode = NFS_PROTO(inode)->decode_dirent;
desc->plus = nfs_use_readdirplus(inode, ctx) ? 1 : 0;
+ if (desc->plus && ctx->pos == 0)
+ clear_bit(NFS_INO_DID_FLUSH, &NFS_I(inode)->flags);

nfs_block_sillyrename(dentry);
- if (ctx->pos == 0 || nfs_attribute_cache_expired(inode))
+ if (ctx->pos == 0 || nfs_attribute_cache_expired(inode) ||
+ (NFS_I(inode)->cache_validity & NFS_INO_INVALID_DATA))
res = nfs_revalidate_mapping(inode, file->f_mapping);
if (res < 0)
goto out;
@@ -1080,6 +1083,16 @@ static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags)

/* Force a full look up iff the parent directory has changed */
if (!nfs_is_exclusive_create(dir, flags) && nfs_check_verifier(dir, dentry)) {
+ if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS)
+ && ((NFS_I(inode)->cache_validity &
+ (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_LABEL))
+ || nfs_attribute_cache_expired(inode))
+ && !test_and_set_bit(NFS_INO_DID_FLUSH, &NFS_I(dir)->flags)
+ ) {
+ nfs_advise_use_readdirplus(dir);
+ goto out_zap_parent;
+ }
+
if (nfs_lookup_verify_inode(inode, flags))
goto out_zap_parent;
goto out_valid;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 0ae5807480f4..c282ce3e5349 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -218,6 +218,7 @@ struct nfs_inode {
#define NFS_INO_COMMIT (7) /* inode is committing unstable writes */
#define NFS_INO_LAYOUTCOMMIT (9) /* layoutcommit required */
#define NFS_INO_LAYOUTCOMMITTING (10) /* layoutcommit inflight */
+#define NFS_INO_DID_FLUSH (11)

static inline struct nfs_inode *NFS_I(const struct inode *inode)
{



2014-02-06 18:08:31

by Jeff Layton

[permalink] [raw]
Subject: Re: readdir vs. getattr

On Thu, 6 Feb 2014 13:51:01 +1100
NeilBrown <[email protected]> wrote:

>
> On Thu, 6 Feb 2014 13:45:39 +1100 NeilBrown <[email protected]> wrote:
>

> The idea is that the first time we look up a file in a directory and find
> that we need to do a GETATTR, we flush the directory's cached data so that
> READDIRPLUS will be used in future.

Nice. I like that idea. If we had previously used READDIRPLUS then we'd
presumably have the inode in cache and would only need to do the
GETATTR if the attributes had expired. If they had expired on that one
then the cached directory data is probably stale as well.

OTOH, files and directories have different actimeo values. If the
acreg* values are smaller than the acdir* values then we may end up
doing more on-the-wire READDIR* calls than we would have prior to this
patch.

That's not necessarily wrong, but it might change the behavior on some
workloads. I guess I need to ponder that a bit more... ;)
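
For reference, the expiry test in play here reduces to roughly the following
(a simplified sketch of the check in fs/nfs/inode.c; attrtimeo is bounded by
acregmin/acregmax for files and acdirmin/acdirmax for directories, which is
why mismatched mount options can shift when the client goes back to the
wire):

/* simplified sketch of the attribute cache expiry check */
static bool attr_cache_expired(struct inode *inode)
{
        struct nfs_inode *nfsi = NFS_I(inode);

        return !time_in_range_open(jiffies, nfsi->read_cache_jiffies,
                                   nfsi->read_cache_jiffies + nfsi->attrtimeo);
}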

>
> > The change to nfs_update_inode fixes an issue that had me stumped for a
> > while. It was still sending lots of GETATTR requests even after it had
> > switched to READDIRPLUS instead of using cached info. So that might be a
> > genuine bug that should be fixed independently of this patch.
>
> I managed to post the wrong version of the patch, which didn't have this
> change. Sorry.
>
> Here is the real one.
>
> NeilBrown
>

Looks good overall, comment inline below...

> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index be38b573495a..b237fd7f2e0e 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -845,9 +845,12 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
> desc->dir_cookie = &dir_ctx->dir_cookie;
> desc->decode = NFS_PROTO(inode)->decode_dirent;
> desc->plus = nfs_use_readdirplus(inode, ctx) ? 1 : 0;
> + if (desc->plus && ctx->pos == 0)
> + clear_bit(NFS_INO_DID_FLUSH, &NFS_I(inode)->flags);
>
> nfs_block_sillyrename(dentry);
> - if (ctx->pos == 0 || nfs_attribute_cache_expired(inode))
> + if (ctx->pos == 0 || nfs_attribute_cache_expired(inode) ||
> + (NFS_I(inode)->cache_validity & NFS_INO_INVALID_DATA))
> res = nfs_revalidate_mapping(inode, file->f_mapping);
> if (res < 0)
> goto out;
> @@ -1080,6 +1083,16 @@ static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
>
> /* Force a full look up iff the parent directory has changed */
> if (!nfs_is_exclusive_create(dir, flags) && nfs_check_verifier(dir, dentry)) {
> + if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS)
> + && ((NFS_I(inode)->cache_validity &
> + (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_LABEL))
> + || nfs_attribute_cache_expired(inode))
> + && !test_and_set_bit(NFS_INO_DID_FLUSH, &NFS_I(dir)->flags)
> + ) {
> + nfs_advise_use_readdirplus(dir);
> + goto out_zap_parent;
> + }
> +
> if (nfs_lookup_verify_inode(inode, flags))
> goto out_zap_parent;
> goto out_valid;
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 28a0a3cbd3b7..3996e6c7a728 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -1526,6 +1526,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
>
> save_cache_validity = nfsi->cache_validity;
> nfsi->cache_validity &= ~(NFS_INO_INVALID_ATTR
> + | NFS_INO_INVALID_LABEL
> | NFS_INO_INVALID_ATIME
> | NFS_INO_REVAL_FORCED
> | NFS_INO_REVAL_PAGECACHE);

Well spotted. Honestly, I'm not sure that NFS_INO_INVALID_LABEL really
provides any value at all. I think the idea was to have that indicate
when the security label is invalid for labeled NFS, but it seems to
only be set and cleared in conjunction with NFS_INO_INVALID_ATTR.

It might be best to just dump that flag altogether and rely on
NFS_INO_INVALID_ATTR to indicate an invalid label as well. It really is
just another attribute after all (at least in the current
implementation).
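
A hypothetical sketch of that consolidation (label_needs_revalidation is an
invented name; it just shows the label riding on NFS_INO_INVALID_ATTR):

/* no separate label flag: an invalid attribute cache implies the
 * security label must be refetched too, when the server supports it */
static bool label_needs_revalidation(struct inode *inode)
{
        return nfs_server_capable(inode, NFS_CAP_SECURITY_LABEL) &&
               (NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATTR);
}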

As to your earlier point, yeah I think that fix is probably best done
in a separate patch since it's not directly related to the original
issue.

> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 0ae5807480f4..c282ce3e5349 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -218,6 +218,7 @@ struct nfs_inode {
> #define NFS_INO_COMMIT (7) /* inode is committing unstable writes */
> #define NFS_INO_LAYOUTCOMMIT (9) /* layoutcommit required */
> #define NFS_INO_LAYOUTCOMMITTING (10) /* layoutcommit inflight */
> +#define NFS_INO_DID_FLUSH (11)
>
> static inline struct nfs_inode *NFS_I(const struct inode *inode)
> {


--
Jeff Layton <[email protected]>