Hello,
I am writing to report an issue where an NFS mount disappears after an
inode revalidation failure (I already sent this in January, but it was
probably rejected because of the HTML format).
This very old commit
(https://github.com/torvalds/linux/commit/cc89684c9a265828ce061037f1f79f4a68ccd3f7)
shows exactly the problem I have, and this old resolved issue
(https://bugzilla.kernel.org/show_bug.cgi?id=117651) appears to be
happening again today.
To sum up, I have an NFS mount inside another NFS mount (for example:
/opt/nfs/mount1 and /opt/nfs/mount1/mount2).
If I kill a task while it is trying to open a file descriptor on
/opt/nfs/mount1/mount2, then mount2 gets unmounted. Here is my simple
test code, which reproduces it very easily:
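#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc < 2)
        return 1;
    /* Open and close the given path forever. */
    while (1)
        close(open(argv[1], O_RDONLY));
}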
In the logs, I have: "nfs_revalidate_inode: (0:62/845965) getattr failed,
error=-512"
Tested on 5.19 and 6.1 kernels.
Best regards,
Sylvain Menu
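
For readers decoding the log line above: 512 is not a userspace errno. In the kernel's include/linux/errno.h it is ERESTARTSYS, which a syscall path returns when it was interrupted by a signal, and that would be consistent with the getattr being cut short when the task is killed. A minimal sketch of a decoder for the kernel-internal codes in that range follows; the helper name kernel_errname is hypothetical and not part of any kernel or libc API.

```c
#include <stddef.h>

/* Hypothetical helper: maps kernel-internal error codes (values taken
 * from include/linux/errno.h) to their names. Accepts either sign, so
 * the "error=-512" printed by nfs_revalidate_inode can be passed as is.
 * Returns NULL for codes outside this small table. */
const char *kernel_errname(int err)
{
    switch (err < 0 ? -err : err) {
    case 512: return "ERESTARTSYS";
    case 513: return "ERESTARTNOINTR";
    case 514: return "ERESTARTNOHAND";
    case 515: return "ENOIOCTLCMD";
    case 516: return "ERESTART_RESTARTBLOCK";
    default:  return NULL;
    }
}
```

Passing -512 to this helper yields "ERESTARTSYS"; ordinary errno values such as -5 are deliberately not covered and return NULL.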
On Thu, Mar 09, 2023 at 10:42:41AM +0100, Sylvain Menu wrote:
> Tested on 5.19 and 6.1 kernel
So is this a regression or something that has always been present?
thanks,
greg k-h
I think it's a regression, judging by the old resolved bugs/tickets,
but I have no idea when it broke again.
On Thu, Mar 09, 2023 at 11:17:30AM +0100, Sylvain Menu wrote:
> I think it's a regression, judging by the old resolved bugs/tickets,
> but I have no idea when it broke again.
Any chance you can do 'git bisect' to find where it broke and what
commit broke it?
thanks,
greg k-h
No, I'm not able to do that; I found the bug in production by chance.
I tried to dive into the code, but it quickly became too complex for me.
At least it's easy to reproduce with a little script (while(1) timeout
my_c.code).
thanks
sylvain menu
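
The "while(1) timeout my_c.code" loop described in this message could be sketched in C roughly as follows. This is an illustration under assumptions, not the script actually used: the function names is_mountpoint, run_once and reproduce are hypothetical, the child is killed with SIGTERM (the default signal sent by timeout(1)), and the mount check is a heuristic that compares the device of the path with that of its parent directory.

```c
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if path looks like a mount point, 0 if not, -1 on error.
 * Heuristic: the directory is on a different device than its parent,
 * or it is its own parent (the root directory). */
int is_mountpoint(const char *path)
{
    char parent[4096];
    struct stat st, pst;
    int n = snprintf(parent, sizeof(parent), "%s/..", path);

    if (n < 0 || n >= (int)sizeof(parent))
        return -1;
    if (stat(path, &st) != 0 || stat(parent, &pst) != 0)
        return -1;
    return st.st_dev != pst.st_dev || st.st_ino == pst.st_ino;
}

/* Forks a child that loops on open()/close() of path, lets it run for
 * run_ms milliseconds, then sends it SIGTERM and reaps it.
 * Returns 0 on success, -1 on error. */
int run_once(const char *path, int run_ms)
{
    struct timespec ts = { run_ms / 1000, (long)(run_ms % 1000) * 1000000L };
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            close(open(path, O_RDONLY));
    }
    nanosleep(&ts, NULL);
    kill(pid, SIGTERM);
    return waitpid(pid, NULL, 0) == pid ? 0 : -1;
}

/* Repeats run_once() up to rounds times and stops as soon as path is no
 * longer reported as a mount point. Returns the 1-based round at which
 * that happened, 0 if the mount survived every round, -1 on error. */
int reproduce(const char *path, int rounds, int run_ms)
{
    for (int i = 1; i <= rounds; i++) {
        if (run_once(path, run_ms) != 0)
            return -1;
        if (is_mountpoint(path) != 1)
            return i;
    }
    return 0;
}
```

Calling reproduce("/opt/nfs/mount1/mount2", 1000, 50) from a small main() would kill up to 1000 short-lived openers and report the round at which the inner mount stopped looking mounted; the round count and the 50 ms run time are arbitrary choices.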
On Fri, 10 Mar 2023, Sylvain Menu wrote:
> No, I'm not able to do that; I found the bug in production by chance.
> I tried to dive into the code, but it quickly became too complex for me.
> At least it's easy to reproduce with a little script (while(1) timeout
> my_c.code).
>
> thanks
> sylvain menu
Please see
https://lore.kernel.org/linux-nfs/[email protected]/
I posted a patch for this a couple of years ago, but Trond wouldn't take
it.
NeilBrown