2017-11-09 01:35:57

by Naofumi Honda

Subject: Re: [Bug 197817] "Panic: unable to handle kernel page request" in posix_unblock_lock() called from nfs4_laundromat()

Dear Developers:

> https://bugzilla.kernel.org/show_bug.cgi?id=197817
>
> --- Comment #1 from [email protected] ---
> Yes, I think you're right.
>
> Would it be possible for you to submit a patch to fix that typo in those two
> places? (Just mail it to me at [email protected], cc: to
> [email protected]).
>

OK, I have attached the patch.

> It might also be useful to see your original oops.

Sorry, I only have a handwritten memo of the console messages.
It may be useless, but I have also attached a scanned copy of it.

Sincerely yours
Naofumi Honda


Attachments:
oops_memo.pdf (39.77 kB)
fix.patch (1.20 kB)
signature.asc (833.00 B)
This is a digitally signed message part.

2017-11-09 16:07:29

by J. Bruce Fields

Subject: Re: [Bug 197817] "Panic: unable to handle kernel page request" in posix_unblock_lock() called from nfs4_laundromat()

On Thu, Nov 09, 2017 at 10:08:28AM +0900, Naofumi Honda wrote:
> Dear Developers:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=197817
> >
> > --- Comment #1 from [email protected] ---
> > Yes, I think you're right.
> >
> > Would it be possible for you to submit a patch to fix that typo in those two
> > places? (Just mail it to me at [email protected], cc: to
> > [email protected]).
> >
>
> OK, I have attached the patch.

Thanks for the investigation and the fix!

For future reference, we prefer patches to be inline with the email
message (not attached), and prefer them in "unified" format.

But for a one-off patch I can fix it up myself; applied as follows.

--b.

commit c26806a20fa3
Author: Naofumi Honda <[email protected]>
Date: Thu Nov 9 10:57:16 2017 -0500

nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat

From kernel 4.9, my two nfsv4 servers sometimes suffer from
"panic: unable to handle kernel page request"
in posix_unblock_lock() called from nfs4_laundromat().

These panics disappear if we revert the commit "nfsd: add a LRU list
for blocked locks".

The cause appears to be a typo in nfs4_laundromat(), which is also
present in nfs4_state_shutdown_net().

Cc: [email protected]
Fixes: 7919d0a27f1e "nfsd: add a LRU list for blocked locks"
Cc: [email protected]
Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 923243369bbc..b99830ab63aa 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4791,7 +4791,7 @@ nfs4_laundromat(struct nfsd_net *nn)
 	spin_unlock(&nn->blocked_locks_lock);

 	while (!list_empty(&reaplist)) {
-		nbl = list_first_entry(&nn->blocked_locks_lru,
+		nbl = list_first_entry(&reaplist,
 					struct nfsd4_blocked_lock, nbl_lru);
 		list_del_init(&nbl->nbl_lru);
 		posix_unblock_lock(&nbl->nbl_lock);
@@ -7260,7 +7260,7 @@ nfs4_state_shutdown_net(struct net *net)
 	spin_unlock(&nn->blocked_locks_lock);

 	while (!list_empty(&reaplist)) {
-		nbl = list_first_entry(&nn->blocked_locks_lru,
+		nbl = list_first_entry(&reaplist,
 					struct nfsd4_blocked_lock, nbl_lru);
 		list_del_init(&nbl->nbl_lru);
 		posix_unblock_lock(&nbl->nbl_lock);
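
For readers following the thread, here is a minimal userspace sketch of the drain loop that the patch restores. It is not the kernel code: the list helpers, struct blocked_lock, lock_of(), drain() and demo_reap_count() are hypothetical stand-ins for the kernel's struct list_head API and struct nfsd4_blocked_lock. The point it tries to show is that the loop condition and the first-entry lookup must name the same list. As far as I can tell, with the typo the lookup used nn->blocked_locks_lru while the loop tested reaplist, so once the LRU list was empty its "first entry" was computed from the list head itself rather than from a real blocked lock.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	/* An empty list's head points back at itself. */
	return h->next == h;
}

static void list_add_to_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_unlink_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical stand-in for struct nfsd4_blocked_lock. */
struct blocked_lock {
	int id;
	struct list_head lru;
};

/* Recover the containing struct from its embedded list node. */
#define lock_of(ptr) \
	((struct blocked_lock *)((char *)(ptr) - offsetof(struct blocked_lock, lru)))

/* Drain a private reap list: test and lookup use the SAME list. */
static int drain(struct list_head *reaplist)
{
	int reaped = 0;

	while (!list_is_empty(reaplist)) {
		struct blocked_lock *nbl = lock_of(reaplist->next);

		list_unlink_init(&nbl->lru);
		reaped++;	/* the kernel would unblock and free here */
	}
	return reaped;
}

/* Build a three-entry reap list, drain it, and report the count. */
int demo_reap_count(void)
{
	struct list_head reaplist;
	struct blocked_lock locks[3];
	int i, reaped;

	list_init(&reaplist);
	for (i = 0; i < 3; i++) {
		locks[i].id = i;
		list_add_to_tail(&locks[i].lru, &reaplist);
	}
	reaped = drain(&reaplist);
	return list_is_empty(&reaplist) ? reaped : -1;
}
```

Calling demo_reap_count() from a small main() should drain all three entries and leave the list empty. Swapping the lookup in drain() to some other, empty list would leave reaplist untouched, so the loop would never terminate on its own.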

2017-11-09 16:15:00

by Jeff Layton

Subject: Re: [Bug 197817] "Panic: unable to handle kernel page request" in posix_unblock_lock() called from nfs4_laundromat()

On Thu, 2017-11-09 at 11:07 -0500, J. Bruce Fields wrote:
> On Thu, Nov 09, 2017 at 10:08:28AM +0900, Naofumi Honda wrote:
> > Dear Developers:
> >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=197817
> > >
> > > --- Comment #1 from [email protected] ---
> > > Yes, I think you're right.
> > >
> > > Would it be possible for you to submit a patch to fix that typo in those two
> > > places? (Just mail it to me at [email protected], cc: to
> > > [email protected]).
> > >
> >
> > OK, I have attached the patch.
>
> Thanks for the investigation and the fix!
>
> For future reference, we prefer patches to be inline with the email
> message (not attached), and prefer them in "unified" format.
>
> But for a one-off patch I can fix it up myself; applied as follows.
>
> --b.
>
> commit c26806a20fa3
> Author: Naofumi Honda <[email protected]>
> Date: Thu Nov 9 10:57:16 2017 -0500
>
> nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
>
> From kernel 4.9, my two nfsv4 servers sometimes suffer from
> "panic: unable to handle kernel page request"
> in posix_unblock_lock() called from nfs4_laundromat().
>
> These panics disappear if we revert the commit "nfsd: add a LRU list
> for blocked locks".
>
> The cause appears to be a typo in nfs4_laundromat(), which is also
> present in nfs4_state_shutdown_net().
>
> Cc: [email protected]
> Fixes: 7919d0a27f1e "nfsd: add a LRU list for blocked locks"
> Cc: [email protected]
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 923243369bbc..b99830ab63aa 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4791,7 +4791,7 @@ nfs4_laundromat(struct nfsd_net *nn)
>  	spin_unlock(&nn->blocked_locks_lock);
>
>  	while (!list_empty(&reaplist)) {
> -		nbl = list_first_entry(&nn->blocked_locks_lru,
> +		nbl = list_first_entry(&reaplist,
>  					struct nfsd4_blocked_lock, nbl_lru);
>  		list_del_init(&nbl->nbl_lru);
>  		posix_unblock_lock(&nbl->nbl_lock);
> @@ -7260,7 +7260,7 @@ nfs4_state_shutdown_net(struct net *net)
>  	spin_unlock(&nn->blocked_locks_lock);
>
>  	while (!list_empty(&reaplist)) {
> -		nbl = list_first_entry(&nn->blocked_locks_lru,
> +		nbl = list_first_entry(&reaplist,
>  					struct nfsd4_blocked_lock, nbl_lru);
>  		list_del_init(&nbl->nbl_lru);
>  		posix_unblock_lock(&nbl->nbl_lock);

<facepalm>

Well spotted! I wonder if this might be the cause of some crashes we've
seen as well?

Reviewed-by: Jeff Layton <[email protected]>