2023-11-05 19:29:03

by Dai Ngo

[permalink] [raw]
Subject: General protection fault in nfs4_setup_sequence caused by delegation return task

Hi Trond,

When unmonting a NFS export, nfs_free_server is called. In nfs_free_server,
we call rpc_shutdown_client(server->client) to kill all pending RPC tasks
and wait for them to terminate before continue on to call nfs_put_client.
In nfs_put_client, if the refcounf is drecemented to 0 then we call
nfs_free_client which calls rpc_shutdown_client(clp->cl_rpcclient) to
kill all pending RPC tasks that use nfs_client->cl_rpcclient to send the
request.

Normally this works fine. However, due to some race conditions, if there are
delegation return RPC tasks have not been executed yet when nfs_free_server
is called then this can cause system to crash with general protection fault.

The conditions that can cause the crash are: (1) there are pending delegation
return tasks called from nfs4_state_manager to return idle delegations and
(2) the nfs_client's au_flavor is either RPC_AUTH_GSS_KRB5I or RPC_AUTH_GSS_KRB5P
and (3) the call to nfs_igrab_and_active, from _nfs4_proc_delegreturn, fails
for any reasons and (4) there is a pending RPC task renewing the lease.

Since the delegation return tasks were called with 'issync = 0' the refcount on
nfs_server were dropped (in nfs_client_return_marked_delegations after RPC task
was submited to the RPC layer) and the nfs_igrab_and_active call fails, these
RPC tasks do not hold any refcount on the nfs_server.

When nfs_free_server is called, rpc_shutdown_client(server->client) fails to
kill these delegation return tasks since these tasks using nfs_client->cl_rpcclient
to send the requests. When nfs_put_client is called, nfs_free_client is not
called because there is a pending lease renew RPC task which uses nfs_client->cl_rpcclient
to send the request and also adds a refcount on the nfs_client. This allows
the delegation return tasks to stay alive and continue on after the nfs_server
was freed.

I've seen the NFS client with 5.4 kernel crashes with this stack trace:

!# 0 [ffffb93b8fbdbd78] nfs4_setup_sequence [nfsv4] at ffffffffc0f27e40 fs/nfs/nfs4proc.c:1041:0
# 1 [ffffb93b8fbdbdb8] nfs4_delegreturn_prepare [nfsv4] at ffffffffc0f28ad1 fs/nfs/nfs4proc.c:6355:0
# 2 [ffffb93b8fbdbdd8] rpc_prepare_task [sunrpc] at ffffffffc05e33af net/sunrpc/sched.c:821:0
# 3 [ffffb93b8fbdbde8] __rpc_execute [sunrpc] at ffffffffc05eb527 net/sunrpc/sched.c:925:0
# 4 [ffffb93b8fbdbe48] rpc_async_schedule [sunrpc] at ffffffffc05eb8e0 net/sunrpc/sched.c:1013:0
# 5 [ffffb93b8fbdbe68] process_one_work at ffffffff92ad4289 kernel/workqueue.c:2281:0
# 6 [ffffb93b8fbdbeb0] worker_thread at ffffffff92ad50cf kernel/workqueue.c:2427:0
# 7 [ffffb93b8fbdbf10] kthread at ffffffff92adac05 kernel/kthread.c:296:0
# 8 [ffffb93b8fbdbf58] ret_from_fork at ffffffff93600364 arch/x86/entry/entry_64.S:355:0

Where the params of nfs4_setup_sequence:
client = (struct nfs_client *)0x4d54158ebc6cfc01
args = (struct nfs4_sequence_args *)0xffff998487f85800
res = (struct nfs4_sequence_res *)0xffff998487f85830
task = (struct rpc_task *)0xffff997d41da7d00

The 'client' pointer is invalid since it was extracted from d_data->res.server->nfs_client
and the nfs_server was freed.

I've reviewed the latest kernel 6.6-rc7, even though there are many changes
since 5.4 I could not see any any changes to prevent this scenario to happen
so I believe this problem still exists in 6.6-rc7.

I'd like to get your opinion on this potential issue with the latest kernel
and if the problem still exists then what's the best way to fix it.

Thanks,
-Dai


2023-11-06 16:30:47

by Trond Myklebust

[permalink] [raw]
Subject: Re: General protection fault in nfs4_setup_sequence caused by delegation return task

Hi Dai,

On Sun, 2023-11-05 at 11:27 -0800, [email protected] wrote:
> Hi Trond,
>
> When unmonting a NFS export, nfs_free_server is called. In
> nfs_free_server,
> we call rpc_shutdown_client(server->client) to kill all pending RPC
> tasks
> and wait for them to terminate before continue on to call
> nfs_put_client.
> In nfs_put_client, if the refcounf is drecemented to 0 then we call
> nfs_free_client which calls rpc_shutdown_client(clp->cl_rpcclient) to
> kill all pending RPC tasks that use nfs_client->cl_rpcclient to send
> the
> request.
>
> Normally this works fine. However, due to some race conditions, if
> there are
> delegation return RPC tasks have not been executed yet when
> nfs_free_server
> is called then this can cause system to crash with general protection
> fault.
>
> The conditions that can cause the crash are: (1) there are pending
> delegation
> return tasks called from nfs4_state_manager to return idle
> delegations and
> (2) the nfs_client's au_flavor is either RPC_AUTH_GSS_KRB5I or
> RPC_AUTH_GSS_KRB5P
> and (3) the call to nfs_igrab_and_active, from
> _nfs4_proc_delegreturn, fails
> for any reasons and (4) there is a pending RPC task renewing the
> lease.
>
> Since the delegation return tasks were called with 'issync = 0' the
> refcount on
> nfs_server were dropped (in nfs_client_return_marked_delegations
> after RPC task
> was submited to the RPC layer) and the nfs_igrab_and_active call
> fails, these
> RPC tasks do not hold any refcount on the nfs_server.
>
> When nfs_free_server is called, rpc_shutdown_client(server->client)
> fails to
> kill these delegation return tasks since these tasks using
> nfs_client->cl_rpcclient
> to send the requests. When nfs_put_client is called, nfs_free_client
> is not
> called because there is a pending lease renew RPC task which uses
> nfs_client->cl_rpcclient
> to send the request and also adds a refcount on the nfs_client. This
> allows
> the delegation return tasks to stay alive and continue on after the
> nfs_server
> was freed.
>
> I've seen the NFS client with 5.4 kernel crashes with this stack
> trace:
>
> !# 0 [ffffb93b8fbdbd78] nfs4_setup_sequence [nfsv4] at
> ffffffffc0f27e40 fs/nfs/nfs4proc.c:1041:0
>   # 1 [ffffb93b8fbdbdb8] nfs4_delegreturn_prepare [nfsv4] at
> ffffffffc0f28ad1 fs/nfs/nfs4proc.c:6355:0
>   # 2 [ffffb93b8fbdbdd8] rpc_prepare_task [sunrpc] at
> ffffffffc05e33af net/sunrpc/sched.c:821:0
>   # 3 [ffffb93b8fbdbde8] __rpc_execute [sunrpc] at ffffffffc05eb527
> net/sunrpc/sched.c:925:0
>   # 4 [ffffb93b8fbdbe48] rpc_async_schedule [sunrpc] at
> ffffffffc05eb8e0 net/sunrpc/sched.c:1013:0
>   # 5 [ffffb93b8fbdbe68] process_one_work at ffffffff92ad4289
> kernel/workqueue.c:2281:0
>   # 6 [ffffb93b8fbdbeb0] worker_thread at ffffffff92ad50cf
> kernel/workqueue.c:2427:0
>   # 7 [ffffb93b8fbdbf10] kthread at ffffffff92adac05
> kernel/kthread.c:296:0
>   # 8 [ffffb93b8fbdbf58] ret_from_fork at ffffffff93600364
> arch/x86/entry/entry_64.S:355:0
>         
> Where the params of nfs4_setup_sequence:
>       client = (struct nfs_client *)0x4d54158ebc6cfc01
>       args = (struct nfs4_sequence_args *)0xffff998487f85800
>       res = (struct nfs4_sequence_res *)0xffff998487f85830
>       task = (struct rpc_task *)0xffff997d41da7d00
>
> The 'client' pointer is invalid since it was extracted from d_data-
> >res.server->nfs_client
> and the nfs_server was freed.
>
> I've reviewed the latest kernel 6.6-rc7, even though there are many
> changes
> since 5.4 I could not see any any changes to prevent this scenario to
> happen
> so I believe this problem still exists in 6.6-rc7.
>
> I'd like to get your opinion on this potential issue with the latest
> kernel
> and if the problem still exists then what's the best way to fix it.
>

nfs_inode_evict_delegation() should be calling
nfs_do_return_delegation() with the issync flag set, whereas
nfs_server_return_marked_delegations() should be holding a reference to
the inode before it calls nfs_end_delegation_return().

So where is this combination no inode reference + issync=0 originating
from?

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2023-11-06 18:18:49

by Dai Ngo

[permalink] [raw]
Subject: Re: General protection fault in nfs4_setup_sequence caused by delegation return task


On 11/6/23 8:30 AM, Trond Myklebust wrote:
> Hi Dai,
>
> On Sun, 2023-11-05 at 11:27 -0800, [email protected] wrote:
>> Hi Trond,
>>
>> When unmonting a NFS export, nfs_free_server is called. In
>> nfs_free_server,
>> we call rpc_shutdown_client(server->client) to kill all pending RPC
>> tasks
>> and wait for them to terminate before continue on to call
>> nfs_put_client.
>> In nfs_put_client, if the refcounf is drecemented to 0 then we call
>> nfs_free_client which calls rpc_shutdown_client(clp->cl_rpcclient) to
>> kill all pending RPC tasks that use nfs_client->cl_rpcclient to send
>> the
>> request.
>>
>> Normally this works fine. However, due to some race conditions, if
>> there are
>> delegation return RPC tasks have not been executed yet when
>> nfs_free_server
>> is called then this can cause system to crash with general protection
>> fault.
>>
>> The conditions that can cause the crash are: (1) there are pending
>> delegation
>> return tasks called from nfs4_state_manager to return idle
>> delegations and
>> (2) the nfs_client's au_flavor is either RPC_AUTH_GSS_KRB5I or
>> RPC_AUTH_GSS_KRB5P
>> and (3) the call to nfs_igrab_and_active, from
>> _nfs4_proc_delegreturn, fails
>> for any reasons and (4) there is a pending RPC task renewing the
>> lease.
>>
>> Since the delegation return tasks were called with 'issync = 0' the
>> refcount on
>> nfs_server were dropped (in nfs_client_return_marked_delegations
>> after RPC task
>> was submited to the RPC layer) and the nfs_igrab_and_active call
>> fails, these
>> RPC tasks do not hold any refcount on the nfs_server.
>>
>> When nfs_free_server is called, rpc_shutdown_client(server->client)
>> fails to
>> kill these delegation return tasks since these tasks using
>> nfs_client->cl_rpcclient
>> to send the requests. When nfs_put_client is called, nfs_free_client
>> is not
>> called because there is a pending lease renew RPC task which uses
>> nfs_client->cl_rpcclient
>> to send the request and also adds a refcount on the nfs_client. This
>> allows
>> the delegation return tasks to stay alive and continue on after the
>> nfs_server
>> was freed.
>>
>> I've seen the NFS client with 5.4 kernel crashes with this stack
>> trace:
>>
>> !# 0 [ffffb93b8fbdbd78] nfs4_setup_sequence [nfsv4] at
>> ffffffffc0f27e40 fs/nfs/nfs4proc.c:1041:0
>>   # 1 [ffffb93b8fbdbdb8] nfs4_delegreturn_prepare [nfsv4] at
>> ffffffffc0f28ad1 fs/nfs/nfs4proc.c:6355:0
>>   # 2 [ffffb93b8fbdbdd8] rpc_prepare_task [sunrpc] at
>> ffffffffc05e33af net/sunrpc/sched.c:821:0
>>   # 3 [ffffb93b8fbdbde8] __rpc_execute [sunrpc] at ffffffffc05eb527
>> net/sunrpc/sched.c:925:0
>>   # 4 [ffffb93b8fbdbe48] rpc_async_schedule [sunrpc] at
>> ffffffffc05eb8e0 net/sunrpc/sched.c:1013:0
>>   # 5 [ffffb93b8fbdbe68] process_one_work at ffffffff92ad4289
>> kernel/workqueue.c:2281:0
>>   # 6 [ffffb93b8fbdbeb0] worker_thread at ffffffff92ad50cf
>> kernel/workqueue.c:2427:0
>>   # 7 [ffffb93b8fbdbf10] kthread at ffffffff92adac05
>> kernel/kthread.c:296:0
>>   # 8 [ffffb93b8fbdbf58] ret_from_fork at ffffffff93600364
>> arch/x86/entry/entry_64.S:355:0
>>
>> Where the params of nfs4_setup_sequence:
>>       client = (struct nfs_client *)0x4d54158ebc6cfc01
>>       args = (struct nfs4_sequence_args *)0xffff998487f85800
>>       res = (struct nfs4_sequence_res *)0xffff998487f85830
>>       task = (struct rpc_task *)0xffff997d41da7d00
>>
>> The 'client' pointer is invalid since it was extracted from d_data-
>>> res.server->nfs_client
>> and the nfs_server was freed.
>>
>> I've reviewed the latest kernel 6.6-rc7, even though there are many
>> changes
>> since 5.4 I could not see any any changes to prevent this scenario to
>> happen
>> so I believe this problem still exists in 6.6-rc7.
>>
>> I'd like to get your opinion on this potential issue with the latest
>> kernel
>> and if the problem still exists then what's the best way to fix it.
>>
> nfs_inode_evict_delegation() should be calling
> nfs_do_return_delegation() with the issync flag set, whereas
> nfs_server_return_marked_delegations() should be holding a reference to
> the inode before it calls nfs_end_delegation_return().
>
> So where is this combination no inode reference + issync=0 originating
> from?

The inode reference was dropped in nfs_server_return_marked_delegations
after returning from nfs_end_delegation_return.

I don't quite understand why do we continue on with the delegation return
in _nfs4_proc_delegreturn after nfs_igrab_and_active fails and issync = 0?

-Dai

>

2023-11-06 18:44:43

by Trond Myklebust

[permalink] [raw]
Subject: Re: General protection fault in nfs4_setup_sequence caused by delegation return task

On Mon, 2023-11-06 at 10:18 -0800, [email protected] wrote:
>
> On 11/6/23 8:30 AM, Trond Myklebust wrote:
> > Hi Dai,
> >
> > On Sun, 2023-11-05 at 11:27 -0800, [email protected] wrote:
> > > Hi Trond,
> > >
> > > When unmonting a NFS export, nfs_free_server is called. In
> > > nfs_free_server,
> > > we call rpc_shutdown_client(server->client) to kill all pending
> > > RPC
> > > tasks
> > > and wait for them to terminate before continue on to call
> > > nfs_put_client.
> > > In nfs_put_client, if the refcounf is drecemented to 0 then we
> > > call
> > > nfs_free_client which calls rpc_shutdown_client(clp-
> > > >cl_rpcclient) to
> > > kill all pending RPC tasks that use nfs_client->cl_rpcclient to
> > > send
> > > the
> > > request.
> > >
> > > Normally this works fine. However, due to some race conditions,
> > > if
> > > there are
> > > delegation return RPC tasks have not been executed yet when
> > > nfs_free_server
> > > is called then this can cause system to crash with general
> > > protection
> > > fault.
> > >
> > > The conditions that can cause the crash are: (1) there are
> > > pending
> > > delegation
> > > return tasks called from nfs4_state_manager to return idle
> > > delegations and
> > > (2) the nfs_client's au_flavor is either RPC_AUTH_GSS_KRB5I or
> > > RPC_AUTH_GSS_KRB5P
> > > and (3) the call to nfs_igrab_and_active, from
> > > _nfs4_proc_delegreturn, fails
> > > for any reasons and (4) there is a pending RPC task renewing the
> > > lease.
> > >
> > > Since the delegation return tasks were called with 'issync = 0'
> > > the
> > > refcount on
> > > nfs_server were dropped (in nfs_client_return_marked_delegations
> > > after RPC task
> > > was submited to the RPC layer) and the nfs_igrab_and_active call
> > > fails, these
> > > RPC tasks do not hold any refcount on the nfs_server.
> > >
> > > When nfs_free_server is called, rpc_shutdown_client(server-
> > > >client)
> > > fails to
> > > kill these delegation return tasks since these tasks using
> > > nfs_client->cl_rpcclient
> > > to send the requests. When nfs_put_client is called,
> > > nfs_free_client
> > > is not
> > > called because there is a pending lease renew RPC task which uses
> > > nfs_client->cl_rpcclient
> > > to send the request and also adds a refcount on the nfs_client.
> > > This
> > > allows
> > > the delegation return tasks to stay alive and continue on after
> > > the
> > > nfs_server
> > > was freed.
> > >
> > > I've seen the NFS client with 5.4 kernel crashes with this stack
> > > trace:
> > >
> > > !# 0 [ffffb93b8fbdbd78] nfs4_setup_sequence [nfsv4] at
> > > ffffffffc0f27e40 fs/nfs/nfs4proc.c:1041:0
> > >    # 1 [ffffb93b8fbdbdb8] nfs4_delegreturn_prepare [nfsv4] at
> > > ffffffffc0f28ad1 fs/nfs/nfs4proc.c:6355:0
> > >    # 2 [ffffb93b8fbdbdd8] rpc_prepare_task [sunrpc] at
> > > ffffffffc05e33af net/sunrpc/sched.c:821:0
> > >    # 3 [ffffb93b8fbdbde8] __rpc_execute [sunrpc] at
> > > ffffffffc05eb527
> > > net/sunrpc/sched.c:925:0
> > >    # 4 [ffffb93b8fbdbe48] rpc_async_schedule [sunrpc] at
> > > ffffffffc05eb8e0 net/sunrpc/sched.c:1013:0
> > >    # 5 [ffffb93b8fbdbe68] process_one_work at ffffffff92ad4289
> > > kernel/workqueue.c:2281:0
> > >    # 6 [ffffb93b8fbdbeb0] worker_thread at ffffffff92ad50cf
> > > kernel/workqueue.c:2427:0
> > >    # 7 [ffffb93b8fbdbf10] kthread at ffffffff92adac05
> > > kernel/kthread.c:296:0
> > >    # 8 [ffffb93b8fbdbf58] ret_from_fork at ffffffff93600364
> > > arch/x86/entry/entry_64.S:355:0
> > >          
> > > Where the params of nfs4_setup_sequence:
> > >        client = (struct nfs_client *)0x4d54158ebc6cfc01
> > >        args = (struct nfs4_sequence_args *)0xffff998487f85800
> > >        res = (struct nfs4_sequence_res *)0xffff998487f85830
> > >        task = (struct rpc_task *)0xffff997d41da7d00
> > >
> > > The 'client' pointer is invalid since it was extracted from
> > > d_data-
> > > > res.server->nfs_client
> > > and the nfs_server was freed.
> > >
> > > I've reviewed the latest kernel 6.6-rc7, even though there are
> > > many
> > > changes
> > > since 5.4 I could not see any any changes to prevent this
> > > scenario to
> > > happen
> > > so I believe this problem still exists in 6.6-rc7.
> > >
> > > I'd like to get your opinion on this potential issue with the
> > > latest
> > > kernel
> > > and if the problem still exists then what's the best way to fix
> > > it.
> > >
> > nfs_inode_evict_delegation() should be calling
> > nfs_do_return_delegation() with the issync flag set, whereas
> > nfs_server_return_marked_delegations() should be holding a
> > reference to
> > the inode before it calls nfs_end_delegation_return().
> >
> > So where is this combination no inode reference + issync=0
> > originating
> > from?
>
> The inode reference was dropped in
> nfs_server_return_marked_delegations
> after returning from nfs_end_delegation_return.
>
> I don't quite understand why do we continue on with the delegation
> return
> in _nfs4_proc_delegreturn after nfs_igrab_and_active fails and issync
> = 0?

I'm not seeing how that case can occur at all.

If nfs_server_return_marked_delegations() is holding a reference to the
superblock and to the inode across the call to
nfs_end_delegation_return(), then it should not be possible for the
call to nfs_igrab_and_active() to fail because none of the reference
counts it tries to bump can already be zero.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2023-11-06 20:12:22

by Dai Ngo

[permalink] [raw]
Subject: Re: General protection fault in nfs4_setup_sequence caused by delegation return task


On 11/6/23 10:44 AM, Trond Myklebust wrote:
> On Mon, 2023-11-06 at 10:18 -0800, [email protected] wrote:
>> On 11/6/23 8:30 AM, Trond Myklebust wrote:
>>> Hi Dai,
>>>
>>> On Sun, 2023-11-05 at 11:27 -0800, [email protected] wrote:
>>>> Hi Trond,
>>>>
>>>> When unmonting a NFS export, nfs_free_server is called. In
>>>> nfs_free_server,
>>>> we call rpc_shutdown_client(server->client) to kill all pending
>>>> RPC
>>>> tasks
>>>> and wait for them to terminate before continue on to call
>>>> nfs_put_client.
>>>> In nfs_put_client, if the refcounf is drecemented to 0 then we
>>>> call
>>>> nfs_free_client which calls rpc_shutdown_client(clp-
>>>>> cl_rpcclient) to
>>>> kill all pending RPC tasks that use nfs_client->cl_rpcclient to
>>>> send
>>>> the
>>>> request.
>>>>
>>>> Normally this works fine. However, due to some race conditions,
>>>> if
>>>> there are
>>>> delegation return RPC tasks have not been executed yet when
>>>> nfs_free_server
>>>> is called then this can cause system to crash with general
>>>> protection
>>>> fault.
>>>>
>>>> The conditions that can cause the crash are: (1) there are
>>>> pending
>>>> delegation
>>>> return tasks called from nfs4_state_manager to return idle
>>>> delegations and
>>>> (2) the nfs_client's au_flavor is either RPC_AUTH_GSS_KRB5I or
>>>> RPC_AUTH_GSS_KRB5P
>>>> and (3) the call to nfs_igrab_and_active, from
>>>> _nfs4_proc_delegreturn, fails
>>>> for any reasons and (4) there is a pending RPC task renewing the
>>>> lease.
>>>>
>>>> Since the delegation return tasks were called with 'issync = 0'
>>>> the
>>>> refcount on
>>>> nfs_server were dropped (in nfs_client_return_marked_delegations
>>>> after RPC task
>>>> was submited to the RPC layer) and the nfs_igrab_and_active call
>>>> fails, these
>>>> RPC tasks do not hold any refcount on the nfs_server.
>>>>
>>>> When nfs_free_server is called, rpc_shutdown_client(server-
>>>>> client)
>>>> fails to
>>>> kill these delegation return tasks since these tasks using
>>>> nfs_client->cl_rpcclient
>>>> to send the requests. When nfs_put_client is called,
>>>> nfs_free_client
>>>> is not
>>>> called because there is a pending lease renew RPC task which uses
>>>> nfs_client->cl_rpcclient
>>>> to send the request and also adds a refcount on the nfs_client.
>>>> This
>>>> allows
>>>> the delegation return tasks to stay alive and continue on after
>>>> the
>>>> nfs_server
>>>> was freed.
>>>>
>>>> I've seen the NFS client with 5.4 kernel crashes with this stack
>>>> trace:
>>>>
>>>> !# 0 [ffffb93b8fbdbd78] nfs4_setup_sequence [nfsv4] at
>>>> ffffffffc0f27e40 fs/nfs/nfs4proc.c:1041:0
>>>>    # 1 [ffffb93b8fbdbdb8] nfs4_delegreturn_prepare [nfsv4] at
>>>> ffffffffc0f28ad1 fs/nfs/nfs4proc.c:6355:0
>>>>    # 2 [ffffb93b8fbdbdd8] rpc_prepare_task [sunrpc] at
>>>> ffffffffc05e33af net/sunrpc/sched.c:821:0
>>>>    # 3 [ffffb93b8fbdbde8] __rpc_execute [sunrpc] at
>>>> ffffffffc05eb527
>>>> net/sunrpc/sched.c:925:0
>>>>    # 4 [ffffb93b8fbdbe48] rpc_async_schedule [sunrpc] at
>>>> ffffffffc05eb8e0 net/sunrpc/sched.c:1013:0
>>>>    # 5 [ffffb93b8fbdbe68] process_one_work at ffffffff92ad4289
>>>> kernel/workqueue.c:2281:0
>>>>    # 6 [ffffb93b8fbdbeb0] worker_thread at ffffffff92ad50cf
>>>> kernel/workqueue.c:2427:0
>>>>    # 7 [ffffb93b8fbdbf10] kthread at ffffffff92adac05
>>>> kernel/kthread.c:296:0
>>>>    # 8 [ffffb93b8fbdbf58] ret_from_fork at ffffffff93600364
>>>> arch/x86/entry/entry_64.S:355:0
>>>>
>>>> Where the params of nfs4_setup_sequence:
>>>>        client = (struct nfs_client *)0x4d54158ebc6cfc01
>>>>        args = (struct nfs4_sequence_args *)0xffff998487f85800
>>>>        res = (struct nfs4_sequence_res *)0xffff998487f85830
>>>>        task = (struct rpc_task *)0xffff997d41da7d00
>>>>
>>>> The 'client' pointer is invalid since it was extracted from
>>>> d_data-
>>>>> res.server->nfs_client
>>>> and the nfs_server was freed.
>>>>
>>>> I've reviewed the latest kernel 6.6-rc7, even though there are
>>>> many
>>>> changes
>>>> since 5.4 I could not see any any changes to prevent this
>>>> scenario to
>>>> happen
>>>> so I believe this problem still exists in 6.6-rc7.
>>>>
>>>> I'd like to get your opinion on this potential issue with the
>>>> latest
>>>> kernel
>>>> and if the problem still exists then what's the best way to fix
>>>> it.
>>>>
>>> nfs_inode_evict_delegation() should be calling
>>> nfs_do_return_delegation() with the issync flag set, whereas
>>> nfs_server_return_marked_delegations() should be holding a
>>> reference to
>>> the inode before it calls nfs_end_delegation_return().
>>>
>>> So where is this combination no inode reference + issync=0
>>> originating
>>> from?
>> The inode reference was dropped in
>> nfs_server_return_marked_delegations
>> after returning from nfs_end_delegation_return.
>>
>> I don't quite understand why do we continue on with the delegation
>> return
>> in _nfs4_proc_delegreturn after nfs_igrab_and_active fails and issync
>> = 0?
> I'm not seeing how that case can occur at all.

Since this crash is almost reproducible with 5.4 under a test setup,
I can add a patch to crash the system when nfs_igrab_and_active fails
and issync = 0.

>
> If nfs_server_return_marked_delegations() is holding a reference to the
> superblock and to the inode across the call to
> nfs_end_delegation_return(), then it should not be possible for the
> call to nfs_igrab_and_active() to fail

Yes, this is where I'm struggle to find out how can nfs_igrab_and_active
fails while we already had a refcount on the nfs_server. There must be
a scenario that causes nfs_igrab_and_active to fail since the
nfs4_delegreturndata->inode is NULL in the core dump.

Is it possible that the igrab in nfs_igrab_and_active fails, the inode
was marked with I_FREEING | I_WILL_FREE?

-Dai

> because none of the reference
> counts it tries to bump can already be zero.
>

2023-11-06 20:22:05

by Trond Myklebust

[permalink] [raw]
Subject: Re: General protection fault in nfs4_setup_sequence caused by delegation return task

On Mon, 2023-11-06 at 12:11 -0800, [email protected] wrote:
>
> On 11/6/23 10:44 AM, Trond Myklebust wrote:
> > On Mon, 2023-11-06 at 10:18 -0800, [email protected] wrote:
> > > On 11/6/23 8:30 AM, Trond Myklebust wrote:
> > > > Hi Dai,
> > > >
> > > > On Sun, 2023-11-05 at 11:27 -0800, [email protected] wrote:
> > > > > Hi Trond,
> > > > >
> > > > > When unmonting a NFS export, nfs_free_server is called. In
> > > > > nfs_free_server,
> > > > > we call rpc_shutdown_client(server->client) to kill all
> > > > > pending
> > > > > RPC
> > > > > tasks
> > > > > and wait for them to terminate before continue on to call
> > > > > nfs_put_client.
> > > > > In nfs_put_client, if the refcounf is drecemented to 0 then
> > > > > we
> > > > > call
> > > > > nfs_free_client which calls rpc_shutdown_client(clp-
> > > > > > cl_rpcclient) to
> > > > > kill all pending RPC tasks that use nfs_client->cl_rpcclient
> > > > > to
> > > > > send
> > > > > the
> > > > > request.
> > > > >
> > > > > Normally this works fine. However, due to some race
> > > > > conditions,
> > > > > if
> > > > > there are
> > > > > delegation return RPC tasks have not been executed yet when
> > > > > nfs_free_server
> > > > > is called then this can cause system to crash with general
> > > > > protection
> > > > > fault.
> > > > >
> > > > > The conditions that can cause the crash are: (1) there are
> > > > > pending
> > > > > delegation
> > > > > return tasks called from nfs4_state_manager to return idle
> > > > > delegations and
> > > > > (2) the nfs_client's au_flavor is either RPC_AUTH_GSS_KRB5I
> > > > > or
> > > > > RPC_AUTH_GSS_KRB5P
> > > > > and (3) the call to nfs_igrab_and_active, from
> > > > > _nfs4_proc_delegreturn, fails
> > > > > for any reasons and (4) there is a pending RPC task renewing
> > > > > the
> > > > > lease.
> > > > >
> > > > > Since the delegation return tasks were called with 'issync =
> > > > > 0'
> > > > > the
> > > > > refcount on
> > > > > nfs_server were dropped (in
> > > > > nfs_client_return_marked_delegations
> > > > > after RPC task
> > > > > was submited to the RPC layer) and the nfs_igrab_and_active
> > > > > call
> > > > > fails, these
> > > > > RPC tasks do not hold any refcount on the nfs_server.
> > > > >
> > > > > When nfs_free_server is called, rpc_shutdown_client(server-
> > > > > > client)
> > > > > fails to
> > > > > kill these delegation return tasks since these tasks using
> > > > > nfs_client->cl_rpcclient
> > > > > to send the requests. When nfs_put_client is called,
> > > > > nfs_free_client
> > > > > is not
> > > > > called because there is a pending lease renew RPC task which
> > > > > uses
> > > > > nfs_client->cl_rpcclient
> > > > > to send the request and also adds a refcount on the
> > > > > nfs_client.
> > > > > This
> > > > > allows
> > > > > the delegation return tasks to stay alive and continue on
> > > > > after
> > > > > the
> > > > > nfs_server
> > > > > was freed.
> > > > >
> > > > > I've seen the NFS client with 5.4 kernel crashes with this
> > > > > stack
> > > > > trace:
> > > > >
> > > > > !# 0 [ffffb93b8fbdbd78] nfs4_setup_sequence [nfsv4] at
> > > > > ffffffffc0f27e40 fs/nfs/nfs4proc.c:1041:0
> > > > >     # 1 [ffffb93b8fbdbdb8] nfs4_delegreturn_prepare [nfsv4]
> > > > > at
> > > > > ffffffffc0f28ad1 fs/nfs/nfs4proc.c:6355:0
> > > > >     # 2 [ffffb93b8fbdbdd8] rpc_prepare_task [sunrpc] at
> > > > > ffffffffc05e33af net/sunrpc/sched.c:821:0
> > > > >     # 3 [ffffb93b8fbdbde8] __rpc_execute [sunrpc] at
> > > > > ffffffffc05eb527
> > > > > net/sunrpc/sched.c:925:0
> > > > >     # 4 [ffffb93b8fbdbe48] rpc_async_schedule [sunrpc] at
> > > > > ffffffffc05eb8e0 net/sunrpc/sched.c:1013:0
> > > > >     # 5 [ffffb93b8fbdbe68] process_one_work at
> > > > > ffffffff92ad4289
> > > > > kernel/workqueue.c:2281:0
> > > > >     # 6 [ffffb93b8fbdbeb0] worker_thread at ffffffff92ad50cf
> > > > > kernel/workqueue.c:2427:0
> > > > >     # 7 [ffffb93b8fbdbf10] kthread at ffffffff92adac05
> > > > > kernel/kthread.c:296:0
> > > > >     # 8 [ffffb93b8fbdbf58] ret_from_fork at ffffffff93600364
> > > > > arch/x86/entry/entry_64.S:355:0
> > > > >           
> > > > > Where the params of nfs4_setup_sequence:
> > > > >         client = (struct nfs_client *)0x4d54158ebc6cfc01
> > > > >         args = (struct nfs4_sequence_args
> > > > > *)0xffff998487f85800
> > > > >         res = (struct nfs4_sequence_res *)0xffff998487f85830
> > > > >         task = (struct rpc_task *)0xffff997d41da7d00
> > > > >
> > > > > The 'client' pointer is invalid since it was extracted from
> > > > > d_data-
> > > > > > res.server->nfs_client
> > > > > and the nfs_server was freed.
> > > > >
> > > > > I've reviewed the latest kernel 6.6-rc7, even though there
> > > > > are
> > > > > many
> > > > > changes
> > > > > since 5.4 I could not see any any changes to prevent this
> > > > > scenario to
> > > > > happen
> > > > > so I believe this problem still exists in 6.6-rc7.
> > > > >
> > > > > I'd like to get your opinion on this potential issue with the
> > > > > latest
> > > > > kernel
> > > > > and if the problem still exists then what's the best way to
> > > > > fix
> > > > > it.
> > > > >
> > > > nfs_inode_evict_delegation() should be calling
> > > > nfs_do_return_delegation() with the issync flag set, whereas
> > > > nfs_server_return_marked_delegations() should be holding a
> > > > reference to
> > > > the inode before it calls nfs_end_delegation_return().
> > > >
> > > > So where is this combination no inode reference + issync=0
> > > > originating
> > > > from?
> > > The inode reference was dropped in
> > > nfs_server_return_marked_delegations
> > > after returning from nfs_end_delegation_return.
> > >
> > > I don't quite understand why do we continue on with the
> > > delegation
> > > return
> > > in _nfs4_proc_delegreturn after nfs_igrab_and_active fails and
> > > issync
> > > = 0?
> > I'm not seeing how that case can occur at all.
>
> Since this crash is almost reproducible with 5.4 under a test setup,
> I can add a patch to crash the system when nfs_igrab_and_active fails
> and issync = 0.
>
> >
> > If nfs_server_return_marked_delegations() is holding a reference to
> > the
> > superblock and to the inode across the call to
> > nfs_end_delegation_return(), then it should not be possible for the
> > call to nfs_igrab_and_active() to fail
>
> Yes, this is where I'm struggle to find out how can
> nfs_igrab_and_active
> fails while we already had a refcount on the nfs_server. There must
> be
> a scenario that causes nfs_igrab_and_active to fail since the
> nfs4_delegreturndata->inode is NULL in the core dump.
>
> Is it possible that the igrab in nfs_igrab_and_active fails, the
> inode
> was marked with I_FREEING | I_WILL_FREE?
>

The inode should never be marked with either I_FREEING or I_WILL_FREE
until the call to iput_final(), which isn't called until the inode
i_count is zero.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]