2023-01-11 09:01:26

by Wang Yugui

Subject: a deadlock of 'umount.nfs4 /nfs/scratch -l'

Hi,

We noticed a deadlock of 'umount.nfs4 /nfs/scratch -l'

kernel: 6.1.5-rc1

The dmesg output of 'sysrq w'

[13493.955032] sysrq: Show Blocked State
[13493.959997] task:umount.nfs4 state:D stack:0 pid:3542745 ppid:3542744 flags:0x00004000
[13493.969628] Call Trace:
[13493.973003] <TASK>
[13493.976018] __schedule+0x2cb/0x880
[13493.980426] ? __bpf_trace_svc_stats_latency+0x10/0x10 [sunrpc]
[13493.987342] ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
[13493.993637] schedule+0x50/0xc0
[13493.997697] rpc_wait_bit_killable+0xd/0x60 [sunrpc]
[13494.003671] __wait_on_bit+0x75/0x90
[13494.008168] out_of_line_wait_on_bit+0x91/0xb0
[13494.013547] ? sched_core_clone_cookie+0x90/0x90
[13494.019101] __rpc_execute+0x14b/0x490 [sunrpc]
[13494.024603] ? kmem_cache_alloc+0x41/0x530
[13494.029610] rpc_execute+0xc5/0x100 [sunrpc]
[13494.034835] rpc_run_task+0x14b/0x1b0 [sunrpc]
[13494.040252] rpc_call_sync+0x50/0xa0 [sunrpc]
[13494.045566] nfs4_proc_destroy_session+0x80/0x100 [nfsv4]
[13494.051926] nfs4_destroy_session+0x24/0x90 [nfsv4]
[13494.057767] nfs41_shutdown_client+0xfd/0x120 [nfsv4]
[13494.063774] nfs4_free_client+0x21/0xb0 [nfsv4]
[13494.069240] nfs_free_server+0x44/0xb0 [nfs]
[13494.074418] nfs_kill_super+0x2b/0x40 [nfs]
[13494.079490] deactivate_locked_super+0x2c/0x70
[13494.084811] cleanup_mnt+0xb8/0x140
[13494.089147] task_work_run+0x6a/0xb0
[13494.093587] exit_to_user_mode_prepare+0x1b9/0x1c0
[13494.099232] syscall_exit_to_user_mode+0x12/0x30
[13494.104717] do_syscall_64+0x67/0x80
[13494.109125] ? syscall_exit_to_user_mode+0x12/0x30
[13494.114799] ? do_syscall_64+0x67/0x80
[13494.119426] ? do_syscall_64+0x67/0x80
[13494.124042] ? do_syscall_64+0x67/0x80
[13494.128649] ? exc_page_fault+0x64/0x140
[13494.133400] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[13494.139306] RIP: 0033:0x7fc32f839e9b
[13494.143726] RSP: 002b:00007ffe670f6018 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
[13494.152183] RAX: 0000000000000000 RBX: 000055f4aad71920 RCX: 00007fc32f839e9b
[13494.160218] RDX: 0000000000000003 RSI: 0000000000000002 RDI: 000055f4aad72600
[13494.168237] RBP: 0000000000000002 R08: 0000000000000007 R09: 000055f4aad71010
[13494.176277] R10: 00007fc32fbc0bc0 R11: 0000000000000202 R12: 000055f4aad72600
[13494.184313] R13: 00007fc33025f244 R14: 000055f4aad71a30 R15: 000055f4aad71b50
[13494.192334] </TASK>
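
(For reference, the dump above is from the magic SysRq "show blocked tasks" trigger; a minimal sketch of how to capture it, assuming CONFIG_MAGIC_SYSRQ=y:)

echo 1 > /proc/sys/kernel/sysrq   # enable all SysRq functions if they are restricted
echo w > /proc/sysrq-trigger      # 'sysrq w': dump tasks in uninterruptible (D) state
dmesg | tail -n 60                # the "Show Blocked State" report lands in the kernel log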

Best Regards
Wang Yugui ([email protected])
2023/01/11



2023-01-11 09:43:40

by Wang Yugui

Subject: Re: a deadlock of 'umount.nfs4 /nfs/scratch -l'

Hi,

> Hi,
>
> We noticed a deadlock of 'umount.nfs4 /nfs/scratch -l'

reproducer:

mount /dev/sda1 /mnt/test/
mount /dev/sda2 /mnt/scratch/
systemctl restart nfs-server.service
mount.nfs4 127.0.0.1:/mnt/test/ /nfs/test/
mount.nfs4 127.0.0.1:/mnt/scratch/ /nfs/scratch/
systemctl stop nfs-server.service
umount -l /nfs/scratch #OK
umount -l /nfs/test #deadlock
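
(Note: 'umount -l' reaches the kernel as the umount2() syscall with MNT_DETACH; ORIG_RAX 0xa6 in the register dump is 166, which is umount2 on x86_64. A sketch to observe this, assuming strace is available:)

strace -f -e trace=umount2 umount -l /nfs/test
# expected: umount2("/nfs/test", MNT_DETACH) = 0, while the real superblock
# teardown runs later from task_work and is what actually gets stuck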

Best Regards
Wang Yugui ([email protected])
2023/01/11

> kernel: 6.1.5-rc1
>
> The dmesg output of 'sysrq w'
>
> [13493.955032] sysrq: Show Blocked State
> [13493.959997] task:umount.nfs4 state:D stack:0 pid:3542745 ppid:3542744 flags:0x00004000
> [13493.969628] Call Trace:
> [13493.973003] <TASK>
> [13493.976018] __schedule+0x2cb/0x880
> [13493.980426] ? __bpf_trace_svc_stats_latency+0x10/0x10 [sunrpc]
> [13493.987342] ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
> [13493.993637] schedule+0x50/0xc0
> [13493.997697] rpc_wait_bit_killable+0xd/0x60 [sunrpc]
> [13494.003671] __wait_on_bit+0x75/0x90
> [13494.008168] out_of_line_wait_on_bit+0x91/0xb0
> [13494.013547] ? sched_core_clone_cookie+0x90/0x90
> [13494.019101] __rpc_execute+0x14b/0x490 [sunrpc]
> [13494.024603] ? kmem_cache_alloc+0x41/0x530
> [13494.029610] rpc_execute+0xc5/0x100 [sunrpc]
> [13494.034835] rpc_run_task+0x14b/0x1b0 [sunrpc]
> [13494.040252] rpc_call_sync+0x50/0xa0 [sunrpc]
> [13494.045566] nfs4_proc_destroy_session+0x80/0x100 [nfsv4]
> [13494.051926] nfs4_destroy_session+0x24/0x90 [nfsv4]
> [13494.057767] nfs41_shutdown_client+0xfd/0x120 [nfsv4]
> [13494.063774] nfs4_free_client+0x21/0xb0 [nfsv4]
> [13494.069240] nfs_free_server+0x44/0xb0 [nfs]
> [13494.074418] nfs_kill_super+0x2b/0x40 [nfs]
> [13494.079490] deactivate_locked_super+0x2c/0x70
> [13494.084811] cleanup_mnt+0xb8/0x140
> [13494.089147] task_work_run+0x6a/0xb0
> [13494.093587] exit_to_user_mode_prepare+0x1b9/0x1c0
> [13494.099232] syscall_exit_to_user_mode+0x12/0x30
> [13494.104717] do_syscall_64+0x67/0x80
> [13494.109125] ? syscall_exit_to_user_mode+0x12/0x30
> [13494.114799] ? do_syscall_64+0x67/0x80
> [13494.119426] ? do_syscall_64+0x67/0x80
> [13494.124042] ? do_syscall_64+0x67/0x80
> [13494.128649] ? exc_page_fault+0x64/0x140
> [13494.133400] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [13494.139306] RIP: 0033:0x7fc32f839e9b
> [13494.143726] RSP: 002b:00007ffe670f6018 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
> [13494.152183] RAX: 0000000000000000 RBX: 000055f4aad71920 RCX: 00007fc32f839e9b
> [13494.160218] RDX: 0000000000000003 RSI: 0000000000000002 RDI: 000055f4aad72600
> [13494.168237] RBP: 0000000000000002 R08: 0000000000000007 R09: 000055f4aad71010
> [13494.176277] R10: 00007fc32fbc0bc0 R11: 0000000000000202 R12: 000055f4aad72600
> [13494.184313] R13: 00007fc33025f244 R14: 000055f4aad71a30 R15: 000055f4aad71b50
> [13494.192334] </TASK>
>
> Best Regards
> Wang Yugui ([email protected])
> 2023/01/11
>


2023-01-12 09:46:44

by Wang Yugui

Subject: Re: a deadlock of 'umount.nfs4 /nfs/scratch -l'

Hi,

> Hi,
>
> > Hi,
> >
> > We noticed a deadlock of 'umount.nfs4 /nfs/scratch -l'
>
> reproducer:
>
> mount /dev/sda1 /mnt/test/
> mount /dev/sda2 /mnt/scratch/
> systemctl restart nfs-server.service
> mount.nfs4 127.0.0.1:/mnt/test/ /nfs/test/
> mount.nfs4 127.0.0.1:/mnt/scratch/ /nfs/scratch/
> systemctl stop nfs-server.service
> umount -l /nfs/scratch #OK
> umount -l /nfs/test #deadlock
>
> Best Regards
> Wang Yugui ([email protected])
> 2023/01/11
>
> > kernel: 6.1.5-rc1

This problem happens on kernel 6.2.0-rc3+ (upstream) too.

Best Regards
Wang Yugui ([email protected])
2023/01/12

> >
> > The dmesg output of 'sysrq w'
> >
> > [13493.955032] sysrq: Show Blocked State
> > [13493.959997] task:umount.nfs4 state:D stack:0 pid:3542745 ppid:3542744 flags:0x00004000
> > [13493.969628] Call Trace:
> > [13493.973003] <TASK>
> > [13493.976018] __schedule+0x2cb/0x880
> > [13493.980426] ? __bpf_trace_svc_stats_latency+0x10/0x10 [sunrpc]
> > [13493.987342] ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
> > [13493.993637] schedule+0x50/0xc0
> > [13493.997697] rpc_wait_bit_killable+0xd/0x60 [sunrpc]
> > [13494.003671] __wait_on_bit+0x75/0x90
> > [13494.008168] out_of_line_wait_on_bit+0x91/0xb0
> > [13494.013547] ? sched_core_clone_cookie+0x90/0x90
> > [13494.019101] __rpc_execute+0x14b/0x490 [sunrpc]
> > [13494.024603] ? kmem_cache_alloc+0x41/0x530
> > [13494.029610] rpc_execute+0xc5/0x100 [sunrpc]
> > [13494.034835] rpc_run_task+0x14b/0x1b0 [sunrpc]
> > [13494.040252] rpc_call_sync+0x50/0xa0 [sunrpc]
> > [13494.045566] nfs4_proc_destroy_session+0x80/0x100 [nfsv4]
> > [13494.051926] nfs4_destroy_session+0x24/0x90 [nfsv4]
> > [13494.057767] nfs41_shutdown_client+0xfd/0x120 [nfsv4]
> > [13494.063774] nfs4_free_client+0x21/0xb0 [nfsv4]
> > [13494.069240] nfs_free_server+0x44/0xb0 [nfs]
> > [13494.074418] nfs_kill_super+0x2b/0x40 [nfs]
> > [13494.079490] deactivate_locked_super+0x2c/0x70
> > [13494.084811] cleanup_mnt+0xb8/0x140
> > [13494.089147] task_work_run+0x6a/0xb0
> > [13494.093587] exit_to_user_mode_prepare+0x1b9/0x1c0
> > [13494.099232] syscall_exit_to_user_mode+0x12/0x30
> > [13494.104717] do_syscall_64+0x67/0x80
> > [13494.109125] ? syscall_exit_to_user_mode+0x12/0x30
> > [13494.114799] ? do_syscall_64+0x67/0x80
> > [13494.119426] ? do_syscall_64+0x67/0x80
> > [13494.124042] ? do_syscall_64+0x67/0x80
> > [13494.128649] ? exc_page_fault+0x64/0x140
> > [13494.133400] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > [13494.139306] RIP: 0033:0x7fc32f839e9b
> > [13494.143726] RSP: 002b:00007ffe670f6018 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
> > [13494.152183] RAX: 0000000000000000 RBX: 000055f4aad71920 RCX: 00007fc32f839e9b
> > [13494.160218] RDX: 0000000000000003 RSI: 0000000000000002 RDI: 000055f4aad72600
> > [13494.168237] RBP: 0000000000000002 R08: 0000000000000007 R09: 000055f4aad71010
> > [13494.176277] R10: 00007fc32fbc0bc0 R11: 0000000000000202 R12: 000055f4aad72600
> > [13494.184313] R13: 00007fc33025f244 R14: 000055f4aad71a30 R15: 000055f4aad71b50
> > [13494.192334] </TASK>
> >
> > Best Regards
> > Wang Yugui ([email protected])
> > 2023/01/11
> >
>


2023-01-13 14:57:59

by Chuck Lever

Subject: Re: a deadlock of 'umount.nfs4 /nfs/scratch -l'



> On Jan 12, 2023, at 4:30 AM, Wang Yugui <[email protected]> wrote:
>
> Hi,
>
>> Hi,
>>
>>> Hi,
>>>
>>> We noticed a deadlock of 'umount.nfs4 /nfs/scratch -l'
>>
>> reproducer:
>>
>> mount /dev/sda1 /mnt/test/
>> mount /dev/sda2 /mnt/scratch/
>> systemctl restart nfs-server.service
>> mount.nfs4 127.0.0.1:/mnt/test/ /nfs/test/
>> mount.nfs4 127.0.0.1:/mnt/scratch/ /nfs/scratch/
>> systemctl stop nfs-server.service
>> umount -l /nfs/scratch #OK
>> umount -l /nfs/test #deadlock
>>
>> Best Regards
>> Wang Yugui ([email protected])
>> 2023/01/11
>>
>>> kernel: 6.1.5-rc1
>
> This problem happens on kernel 6.2.0-rc3+ (upstream) too.

Can you clarify:

- By "deadlock" do you mean the system becomes unresponsive, or that
just the mount is stuck?

- Can you reproduce in a non-loopback scenario: a separate client and
server?


> Best Regards
> Wang Yugui ([email protected])
> 2023/01/12
>
>>>
>>> The dmesg output of 'sysrq w'
>>>
>>> [13493.955032] sysrq: Show Blocked State
>>> [13493.959997] task:umount.nfs4 state:D stack:0 pid:3542745 ppid:3542744 flags:0x00004000
>>> [13493.969628] Call Trace:
>>> [13493.973003] <TASK>
>>> [13493.976018] __schedule+0x2cb/0x880
>>> [13493.980426] ? __bpf_trace_svc_stats_latency+0x10/0x10 [sunrpc]
>>> [13493.987342] ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>>> [13493.993637] schedule+0x50/0xc0
>>> [13493.997697] rpc_wait_bit_killable+0xd/0x60 [sunrpc]
>>> [13494.003671] __wait_on_bit+0x75/0x90
>>> [13494.008168] out_of_line_wait_on_bit+0x91/0xb0
>>> [13494.013547] ? sched_core_clone_cookie+0x90/0x90
>>> [13494.019101] __rpc_execute+0x14b/0x490 [sunrpc]
>>> [13494.024603] ? kmem_cache_alloc+0x41/0x530
>>> [13494.029610] rpc_execute+0xc5/0x100 [sunrpc]
>>> [13494.034835] rpc_run_task+0x14b/0x1b0 [sunrpc]
>>> [13494.040252] rpc_call_sync+0x50/0xa0 [sunrpc]
>>> [13494.045566] nfs4_proc_destroy_session+0x80/0x100 [nfsv4]
>>> [13494.051926] nfs4_destroy_session+0x24/0x90 [nfsv4]
>>> [13494.057767] nfs41_shutdown_client+0xfd/0x120 [nfsv4]
>>> [13494.063774] nfs4_free_client+0x21/0xb0 [nfsv4]
>>> [13494.069240] nfs_free_server+0x44/0xb0 [nfs]
>>> [13494.074418] nfs_kill_super+0x2b/0x40 [nfs]
>>> [13494.079490] deactivate_locked_super+0x2c/0x70
>>> [13494.084811] cleanup_mnt+0xb8/0x140
>>> [13494.089147] task_work_run+0x6a/0xb0
>>> [13494.093587] exit_to_user_mode_prepare+0x1b9/0x1c0
>>> [13494.099232] syscall_exit_to_user_mode+0x12/0x30
>>> [13494.104717] do_syscall_64+0x67/0x80
>>> [13494.109125] ? syscall_exit_to_user_mode+0x12/0x30
>>> [13494.114799] ? do_syscall_64+0x67/0x80
>>> [13494.119426] ? do_syscall_64+0x67/0x80
>>> [13494.124042] ? do_syscall_64+0x67/0x80
>>> [13494.128649] ? exc_page_fault+0x64/0x140
>>> [13494.133400] entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>> [13494.139306] RIP: 0033:0x7fc32f839e9b
>>> [13494.143726] RSP: 002b:00007ffe670f6018 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
>>> [13494.152183] RAX: 0000000000000000 RBX: 000055f4aad71920 RCX: 00007fc32f839e9b
>>> [13494.160218] RDX: 0000000000000003 RSI: 0000000000000002 RDI: 000055f4aad72600
>>> [13494.168237] RBP: 0000000000000002 R08: 0000000000000007 R09: 000055f4aad71010
>>> [13494.176277] R10: 00007fc32fbc0bc0 R11: 0000000000000202 R12: 000055f4aad72600
>>> [13494.184313] R13: 00007fc33025f244 R14: 000055f4aad71a30 R15: 000055f4aad71b50
>>> [13494.192334] </TASK>
>>>
>>> Best Regards
>>> Wang Yugui ([email protected])
>>> 2023/01/11

--
Chuck Lever



2023-01-13 15:51:33

by Trond Myklebust

Subject: Re: a deadlock of 'umount.nfs4 /nfs/scratch -l'


> On Jan 13, 2023, at 09:41, Chuck Lever III <[email protected]> wrote:
>
>
>
>> On Jan 12, 2023, at 4:30 AM, Wang Yugui <[email protected]> wrote:
>>
>> Hi,
>>
>>> Hi,
>>>
>>>> Hi,
>>>>
>>>> We noticed a deadlock of 'umount.nfs4 /nfs/scratch -l'
>>>
>>> reproducer:
>>>
>>> mount /dev/sda1 /mnt/test/
>>> mount /dev/sda2 /mnt/scratch/
>>> systemctl restart nfs-server.service
>>> mount.nfs4 127.0.0.1:/mnt/test/ /nfs/test/
>>> mount.nfs4 127.0.0.1:/mnt/scratch/ /nfs/scratch/
>>> systemctl stop nfs-server.service
>>> umount -l /nfs/scratch #OK
>>> umount -l /nfs/test #deadlock
>>>
>>> Best Regards
>>> Wang Yugui ([email protected])
>>> 2023/01/11
>>>
>>>> kernel: 6.1.5-rc1
>>
>> This problem happens on kernel 6.2.0-rc3+ (upstream) too.
>
> Can you clarify:
>
> - By "deadlock" do you mean the system becomes unresponsive, or that
> just the mount is stuck?
>
> - Can you reproduce in a non-loopback scenario: a separate client and
> server?
>

I’m not seeing how the use of the ‘-l’ flag is at all relevant here. The exact same thing will happen if you don’t use ‘-l’. All the latter does is hide the fact that it is happening from user space.

As far as I’m concerned, this is pretty much expected behaviour when you turn off the server before unmounting. It means that the client can’t flush any remaining dirty data to the server and it can’t clean up state. So just don’t do that?
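
(In other words, reversing the order in the reproducer avoids the hang; a sketch:)

umount /nfs/test /nfs/scratch       # unmount while the server can still answer DESTROY_SESSION
systemctl stop nfs-server.service   # stop the server only after the clients are gone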

_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

2023-01-13 17:12:47

by Wang Yugui

Subject: Re: a deadlock of 'umount.nfs4 /nfs/scratch -l'

Hi,

> > On Jan 12, 2023, at 4:30 AM, Wang Yugui <[email protected]> wrote:
> >
> > Hi,
> >
> >> Hi,
> >>
> >>> Hi,
> >>>
> >>> We noticed a deadlock of 'umount.nfs4 /nfs/scratch -l'
> >>
> >> reproducer:
> >>
> >> mount /dev/sda1 /mnt/test/
> >> mount /dev/sda2 /mnt/scratch/
> >> systemctl restart nfs-server.service
> >> mount.nfs4 127.0.0.1:/mnt/test/ /nfs/test/
> >> mount.nfs4 127.0.0.1:/mnt/scratch/ /nfs/scratch/
> >> systemctl stop nfs-server.service
> >> umount -l /nfs/scratch #OK
> >> umount -l /nfs/test #deadlock
> >>
> >> Best Regards
> >> Wang Yugui ([email protected])
> >> 2023/01/11
> >>
> >>> kernel: 6.1.5-rc1
> >
> > This problem happens on kernel 6.2.0-rc3+ (upstream) too.
>
> Can you clarify:
>
> - By "deadlock" do you mean the system becomes unresponsive, or that
> just the mount is stuck?

Just the 'umount -l' is stuck.
'Ctrl+C' can stop the 'umount -l', and then the mount point disappears.
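
(That matches the trace: the wait is in rpc_wait_bit_killable(), a TASK_KILLABLE sleep, so a fatal signal such as an uncaught SIGINT from Ctrl+C can wake it. A sketch for spotting the stuck task:)

ps -o pid,stat,wchan:30,cmd -C umount.nfs4
# STAT 'D' = uninterruptible (here killable) sleep; WCHAN names the wait point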

> - Can you reproduce in a non-loopback scenario: a separate client and
> server?

Yes. It happens with a separate NFS client and server too.

Tested kernel versions: 5.15.85, 6.1.5, 6.2.0-rc3+ (upstream)

Best Regards
Wang Yugui ([email protected])
2023/01/14

2023-01-13 17:12:55

by Wang Yugui

Subject: Re: a deadlock of 'umount.nfs4 /nfs/scratch -l'

Hi,

>
> > On Jan 13, 2023, at 09:41, Chuck Lever III <[email protected]> wrote:
> >
> >
> >
> >> On Jan 12, 2023, at 4:30 AM, Wang Yugui <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >>> Hi,
> >>>
> >>>> Hi,
> >>>>
> >>>> We noticed a deadlock of 'umount.nfs4 /nfs/scratch -l'
> >>>
> >>> reproducer:
> >>>
> >>> mount /dev/sda1 /mnt/test/
> >>> mount /dev/sda2 /mnt/scratch/
> >>> systemctl restart nfs-server.service
> >>> mount.nfs4 127.0.0.1:/mnt/test/ /nfs/test/
> >>> mount.nfs4 127.0.0.1:/mnt/scratch/ /nfs/scratch/
> >>> systemctl stop nfs-server.service
> >>> umount -l /nfs/scratch #OK
> >>> umount -l /nfs/test #deadlock
> >>>
> >>> Best Regards
> >>> Wang Yugui ([email protected])
> >>> 2023/01/11
> >>>
> >>>> kernel: 6.1.5-rc1
> >>
> >> This problem happens on kernel 6.2.0-rc3+ (upstream) too.
> >
> > Can you clarify:
> >
> > - By "deadlock" do you mean the system becomes unresponsive, or that
> > just the mount is stuck?
> >
> > - Can you reproduce in a non-loopback scenario: a separate client and
> > server?
> >
>
> I’m not seeing how the use of the ‘-l’ flag is at all relevant here. The exact same thing will happen if you don’t use ‘-l’. All the latter does is hide the fact that it is happening from user space.
>
> As far as I’m concerned, this is pretty much expected behaviour when you turn off the server before unmounting. It means that the client can’t flush any remaining dirty data to the server and it can’t clean up state. So just don’t do that?

In this case, 'df -h' will not work without the 'umount -l'.

So I think we should make 'umount -l' work.
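
(As a stop-gap for monitoring, df can be told to skip the dead NFS mounts; a sketch using standard coreutils options:)

df -h -x nfs -x nfs4    # --exclude-type: skip NFS mounts so df does not block on them
timeout 5 df -h         # or bound how long df may block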

Best Regards
Wang Yugui ([email protected])
2023/01/14

2023-01-13 17:46:33

by Trond Myklebust

Subject: Re: a deadlock of 'umount.nfs4 /nfs/scratch -l'

On Sat, 2023-01-14 at 01:06 +0800, Wang Yugui wrote:
> Hi,
>
> >
> > > On Jan 13, 2023, at 09:41, Chuck Lever III
> > > <[email protected]> wrote:
> > >
> > >
> > >
> > > > On Jan 12, 2023, at 4:30 AM, Wang Yugui
> > > > <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > > Hi,
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > We noticed a deadlock of 'umount.nfs4 /nfs/scratch -l'
> > > > >
> > > > > reproducer:
> > > > >
> > > > > mount /dev/sda1 /mnt/test/
> > > > > mount /dev/sda2 /mnt/scratch/
> > > > > systemctl restart nfs-server.service
> > > > > mount.nfs4 127.0.0.1:/mnt/test/ /nfs/test/
> > > > > mount.nfs4 127.0.0.1:/mnt/scratch/ /nfs/scratch/
> > > > > systemctl stop nfs-server.service
> > > > > umount -l /nfs/scratch #OK
> > > > > umount -l /nfs/test #deadlock
> > > > >
> > > > > Best Regards
> > > > > Wang Yugui ([email protected])
> > > > > 2023/01/11
> > > > >
> > > > > > kernel: 6.1.5-rc1
> > > >
> > > > This problem happens on kernel 6.2.0-rc3+ (upstream) too.
> > >
> > > Can you clarify:
> > >
> > > - By "deadlock" do you mean the system becomes unresponsive, or
> > > that
> > >  just the mount is stuck?
> > >
> > > - Can you reproduce in a non-loopback scenario: a separate client
> > > and
> > >  server?
> > >
> >
> > I’m not seeing how the use of the ‘-l’ flag is at all relevant
> > here. The exact same thing will happen if you don’t use ‘-l’. All
> > the latter does is hide the fact that it is happening from user
> > space.
> >
> > As far as I’m concerned, this is pretty much expected behaviour
> > when you turn off the server before unmounting. It means that the
> > client can’t flush any remaining dirty data to the server and it
> > can’t clean up state. So just don’t do that?
>
> In this case, 'df -h' will not work without the 'umount -l'.
>
> So I think we should make 'umount -l' work.
>

The NFS filesystem doesn't know or care about the flags you use to call
the umount() system call. That's all handled by the VFS.
All NFS knows is that the VFS told it to clean up the super block
because it is no longer in use.

The calls to nfs4_proc_destroy_session() and nfs4_destroy_clientid()
will both eventually time out and allow the unmount to complete. So it
is not as if this is a permanent hang that forces you to reboot.
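
(How long those RPCs keep retrying before they time out depends on the transport and the mount's timeo/retrans settings; a sketch for inspecting them, assuming nfs-utils is installed for nfsstat:)

grep ' nfs4 ' /proc/mounts   # the options string includes timeo= and retrans=
nfsstat -m                   # per-mount RPC parameters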

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]