2024-04-09 11:28:27

by Rickard x Andersson

[permalink] [raw]
Subject: [PATCH v2 0/1] smb client hang

From: Rickard x Andersson <[email protected]>

When applied to Linx 6.8.2 this patch makes one of our test cases work.
I sent a first version of this patch for Linux 6.1 in October without
getting any feedback. I have now tested and the error is still
existing in the latest kernel versions.

Rickard x Andersson (1):
smb: client: Fix hang in smb2_reconnect

fs/smb/client/transport.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)

--
2.30.2



2024-04-09 11:28:41

by Rickard x Andersson

[permalink] [raw]
Subject: [PATCH v2 1/1] smb: client: Fix hang in smb2_reconnect

From: Rickard x Andersson <[email protected]>

Test case:
mount -t cifs //192.168.0.1/test y -o
port=19999,ro,vers=2.1,sec=none,echo_interval=1
kill smbd with SIGSTOP
umount /tmp/y

Gives the following error:
INFO: task umount:466 blocked for more than 122 seconds.
Not tainted 6.8.2-axis9-devel #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
task:umount state:D stack:0 pid:466 tgid:466 ppid:464 flags:0x00000004
__schedule from schedule+0x34/0x108
schedule from schedule_preempt_disabled+0x24/0x34
schedule_preempt_disabled from __mutex_lock.constprop.0+0x330/0x8b0
__mutex_lock.constprop.0 from smb2_reconnect+0x278/0x8fc [cifs]
smb2_reconnect [cifs] from SMB2_open_init+0x54/0x9f4 [cifs]
SMB2_open_init [cifs] from smb2_query_info_compound+0x1a0/0x500[cifs]
smb2_query_info_compound [cifs] from smb2_queryfs+0x64/0x134 [cifs]
smb2_queryfs [cifs] from cifs_statfs+0xc8/0x318 [cifs]
cifs_statfs [cifs] from statfs_by_dentry+0x60/0x84
statfs_by_dentry from fd_statfs+0x30/0x74
fd_statfs from sys_fstatfs64+0x40/0x6c
sys_fstatfs64 from ret_fast_syscall+0x0/0x54

The umount task is blocked waiting on the session mutex. The reason it
never gets the session mutex is because 'kworker/0:3' is holding the
mutex and is waiting for response (see line 1209 in
fs/smb/client/smb2pdu.c.

Stack trace of 'kworker/0:3' just before calling wait_for_response:
CPU: 0 PID: 220 Comm: kworker/0:3 Not tainted 6.8.2-axis9-devel #1
Hardware name: Freescale i.MX6 SoloX (Device Tree)
Workqueue: cifsiod smb2_reconnect_server [cifs]
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x24/0x2c
dump_stack_lvl from compound_send_recv+0x7bc/0xac8 [cifs]
compound_send_recv [cifs] from cifs_send_recv+0x34/0x3c [cifs]
cifs_send_recv [cifs] from SMB2_negotiate+0x410/0x13dc [cifs]
SMB2_negotiate [cifs] from smb2_negotiate+0x4c/0x58 [cifs]
smb2_negotiate [cifs] from cifs_negotiate_protocol+0x9c/0x100 [cifs]
cifs_negotiate_protocol [cifs] from smb2_reconnect+0x418/0x8fc [cifs]
smb2_reconnect [cifs] from smb2_reconnect_server+0x1dc/0x514 [cifs]
smb2_reconnect_server [cifs] from process_one_work+0x188/0x3ec
process_one_work from worker_thread+0x1fc/0x430
worker_thread from kthread+0x110/0x130
kthread from ret_from_fork+0x14/0x28

Change-Id: I53439ffb007c9c51d77ce40fb655a34e5ca291ec
Signed-off-by: Rickard x Andersson <[email protected]>
---
fs/smb/client/transport.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c
index 994d70193432..96b8f8757ddc 100644
--- a/fs/smb/client/transport.c
+++ b/fs/smb/client/transport.c
@@ -32,6 +32,8 @@
/* Max number of iovectors we can use off the stack when sending requests. */
#define CIFS_MAX_IOV_SIZE 8

+#define RESPONSE_TIMEOUT_SECS 50
+
void
cifs_wake_up_task(struct mid_q_entry *mid)
{
@@ -735,13 +737,14 @@ static int allocate_mid(struct cifs_ses *ses, struct smb_hdr *in_buf,
static int
wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ)
{
- int error;
+ int ret;
+
+ ret = wait_event_killable_timeout(server->response_q,
+ midQ->mid_state != MID_REQUEST_SUBMITTED &&
+ midQ->mid_state != MID_RESPONSE_RECEIVED,
+ RESPONSE_TIMEOUT_SECS*HZ);

- error = wait_event_state(server->response_q,
- midQ->mid_state != MID_REQUEST_SUBMITTED &&
- midQ->mid_state != MID_RESPONSE_RECEIVED,
- (TASK_KILLABLE|TASK_FREEZABLE_UNSAFE));
- if (error < 0)
+ if ((ret < 0) || (ret == 0))
return -ERESTARTSYS;

return 0;
--
2.30.2


2024-04-09 17:20:05

by Steve French

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] smb: client: Fix hang in smb2_reconnect

I tried this on 6.8-rc7 and current 6.9-rc3 and it didn't fail on either.

I tried (to Samba, and using the mount parms you suggested for
cifs.ko) with both "killall -s SIGSTOP smbd" and also with "killall
smbd" - but unmount worked fine in both cases.

Do you know if this fails to ksmbd or Samba or ...? Other ideas for
how to repro this?

On Tue, Apr 9, 2024 at 6:29 AM Rickard Andersson <[email protected]> wrote:
>
> From: Rickard x Andersson <[email protected]>
>
> Test case:
> mount -t cifs //192.168.0.1/test y -o
> port=19999,ro,vers=2.1,sec=none,echo_interval=1
> kill smbd with SIGSTOP
> umount /tmp/y
>
> Gives the following error:
> INFO: task umount:466 blocked for more than 122 seconds.
> Not tainted 6.8.2-axis9-devel #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> task:umount state:D stack:0 pid:466 tgid:466 ppid:464 flags:0x00000004
> __schedule from schedule+0x34/0x108
> schedule from schedule_preempt_disabled+0x24/0x34
> schedule_preempt_disabled from __mutex_lock.constprop.0+0x330/0x8b0
> __mutex_lock.constprop.0 from smb2_reconnect+0x278/0x8fc [cifs]
> smb2_reconnect [cifs] from SMB2_open_init+0x54/0x9f4 [cifs]
> SMB2_open_init [cifs] from smb2_query_info_compound+0x1a0/0x500[cifs]
> smb2_query_info_compound [cifs] from smb2_queryfs+0x64/0x134 [cifs]
> smb2_queryfs [cifs] from cifs_statfs+0xc8/0x318 [cifs]
> cifs_statfs [cifs] from statfs_by_dentry+0x60/0x84
> statfs_by_dentry from fd_statfs+0x30/0x74
> fd_statfs from sys_fstatfs64+0x40/0x6c
> sys_fstatfs64 from ret_fast_syscall+0x0/0x54
>
> The umount task is blocked waiting on the session mutex. The reason it
> never gets the session mutex is because 'kworker/0:3' is holding the
> mutex and is waiting for response (see line 1209 in
> fs/smb/client/smb2pdu.c.
>
> Stack trace of 'kworker/0:3' just before calling wait_for_response:
> CPU: 0 PID: 220 Comm: kworker/0:3 Not tainted 6.8.2-axis9-devel #1
> Hardware name: Freescale i.MX6 SoloX (Device Tree)
> Workqueue: cifsiod smb2_reconnect_server [cifs]
> unwind_backtrace from show_stack+0x18/0x1c
> show_stack from dump_stack_lvl+0x24/0x2c
> dump_stack_lvl from compound_send_recv+0x7bc/0xac8 [cifs]
> compound_send_recv [cifs] from cifs_send_recv+0x34/0x3c [cifs]
> cifs_send_recv [cifs] from SMB2_negotiate+0x410/0x13dc [cifs]
> SMB2_negotiate [cifs] from smb2_negotiate+0x4c/0x58 [cifs]
> smb2_negotiate [cifs] from cifs_negotiate_protocol+0x9c/0x100 [cifs]
> cifs_negotiate_protocol [cifs] from smb2_reconnect+0x418/0x8fc [cifs]
> smb2_reconnect [cifs] from smb2_reconnect_server+0x1dc/0x514 [cifs]
> smb2_reconnect_server [cifs] from process_one_work+0x188/0x3ec
> process_one_work from worker_thread+0x1fc/0x430
> worker_thread from kthread+0x110/0x130
> kthread from ret_from_fork+0x14/0x28
>
> Change-Id: I53439ffb007c9c51d77ce40fb655a34e5ca291ec
> Signed-off-by: Rickard x Andersson <[email protected]>
> ---
> fs/smb/client/transport.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c
> index 994d70193432..96b8f8757ddc 100644
> --- a/fs/smb/client/transport.c
> +++ b/fs/smb/client/transport.c
> @@ -32,6 +32,8 @@
> /* Max number of iovectors we can use off the stack when sending requests. */
> #define CIFS_MAX_IOV_SIZE 8
>
> +#define RESPONSE_TIMEOUT_SECS 50
> +
> void
> cifs_wake_up_task(struct mid_q_entry *mid)
> {
> @@ -735,13 +737,14 @@ static int allocate_mid(struct cifs_ses *ses, struct smb_hdr *in_buf,
> static int
> wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ)
> {
> - int error;
> + int ret;
> +
> + ret = wait_event_killable_timeout(server->response_q,
> + midQ->mid_state != MID_REQUEST_SUBMITTED &&
> + midQ->mid_state != MID_RESPONSE_RECEIVED,
> + RESPONSE_TIMEOUT_SECS*HZ);
>
> - error = wait_event_state(server->response_q,
> - midQ->mid_state != MID_REQUEST_SUBMITTED &&
> - midQ->mid_state != MID_RESPONSE_RECEIVED,
> - (TASK_KILLABLE|TASK_FREEZABLE_UNSAFE));
> - if (error < 0)
> + if ((ret < 0) || (ret == 0))
> return -ERESTARTSYS;
>
> return 0;
> --
> 2.30.2
>
>


--
Thanks,

Steve

2024-04-13 09:03:02

by Steve French

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] smb: client: Fix hang in smb2_reconnect

Shyam and I tried some experiments and there are two cases where the
patch breaks:
1) ChangeNotify will time out
2) Certainly byte range lock calls (they can be allowed to block) will timeout

An obvious alternative would be to not make this change for the
commands like ChangeNotify and blocking locks but allow it for the
others.

On Tue, Apr 9, 2024 at 6:29 AM Rickard Andersson <[email protected]> wrote:
>
> From: Rickard x Andersson <[email protected]>
>
> Test case:
> mount -t cifs //192.168.0.1/test y -o
> port=19999,ro,vers=2.1,sec=none,echo_interval=1
> kill smbd with SIGSTOP
> umount /tmp/y
>
> Gives the following error:
> INFO: task umount:466 blocked for more than 122 seconds.
> Not tainted 6.8.2-axis9-devel #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> task:umount state:D stack:0 pid:466 tgid:466 ppid:464 flags:0x00000004
> __schedule from schedule+0x34/0x108
> schedule from schedule_preempt_disabled+0x24/0x34
> schedule_preempt_disabled from __mutex_lock.constprop.0+0x330/0x8b0
> __mutex_lock.constprop.0 from smb2_reconnect+0x278/0x8fc [cifs]
> smb2_reconnect [cifs] from SMB2_open_init+0x54/0x9f4 [cifs]
> SMB2_open_init [cifs] from smb2_query_info_compound+0x1a0/0x500[cifs]
> smb2_query_info_compound [cifs] from smb2_queryfs+0x64/0x134 [cifs]
> smb2_queryfs [cifs] from cifs_statfs+0xc8/0x318 [cifs]
> cifs_statfs [cifs] from statfs_by_dentry+0x60/0x84
> statfs_by_dentry from fd_statfs+0x30/0x74
> fd_statfs from sys_fstatfs64+0x40/0x6c
> sys_fstatfs64 from ret_fast_syscall+0x0/0x54
>
> The umount task is blocked waiting on the session mutex. The reason it
> never gets the session mutex is because 'kworker/0:3' is holding the
> mutex and is waiting for response (see line 1209 in
> fs/smb/client/smb2pdu.c.
>
> Stack trace of 'kworker/0:3' just before calling wait_for_response:
> CPU: 0 PID: 220 Comm: kworker/0:3 Not tainted 6.8.2-axis9-devel #1
> Hardware name: Freescale i.MX6 SoloX (Device Tree)
> Workqueue: cifsiod smb2_reconnect_server [cifs]
> unwind_backtrace from show_stack+0x18/0x1c
> show_stack from dump_stack_lvl+0x24/0x2c
> dump_stack_lvl from compound_send_recv+0x7bc/0xac8 [cifs]
> compound_send_recv [cifs] from cifs_send_recv+0x34/0x3c [cifs]
> cifs_send_recv [cifs] from SMB2_negotiate+0x410/0x13dc [cifs]
> SMB2_negotiate [cifs] from smb2_negotiate+0x4c/0x58 [cifs]
> smb2_negotiate [cifs] from cifs_negotiate_protocol+0x9c/0x100 [cifs]
> cifs_negotiate_protocol [cifs] from smb2_reconnect+0x418/0x8fc [cifs]
> smb2_reconnect [cifs] from smb2_reconnect_server+0x1dc/0x514 [cifs]
> smb2_reconnect_server [cifs] from process_one_work+0x188/0x3ec
> process_one_work from worker_thread+0x1fc/0x430
> worker_thread from kthread+0x110/0x130
> kthread from ret_from_fork+0x14/0x28
>
> Change-Id: I53439ffb007c9c51d77ce40fb655a34e5ca291ec
> Signed-off-by: Rickard x Andersson <[email protected]>
> ---
> fs/smb/client/transport.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c
> index 994d70193432..96b8f8757ddc 100644
> --- a/fs/smb/client/transport.c
> +++ b/fs/smb/client/transport.c
> @@ -32,6 +32,8 @@
> /* Max number of iovectors we can use off the stack when sending requests. */
> #define CIFS_MAX_IOV_SIZE 8
>
> +#define RESPONSE_TIMEOUT_SECS 50
> +
> void
> cifs_wake_up_task(struct mid_q_entry *mid)
> {
> @@ -735,13 +737,14 @@ static int allocate_mid(struct cifs_ses *ses, struct smb_hdr *in_buf,
> static int
> wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ)
> {
> - int error;
> + int ret;
> +
> + ret = wait_event_killable_timeout(server->response_q,
> + midQ->mid_state != MID_REQUEST_SUBMITTED &&
> + midQ->mid_state != MID_RESPONSE_RECEIVED,
> + RESPONSE_TIMEOUT_SECS*HZ);
>
> - error = wait_event_state(server->response_q,
> - midQ->mid_state != MID_REQUEST_SUBMITTED &&
> - midQ->mid_state != MID_RESPONSE_RECEIVED,
> - (TASK_KILLABLE|TASK_FREEZABLE_UNSAFE));
> - if (error < 0)
> + if ((ret < 0) || (ret == 0))
> return -ERESTARTSYS;
>
> return 0;
> --
> 2.30.2
>
>


--
Thanks,

Steve

2024-04-15 12:06:00

by Rickard x Andersson

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] smb: client: Fix hang in smb2_reconnect

On 4/13/24 21:40, Steve French wrote:
> Is there any way to verify if the same problem can be reproduced with
> 6.9-rc3 or later


I have now reproduced the problem also on 6.9.0-rc3:


INFO: task umount:511 blocked for more than 614 seconds.
Not tainted 6.9.0-rc3-axis9-devel #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:umount state:D stack:0 pid:511 tgid:511 ppid:509
flags:0x00000004
Call trace:
__schedule from schedule+0x34/0x14c
schedule from schedule_preempt_disabled+0x24/0x34
schedule_preempt_disabled from __mutex_lock.constprop.0+0x330/0x8b0
__mutex_lock.constprop.0 from smb2_reconnect+0x114/0x8fc [cifs]
smb2_reconnect [cifs] from SMB2_open_init+0x58/0xa5c [cifs]
SMB2_open_init [cifs] from smb2_query_info_compound+0x1a4/0x50c [cifs]
smb2_query_info_compound [cifs] from smb2_queryfs+0x64/0x134 [cifs]
smb2_queryfs [cifs] from cifs_statfs+0xc8/0x318 [cifs]
cifs_statfs [cifs] from statfs_by_dentry+0x60/0x84
statfs_by_dentry from fd_statfs+0x30/0x74
fd_statfs from sys_fstatfs64+0x40/0x6c
sys_fstatfs64 from ret_fast_syscall+0x0/0x54
Exception stack(0xc749ffa8 to 0xc749fff0)
ffa0: 01716218 00000000 00000003 00000058 beb62b90
00000000
ffc0: 01716218 00000000 00000003 0000010b b6fbd138 01716448 017162a4
beb62b90
ffe0: 0000010b beb62ad4 b6f0861f b6e87866


Best regards,
Rickard


2024-04-15 14:38:10

by Rickard x Andersson

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] smb: client: Fix hang in smb2_reconnect

On 4/13/24 11:02, Steve French wrote:
> Shyam and I tried some experiments and there are two cases where the
> patch breaks:
> 1) ChangeNotify will time out
> 2) Certainly byte range lock calls (they can be allowed to block) will timeout
>
> An obvious alternative would be to not make this change for the
> commands like ChangeNotify and blocking locks but allow it for the
> others.
>

Would it make sense to make the patch less intrusive by doing something
like the following?:

@@ -1209,7 +1216,12 @@ compound_send_recv(const unsigned int xid, struct
cifs_ses *ses,
spin_unlock(&ses->ses_lock);

for (i = 0; i < num_rqst; i++) {
- rc = wait_for_response(server, midQ[i]);
+ if (flags == CIFS_NEG_OP) {

+ rc = wait_for_response(server, midQ[i], 50*1000);
+ }
+ else
+ rc = wait_for_response(server, midQ[i], -1);
if (rc != 0)
break;



So, we are just waiting with timeout in the case it is a "CIFS_NEG_OP".

Note that I am not familiar at all with this code base.

Best regards,
Rickard