A crash was found when dumping SMC-D connections. It can be reproduced
by the following steps:

- run nginx/wrk test:
  smc_run nginx
  smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL>

- continuously dump SMC-D connections in parallel:
  watch -n 1 'smcss -D'

BUG: kernel NULL pointer dereference, address: 0000000000000030
CPU: 2 PID: 7204 Comm: smcss Kdump: loaded Tainted: G E 6.7.0+ #55
RIP: 0010:__smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
Call Trace:
<TASK>
? __die+0x24/0x70
? page_fault_oops+0x66/0x150
? exc_page_fault+0x69/0x140
? asm_exc_page_fault+0x26/0x30
? __smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
? __kmalloc_node_track_caller+0x35d/0x430
? __alloc_skb+0x77/0x170
smc_diag_dump_proto+0xd0/0xf0 [smc_diag]
smc_diag_dump+0x26/0x60 [smc_diag]
netlink_dump+0x19f/0x320
__netlink_dump_start+0x1dc/0x300
smc_diag_handler_dump+0x6a/0x80 [smc_diag]
? __pfx_smc_diag_dump+0x10/0x10 [smc_diag]
sock_diag_rcv_msg+0x121/0x140
? __pfx_sock_diag_rcv_msg+0x10/0x10
netlink_rcv_skb+0x5a/0x110
sock_diag_rcv+0x28/0x40
netlink_unicast+0x22a/0x330
netlink_sendmsg+0x1f8/0x420
__sock_sendmsg+0xb0/0xc0
____sys_sendmsg+0x24e/0x300
? copy_msghdr_from_user+0x62/0x80
___sys_sendmsg+0x7c/0xd0
? __do_fault+0x34/0x160
? do_read_fault+0x5f/0x100
? do_fault+0xb0/0x110
? __handle_mm_fault+0x2b0/0x6c0
__sys_sendmsg+0x4d/0x80
do_syscall_64+0x69/0x180
entry_SYSCALL_64_after_hwframe+0x6e/0x76
The connection may still be in the process of being established when it
is dumped. Assume that the connection has been registered in a link
group by smc_conn_create() but rmb_desc has not yet been initialized by
smc_buf_create(); dumping it then dereferences the NULL conn->rmb_desc.
Fix this by checking rmb_desc before dumping.
Fixes: 4b1b7d3b30a6 ("net/smc: add SMC-D diag support")
Signed-off-by: Wen Gu <[email protected]>
---
v1->v2: corrected the commit in the Fixes tag.
(https://lore.kernel.org/netdev/[email protected]/)
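
For readers following along, the effect of the added check can be illustrated in user space. This is only a minimal sketch under stated assumptions: the mock_* structures and mock_dump_dmbinfo() are hypothetical stand-ins that model just the two pointers involved (the link group and rmb_desc). They are not the kernel's definitions and not the code in smc_diag.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed for the illustration are modelled.
 */
struct mock_buf_desc {
	uint64_t token;
};

struct mock_link_group {
	bool is_smcd;
};

struct mock_connection {
	struct mock_link_group *lgr;	/* set once registered in a link group */
	struct mock_buf_desc *rmb_desc;	/* set later, when buffers are created */
};

struct mock_dmbinfo {
	uint64_t token;
};

/* Returns true and fills *out only when the connection is fully set up. */
static bool mock_dump_dmbinfo(const struct mock_connection *conn,
			      struct mock_dmbinfo *out)
{
	assert(conn && out);

	if (!conn->lgr || !conn->lgr->is_smcd)
		return false;

	/* The guard corresponding to the fix: a connection can already be
	 * in a link group while its buffer descriptor is still NULL.
	 */
	if (!conn->rmb_desc)
		return false;

	out->token = conn->rmb_desc->token;
	return true;
}
```

In this sketch a connection with lgr set and rmb_desc still NULL is skipped instead of being dereferenced, and the same connection is dumped normally once rmb_desc has been assigned.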
net/smc/smc_diag.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
index 52f7c4f1e767..5a33908015f3 100644
--- a/net/smc/smc_diag.c
+++ b/net/smc/smc_diag.c
@@ -164,7 +164,7 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
 	}
 	if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd &&
 	    (req->diag_ext & (1 << (SMC_DIAG_DMBINFO - 1))) &&
-	    !list_empty(&smc->conn.lgr->list)) {
+	    !list_empty(&smc->conn.lgr->list) && smc->conn.rmb_desc) {
 		struct smc_connection *conn = &smc->conn;
 		struct smcd_diag_dmbinfo dinfo;
 		struct smcd_dev *smcd = conn->lgr->smcd;
--
2.32.0.3.g01195cf9f
On Thu, Jan 18, 2024 at 12:32:10PM +0800, Wen Gu wrote:
Reviewed-by: Dust Li <[email protected]>
Best regards,
Dust
On 18.01.24 05:32, Wen Gu wrote:
That sounds reasonable to me! Thank you for the fix!
Reviewed-by: Wenjia Zhang <[email protected]>
Hello:
This patch was applied to netdev/net.git (main)
by David S. Miller <[email protected]>:
On Thu, 18 Jan 2024 12:32:10 +0800 you wrote:
> A crash was found when dumping SMC-D connections. It can be reproduced
> by following steps:
>
> - run nginx/wrk test:
> smc_run nginx
> smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL>
>
> [...]
Here is the summary with links:
- [net,v2] net/smc: fix illegal rmb_desc access in SMC-D connection dump
https://git.kernel.org/netdev/net/c/dbc153fd3c14
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html