SMC connections might fail to be registered to a link group due to
things like unable to find a link to assign to in its creation. As
a result, connection creation will return a failure and most
resources related to the connection won't be applied or initialized,
such as conn->abort_work or conn->lnk.
If smc_conn_free() is invoked later, it will try to access the
resources related to the connection, which wasn't initialized, thus
causing a panic.
Here is an example, a SMC-R connection failed to be registered
to a link group and conn->lnk is NULL. The following crash will
happen if smc_conn_free() tries to access conn->lnk in
smc_cdc_tx_dismiss_slots().
BUG: kernel NULL pointer dereference, address: 0000000000000168
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 4 PID: 68 Comm: kworker/4:1 Kdump: loaded Tainted: G E 5.16.0-rc5+ #52
Workqueue: smc_hs_wq smc_listen_work [smc]
RIP: 0010:smc_wr_tx_dismiss_slots+0x1e/0xc0 [smc]
Call Trace:
<TASK>
smc_conn_free+0xd8/0x100 [smc]
smc_lgr_cleanup_early+0x15/0x90 [smc]
smc_listen_work+0x302/0x1230 [smc]
? process_one_work+0x25c/0x600
process_one_work+0x25c/0x600
worker_thread+0x4f/0x3a0
? process_one_work+0x600/0x600
kthread+0x15d/0x1a0
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x30
</TASK>
This patch tries to fix this by resetting conn->lgr to NULL if an
abnormal exit occurs in smc_lgr_register_conn(), thus avoiding the
crash caused by accessing the uninitialized resources in smc_conn_free(),
and scheduling the link group's free work if it is new created.
Fixes: 56bc3b2094b4 ("net/smc: assign link to a new connection")
Signed-off-by: Wen Gu <[email protected]>
---
net/smc/smc_core.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 412bc85..8edc43a 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -171,8 +171,10 @@ static int smc_lgr_register_conn(struct smc_connection *conn, bool first)
if (!conn->lgr->is_smcd) {
rc = smcr_lgr_conn_assign_link(conn, first);
- if (rc)
+ if (rc) {
+ conn->lgr = NULL;
return rc;
+ }
}
/* find a new alert_token_local value not yet used by some connection
* in this link group
@@ -1835,8 +1837,10 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
write_lock_bh(&lgr->conns_lock);
rc = smc_lgr_register_conn(conn, true);
write_unlock_bh(&lgr->conns_lock);
- if (rc)
+ if (rc) {
+ smc_lgr_schedule_free_work(lgr);
goto out;
+ }
}
conn->local_tx_ctrl.common.type = SMC_CDC_MSG_TYPE;
conn->local_tx_ctrl.len = SMC_WR_TX_SIZE;
--
1.8.3.1
On 04/01/2022 03:59, Wen Gu wrote:
> SMC connections might fail to be registered to a link group due to
> things like unable to find a link to assign to in its creation. As
> a result, connection creation will return a failure and most
> resources related to the connection won't be applied or initialized,
> such as conn->abort_work or conn->lnk.
Patch looks good to me, but one more thing to think about:
Would it be better to invoke __smc_lgr_terminate() instead of smc_lgr_schedule_free_work()
when a link group was created but cannot be used now? This would immediately free up all
allocated resources for this unusable link group instead of starting the default 10-minute
timer until the link group is freed.
__smc_lgr_terminate() takes care of completely removing the link group in the context of
its caller. It is also used from within smc_lgr_cleanup_early() that is used when the very
first connection of a link group is aborted.
Thanks for your reply.
On 2022/1/4 5:58 pm, Karsten Graul wrote:
> On 04/01/2022 03:59, Wen Gu wrote:
>> SMC connections might fail to be registered to a link group due to
>> things like unable to find a link to assign to in its creation. As
>> a result, connection creation will return a failure and most
>> resources related to the connection won't be applied or initialized,
>> such as conn->abort_work or conn->lnk.
>
> Patch looks good to me, but one more thing to think about:
>
> Would it be better to invoke __smc_lgr_terminate() instead of smc_lgr_schedule_free_work()
> when a link group was created but cannot be used now? This would immediately free up all
> allocated resources for this unusable link group instead of starting the default 10-minute
> timer until the link group is freed.
> __smc_lgr_terminate() takes care of completely removing the link group in the context of
> its caller. It is also used from within smc_lgr_cleanup_early() that is used when the very
> first connection of a link group is aborted.
Thanks for your suggestion.
I also agree with using link group termination function for a immediate free.
I will improve it and send a v3 patch.
Thanks,
Wen Gu