We encountered some crashes recently and they are caused by the
race between the access and free of link/link group in abnormal
smc link group termination. The crashes can be reproduced in
frequent abnormal link group termination, like setting RNICs up/down.
This set of patches tries to fix this by extending the life cycle
of link/link group to ensure that they won't be referred to after
cleared or freed.
Wen Gu (3):
net/smc: Resolve the race between link group access and termination
net/smc: Introduce a new conn->lgr validity check helper
net/smc: Resolve the race between SMC-R link access and clear
net/smc/af_smc.c | 6 +++-
net/smc/smc.h | 1 +
net/smc/smc_cdc.c | 3 +-
net/smc/smc_clc.c | 2 +-
net/smc/smc_core.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++--------
net/smc/smc_core.h | 12 +++++++
net/smc/smc_diag.c | 6 ++--
7 files changed, 104 insertions(+), 20 deletions(-)
--
1.8.3.1
We encountered some crashes caused by the race between the access
and the termination of link groups.
Here are some of panic stacks we met:
1) Race between smc_clc_wait_msg() and __smc_lgr_terminate()
BUG: kernel NULL pointer dereference, address: 00000000000002f0
Workqueue: smc_hs_wq smc_listen_work [smc]
RIP: 0010:smc_clc_wait_msg+0x3eb/0x5c0 [smc]
Call Trace:
<TASK>
? smc_clc_send_accept+0x45/0xa0 [smc]
? smc_clc_send_accept+0x45/0xa0 [smc]
smc_listen_work+0x783/0x1220 [smc]
? finish_task_switch+0xc4/0x2e0
? process_one_work+0x1ad/0x3c0
process_one_work+0x1ad/0x3c0
worker_thread+0x4c/0x390
? rescuer_thread+0x320/0x320
kthread+0x149/0x190
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x30
</TASK>
smc_listen_work() abnormal case like port error
---------------------------------------------------------------
| __smc_lgr_terminate()
| |- smc_conn_kill()
| |- smc_lgr_unregister_conn()
| |- set conn->lgr = NULL
smc_clc_wait_msg() |
|- access conn->lgr (panic) |
2) Race between smc_setsockopt() and __smc_lgr_terminate()
BUG: kernel NULL pointer dereference, address: 00000000000002e8
RIP: 0010:smc_setsockopt+0x17a/0x280 [smc]
Call Trace:
<TASK>
__sys_setsockopt+0xfc/0x190
__x64_sys_setsockopt+0x20/0x30
do_syscall_64+0x34/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
smc_setsockopt() abnormal case like port error
--------------------------------------------------------------
| __smc_lgr_terminate()
| |- smc_conn_kill()
| |- smc_lgr_unregister_conn()
| |- set conn->lgr = NULL
mod_delayed_work() |
|- access conn->lgr (panic) |
There are some other panic places and they are caused by the
similar reason as described above, which is accessing link
group after termination, thus getting a NULL pointer or invalid
resource.
Currently, there seems to be no synchronization between the
link group access and a sudden termination of it. This patch
tries to fix this by introducing reference count of link group
and not freeing link group until reference count is zero.
Link group might be referred to by links or smc connections. So
the operation to the link group reference count can be concluded
as follows:
object [hold or initialized as 1] [put]
--------------------------------------------------------------------
link group smc_lgr_create() smc_lgr_free()
connections smc_lgr_register_conn() smc_conn_free()
links smcr_link_init() smcr_link_clear()
Througth this way, we extend the life cycle of link group and
ensure it is longer than the life cycle of connections and links
above it, so that avoid invalid access to link group after its
termination.
Signed-off-by: Wen Gu <[email protected]>
---
net/smc/smc.h | 1 +
net/smc/smc_core.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
net/smc/smc_core.h | 3 +++
3 files changed, 46 insertions(+), 5 deletions(-)
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 1a4fc1c..3d0b8e3 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -221,6 +221,7 @@ struct smc_connection {
*/
u64 peer_token; /* SMC-D token of peer */
u8 killed : 1; /* abnormal termination */
+ u8 freed : 1; /* normal termiation */
u8 out_of_sync : 1; /* out of sync with peer */
};
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index cd3c3b8..26a113d 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -186,6 +186,7 @@ static int smc_lgr_register_conn(struct smc_connection *conn, bool first)
conn->alert_token_local = 0;
}
smc_lgr_add_alert_token(conn);
+ smc_lgr_hold(conn->lgr); /* lgr_put in smc_conn_free() */
conn->lgr->conns_num++;
return 0;
}
@@ -218,7 +219,6 @@ static void smc_lgr_unregister_conn(struct smc_connection *conn)
__smc_lgr_unregister_conn(conn);
}
write_unlock_bh(&lgr->conns_lock);
- conn->lgr = NULL;
}
int smc_nl_get_sys_info(struct sk_buff *skb, struct netlink_callback *cb)
@@ -749,6 +749,7 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
lnk->path_mtu = lnk->smcibdev->pattr[lnk->ibport - 1].active_mtu;
lnk->link_id = smcr_next_link_id(lgr);
lnk->lgr = lgr;
+ smc_lgr_hold(lgr); /* lgr_put in smcr_link_clear() */
lnk->link_idx = link_idx;
smc_ibdev_cnt_inc(lnk);
smcr_copy_dev_info_to_link(lnk);
@@ -841,6 +842,7 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini)
lgr->terminating = 0;
lgr->freeing = 0;
lgr->vlan_id = ini->vlan_id;
+ refcount_set(&lgr->refcnt, 1); /* set lgr refcnt to 1 */
mutex_init(&lgr->sndbufs_lock);
mutex_init(&lgr->rmbs_lock);
rwlock_init(&lgr->conns_lock);
@@ -1120,8 +1122,22 @@ void smc_conn_free(struct smc_connection *conn)
{
struct smc_link_group *lgr = conn->lgr;
- if (!lgr)
+ if (!lgr || conn->freed)
+ /* The connection has never been registered in a
+ * link group, or has already been freed.
+ *
+ * Check to ensure that the refcnt of link group
+ * won't be put incorrectly.
+ */
return;
+
+ conn->freed = 1;
+ if (!conn->alert_token_local)
+ /* The connection was registered in a link group
+ * defore, but now it is unregistered from it.
+ */
+ goto lgr_put;
+
if (lgr->is_smcd) {
if (!list_empty(&lgr->list))
smc_ism_unset_conn(conn);
@@ -1138,6 +1154,8 @@ void smc_conn_free(struct smc_connection *conn)
if (!lgr->conns_num)
smc_lgr_schedule_free_work(lgr);
+lgr_put:
+ smc_lgr_put(lgr); /* lgr_hold in smc_lgr_register_conn() */
}
/* unregister a link from a buf_desc */
@@ -1209,6 +1227,7 @@ void smcr_link_clear(struct smc_link *lnk, bool log)
smc_ib_destroy_queue_pair(lnk);
smc_ib_dealloc_protection_domain(lnk);
smc_wr_free_link_mem(lnk);
+ smc_lgr_put(lnk->lgr); /* lgr_hold in smcr_link_init() */
smc_ibdev_cnt_dec(lnk);
put_device(&lnk->smcibdev->ibdev->dev);
smcibdev = lnk->smcibdev;
@@ -1283,6 +1302,15 @@ static void smc_lgr_free_bufs(struct smc_link_group *lgr)
__smc_lgr_free_bufs(lgr, true);
}
+/* won't be freed until no one accesses to lgr anymore */
+static void __smc_lgr_free(struct smc_link_group *lgr)
+{
+ smc_lgr_free_bufs(lgr);
+ if (!lgr->is_smcd)
+ smc_wr_free_lgr_mem(lgr);
+ kfree(lgr);
+}
+
/* remove a link group */
static void smc_lgr_free(struct smc_link_group *lgr)
{
@@ -1298,7 +1326,6 @@ static void smc_lgr_free(struct smc_link_group *lgr)
smc_llc_lgr_clear(lgr);
}
- smc_lgr_free_bufs(lgr);
destroy_workqueue(lgr->tx_wq);
if (lgr->is_smcd) {
smc_ism_put_vlan(lgr->smcd, lgr->vlan_id);
@@ -1306,11 +1333,21 @@ static void smc_lgr_free(struct smc_link_group *lgr)
if (!atomic_dec_return(&lgr->smcd->lgr_cnt))
wake_up(&lgr->smcd->lgrs_deleted);
} else {
- smc_wr_free_lgr_mem(lgr);
if (!atomic_dec_return(&lgr_cnt))
wake_up(&lgrs_deleted);
}
- kfree(lgr);
+ smc_lgr_put(lgr); /* theoretically last lgr_put */
+}
+
+void smc_lgr_hold(struct smc_link_group *lgr)
+{
+ refcount_inc(&lgr->refcnt);
+}
+
+void smc_lgr_put(struct smc_link_group *lgr)
+{
+ if (refcount_dec_and_test(&lgr->refcnt))
+ __smc_lgr_free(lgr);
}
static void smc_sk_wake_ups(struct smc_sock *smc)
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 73d0c35..edbd401 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -249,6 +249,7 @@ struct smc_link_group {
u8 terminating : 1;/* lgr is terminating */
u8 freeing : 1; /* lgr is being freed */
+ refcount_t refcnt; /* lgr reference count */
bool is_smcd; /* SMC-R or SMC-D */
u8 smc_version;
u8 negotiated_eid[SMC_MAX_EID_LEN];
@@ -470,6 +471,8 @@ static inline void smc_set_pci_values(struct pci_dev *pci_dev,
void smc_lgr_cleanup_early(struct smc_link_group *lgr);
void smc_lgr_terminate_sched(struct smc_link_group *lgr);
+void smc_lgr_hold(struct smc_link_group *lgr);
+void smc_lgr_put(struct smc_link_group *lgr);
void smcr_port_add(struct smc_ib_device *smcibdev, u8 ibport);
void smcr_port_err(struct smc_ib_device *smcibdev, u8 ibport);
void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid,
--
1.8.3.1
We encountered some crashes caused by the race between smc-r
link access and link clear that triggered by abnormal link
group termination, such as port error.
Here is an example of this kind of crashes:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Workqueue: smc_hs_wq smc_listen_work [smc]
RIP: 0010:smc_llc_flow_initiate+0x44/0x190 [smc]
Call Trace:
<TASK>
? __smc_buf_create+0x75a/0x950 [smc]
smcr_lgr_reg_rmbs+0x2a/0xbf [smc]
smc_listen_work+0xf72/0x1230 [smc]
? process_one_work+0x25c/0x600
process_one_work+0x25c/0x600
worker_thread+0x4f/0x3a0
? process_one_work+0x600/0x600
kthread+0x15d/0x1a0
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x30
</TASK>
smc_listen_work() __smc_lgr_terminate()
---------------------------------------------------------------
| smc_lgr_free()
| |- smcr_link_clear()
| |- memset(lnk, 0)
smc_listen_rdma_reg() |
|- smcr_lgr_reg_rmbs() |
|- smc_llc_flow_initiate() |
|- access lnk->lgr (panic) |
These crashes are similarly caused by clearing SMC-R link
resources when some functions is still accessing to them. So
this patch tries to fix the issue by introducing reference
count of smc-r links and ensuring that the sensitive resources
of links are not cleared until reference count is zero.
The operation to the SMC-R link reference count can be concluded
as follows:
object [hold or initialized as 1] [put]
--------------------------------------------------------------------
links smcr_link_init() smcr_link_clear()
connections smcr_lgr_conn_assign_link() smc_conn_free()
Through this way, the clear of SMC-R links is later than the
free of all the smc connections above it, thus avoiding the
unsafe reference to SMC-R links.
Signed-off-by: Wen Gu <[email protected]>
---
net/smc/smc_core.c | 39 +++++++++++++++++++++++++++++++++------
net/smc/smc_core.h | 4 ++++
2 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index c27a7d5..ddb088a 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -155,6 +155,7 @@ static int smcr_lgr_conn_assign_link(struct smc_connection *conn, bool first)
if (!conn->lnk)
return SMC_CLC_DECL_NOACTLINK;
atomic_inc(&conn->lnk->conn_cnt);
+ smcr_link_hold(conn->lnk); /* link_put in smc_conn_free() */
return 0;
}
@@ -746,6 +747,8 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
}
get_device(&lnk->smcibdev->ibdev->dev);
atomic_inc(&lnk->smcibdev->lnk_cnt);
+ refcount_set(&lnk->refcnt, 1); /* link refcnt is set to 1 */
+ lnk->clearing = 0;
lnk->path_mtu = lnk->smcibdev->pattr[lnk->ibport - 1].active_mtu;
lnk->link_id = smcr_next_link_id(lgr);
lnk->lgr = lgr;
@@ -994,8 +997,12 @@ void smc_switch_link_and_count(struct smc_connection *conn,
struct smc_link *to_lnk)
{
atomic_dec(&conn->lnk->conn_cnt);
+ /* put old link, hold in smcr_lgr_conn_assign_link() */
+ smcr_link_put(conn->lnk);
conn->lnk = to_lnk;
atomic_inc(&conn->lnk->conn_cnt);
+ /* hold new link, put in smc_conn_free() */
+ smcr_link_hold(conn->lnk);
}
struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
@@ -1127,7 +1134,7 @@ void smc_conn_free(struct smc_connection *conn)
* link group, or has already been freed.
*
* Check to ensure that the refcnt of link group
- * won't be put incorrectly.
+ * or link won't be put incorrectly.
*/
return;
@@ -1155,6 +1162,8 @@ void smc_conn_free(struct smc_connection *conn)
if (!lgr->conns_num)
smc_lgr_schedule_free_work(lgr);
lgr_put:
+ if (!lgr->is_smcd)
+ smcr_link_put(conn->lnk); /* link_hold in smcr_lgr_conn_assign_link() */
smc_lgr_put(lgr); /* lgr_hold in smc_lgr_register_conn() */
}
@@ -1211,13 +1220,23 @@ static void smcr_rtoken_clear_link(struct smc_link *lnk)
}
}
+static void __smcr_link_clear(struct smc_link *lnk)
+{
+ smc_wr_free_link_mem(lnk);
+ smc_lgr_put(lnk->lgr); /* lgr_hold in smcr_link_init() */
+ memset(lnk, 0, sizeof(struct smc_link));
+ lnk->state = SMC_LNK_UNUSED;
+}
+
/* must be called under lgr->llc_conf_mutex lock */
void smcr_link_clear(struct smc_link *lnk, bool log)
{
struct smc_ib_device *smcibdev;
- if (!lnk->lgr || lnk->state == SMC_LNK_UNUSED)
+ if (lnk->clearing || !lnk->lgr ||
+ lnk->state == SMC_LNK_UNUSED)
return;
+ lnk->clearing = 1;
lnk->peer_qpn = 0;
smc_llc_link_clear(lnk, log);
smcr_buf_unmap_lgr(lnk);
@@ -1226,15 +1245,23 @@ void smcr_link_clear(struct smc_link *lnk, bool log)
smc_wr_free_link(lnk);
smc_ib_destroy_queue_pair(lnk);
smc_ib_dealloc_protection_domain(lnk);
- smc_wr_free_link_mem(lnk);
- smc_lgr_put(lnk->lgr); /* lgr_hold in smcr_link_init() */
smc_ibdev_cnt_dec(lnk);
put_device(&lnk->smcibdev->ibdev->dev);
smcibdev = lnk->smcibdev;
- memset(lnk, 0, sizeof(struct smc_link));
- lnk->state = SMC_LNK_UNUSED;
if (!atomic_dec_return(&smcibdev->lnk_cnt))
wake_up(&smcibdev->lnks_deleted);
+ smcr_link_put(lnk); /* theoretically last link_put */
+}
+
+void smcr_link_hold(struct smc_link *lnk)
+{
+ refcount_inc(&lnk->refcnt);
+}
+
+void smcr_link_put(struct smc_link *lnk)
+{
+ if (refcount_dec_and_test(&lnk->refcnt))
+ __smcr_link_clear(lnk);
}
static void smcr_buf_free(struct smc_link_group *lgr, bool is_rmb,
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 630298b..cbf0fc1 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -137,6 +137,8 @@ struct smc_link {
u8 peer_link_uid[SMC_LGR_ID_SIZE]; /* peer uid */
u8 link_idx; /* index in lgr link array */
u8 link_is_asym; /* is link asymmetric? */
+ u8 clearing : 1; /* link is being cleared */
+ refcount_t refcnt; /* link reference count */
struct smc_link_group *lgr; /* parent link group */
struct work_struct link_down_wrk; /* wrk to bring link down */
char ibname[IB_DEVICE_NAME_MAX]; /* ib device name */
@@ -509,6 +511,8 @@ void smc_rtoken_set2(struct smc_link_group *lgr, int rtok_idx, int link_id,
int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
u8 link_idx, struct smc_init_info *ini);
void smcr_link_clear(struct smc_link *lnk, bool log);
+void smcr_link_hold(struct smc_link *lnk);
+void smcr_link_put(struct smc_link *lnk);
void smc_switch_link_and_count(struct smc_connection *conn,
struct smc_link *to_lnk);
int smcr_buf_map_lgr(struct smc_link *lnk);
--
1.8.3.1
It is no longer suitable to identify whether a smc connection
is registered in a link group through checking if conn->lgr
is NULL, because conn->lgr won't be reset even the connection
is unregistered from a link group.
So this patch introduces a new helper smc_conn_lgr_valid() and
replaces all the check of conn->lgr in original implementation
with the new helper to judge if conn->lgr is valid to use.
Signed-off-by: Wen Gu <[email protected]>
---
net/smc/af_smc.c | 6 +++++-
net/smc/smc_cdc.c | 3 ++-
net/smc/smc_clc.c | 2 +-
net/smc/smc_core.c | 14 ++++++++------
net/smc/smc_core.h | 5 +++++
net/smc/smc_diag.c | 6 +++---
6 files changed, 24 insertions(+), 12 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 9713059..7c27bb6 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -634,9 +634,13 @@ static void smc_conn_abort(struct smc_sock *smc, int local_first)
{
struct smc_connection *conn = &smc->conn;
struct smc_link_group *lgr = conn->lgr;
+ bool lgr_valid = false;
+
+ if (smc_conn_lgr_valid(conn))
+ lgr_valid = true;
smc_conn_free(conn);
- if (local_first)
+ if (local_first && lgr_valid)
smc_lgr_cleanup_early(lgr);
}
diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index 84c8a43..9d5a971 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -197,7 +197,8 @@ int smc_cdc_get_slot_and_msg_send(struct smc_connection *conn)
{
int rc;
- if (!conn->lgr || (conn->lgr->is_smcd && conn->lgr->peer_shutdown))
+ if (!smc_conn_lgr_valid(conn) ||
+ (conn->lgr->is_smcd && conn->lgr->peer_shutdown))
return -EPIPE;
if (conn->lgr->is_smcd) {
diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c
index 8409ab7..5a21108 100644
--- a/net/smc/smc_clc.c
+++ b/net/smc/smc_clc.c
@@ -774,7 +774,7 @@ int smc_clc_send_decline(struct smc_sock *smc, u32 peer_diag_info, u8 version)
dclc.os_type = version == SMC_V1 ? 0 : SMC_CLC_OS_LINUX;
dclc.hdr.typev2 = (peer_diag_info == SMC_CLC_DECL_SYNCERR) ?
SMC_FIRST_CONTACT_MASK : 0;
- if ((!smc->conn.lgr || !smc->conn.lgr->is_smcd) &&
+ if ((!smc_conn_lgr_valid(&smc->conn) || !smc->conn.lgr->is_smcd) &&
smc_ib_is_valid_local_systemid())
memcpy(dclc.id_for_peer, local_systemid,
sizeof(local_systemid));
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 26a113d..c27a7d5 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -212,7 +212,7 @@ static void smc_lgr_unregister_conn(struct smc_connection *conn)
{
struct smc_link_group *lgr = conn->lgr;
- if (!lgr)
+ if (!smc_conn_lgr_valid(conn))
return;
write_lock_bh(&lgr->conns_lock);
if (conn->alert_token_local) {
@@ -1132,7 +1132,7 @@ void smc_conn_free(struct smc_connection *conn)
return;
conn->freed = 1;
- if (!conn->alert_token_local)
+ if (!smc_conn_lgr_valid(conn))
/* The connection was registered in a link group
* defore, but now it is unregistered from it.
*/
@@ -2262,14 +2262,16 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
void smc_sndbuf_sync_sg_for_cpu(struct smc_connection *conn)
{
- if (!conn->lgr || conn->lgr->is_smcd || !smc_link_active(conn->lnk))
+ if (!smc_conn_lgr_valid(conn) || conn->lgr->is_smcd ||
+ !smc_link_active(conn->lnk))
return;
smc_ib_sync_sg_for_cpu(conn->lnk, conn->sndbuf_desc, DMA_TO_DEVICE);
}
void smc_sndbuf_sync_sg_for_device(struct smc_connection *conn)
{
- if (!conn->lgr || conn->lgr->is_smcd || !smc_link_active(conn->lnk))
+ if (!smc_conn_lgr_valid(conn) || conn->lgr->is_smcd ||
+ !smc_link_active(conn->lnk))
return;
smc_ib_sync_sg_for_device(conn->lnk, conn->sndbuf_desc, DMA_TO_DEVICE);
}
@@ -2278,7 +2280,7 @@ void smc_rmb_sync_sg_for_cpu(struct smc_connection *conn)
{
int i;
- if (!conn->lgr || conn->lgr->is_smcd)
+ if (!smc_conn_lgr_valid(conn) || conn->lgr->is_smcd)
return;
for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
if (!smc_link_active(&conn->lgr->lnk[i]))
@@ -2292,7 +2294,7 @@ void smc_rmb_sync_sg_for_device(struct smc_connection *conn)
{
int i;
- if (!conn->lgr || conn->lgr->is_smcd)
+ if (!smc_conn_lgr_valid(conn) || conn->lgr->is_smcd)
return;
for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
if (!smc_link_active(&conn->lgr->lnk[i]))
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index edbd401..630298b 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -408,6 +408,11 @@ static inline struct smc_connection *smc_lgr_find_conn(
return res;
}
+static inline bool smc_conn_lgr_valid(struct smc_connection *conn)
+{
+ return conn->lgr && conn->alert_token_local;
+}
+
/* returns true if the specified link is usable */
static inline bool smc_link_usable(struct smc_link *lnk)
{
diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
index c952986a..25ef26b 100644
--- a/net/smc/smc_diag.c
+++ b/net/smc/smc_diag.c
@@ -89,7 +89,7 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
r->diag_state = sk->sk_state;
if (smc->use_fallback)
r->diag_mode = SMC_DIAG_MODE_FALLBACK_TCP;
- else if (smc->conn.lgr && smc->conn.lgr->is_smcd)
+ else if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd)
r->diag_mode = SMC_DIAG_MODE_SMCD;
else
r->diag_mode = SMC_DIAG_MODE_SMCR;
@@ -142,7 +142,7 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
goto errout;
}
- if (smc->conn.lgr && !smc->conn.lgr->is_smcd &&
+ if (smc_conn_lgr_valid(&smc->conn) && !smc->conn.lgr->is_smcd &&
(req->diag_ext & (1 << (SMC_DIAG_LGRINFO - 1))) &&
!list_empty(&smc->conn.lgr->list)) {
struct smc_diag_lgrinfo linfo = {
@@ -162,7 +162,7 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
if (nla_put(skb, SMC_DIAG_LGRINFO, sizeof(linfo), &linfo) < 0)
goto errout;
}
- if (smc->conn.lgr && smc->conn.lgr->is_smcd &&
+ if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd &&
(req->diag_ext & (1 << (SMC_DIAG_DMBINFO - 1))) &&
!list_empty(&smc->conn.lgr->list)) {
struct smc_connection *conn = &smc->conn;
--
1.8.3.1
On 10/01/2022 10:26, Wen Gu wrote:
> We encountered some crashes caused by the race between the access
> and the termination of link groups.
>
<snip>
>
> diff --git a/net/smc/smc.h b/net/smc/smc.h
> index 1a4fc1c..3d0b8e3 100644
> --- a/net/smc/smc.h
> +++ b/net/smc/smc.h
> @@ -221,6 +221,7 @@ struct smc_connection {
> */
> u64 peer_token; /* SMC-D token of peer */
> u8 killed : 1; /* abnormal termination */
> + u8 freed : 1; /* normal termiation */
> u8 out_of_sync : 1; /* out of sync with peer */
> };
>
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index cd3c3b8..26a113d 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -186,6 +186,7 @@ static int smc_lgr_register_conn(struct smc_connection *conn, bool first)
> conn->alert_token_local = 0;
> }
> smc_lgr_add_alert_token(conn);
> + smc_lgr_hold(conn->lgr); /* lgr_put in smc_conn_free() */
> conn->lgr->conns_num++;
> return 0;
> }
> @@ -218,7 +219,6 @@ static void smc_lgr_unregister_conn(struct smc_connection *conn)
> __smc_lgr_unregister_conn(conn);
> }
> write_unlock_bh(&lgr->conns_lock);
> - conn->lgr = NULL;
> }
>
> int smc_nl_get_sys_info(struct sk_buff *skb, struct netlink_callback *cb)
> @@ -749,6 +749,7 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
> lnk->path_mtu = lnk->smcibdev->pattr[lnk->ibport - 1].active_mtu;
> lnk->link_id = smcr_next_link_id(lgr);
> lnk->lgr = lgr;
> + smc_lgr_hold(lgr); /* lgr_put in smcr_link_clear() */
> lnk->link_idx = link_idx;
> smc_ibdev_cnt_inc(lnk);
> smcr_copy_dev_info_to_link(lnk);
> @@ -841,6 +842,7 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini)
> lgr->terminating = 0;
> lgr->freeing = 0;
> lgr->vlan_id = ini->vlan_id;
> + refcount_set(&lgr->refcnt, 1); /* set lgr refcnt to 1 */
> mutex_init(&lgr->sndbufs_lock);
> mutex_init(&lgr->rmbs_lock);
> rwlock_init(&lgr->conns_lock);
> @@ -1120,8 +1122,22 @@ void smc_conn_free(struct smc_connection *conn)
> {
> struct smc_link_group *lgr = conn->lgr;
>
> - if (!lgr)
> + if (!lgr || conn->freed)
> + /* The connection has never been registered in a
> + * link group, or has already been freed.
> + *
> + * Check to ensure that the refcnt of link group
> + * won't be put incorrectly.
I would delete the second sentence here, its obvious enough.
> + */
> return;
> +
> + conn->freed = 1;
> + if (!conn->alert_token_local)
> + /* The connection was registered in a link group
> + * defore, but now it is unregistered from it.
'before' ... But would maybe the following be more exact:
'Connection already unregistered from link group.'
We still review the patches...
> + */
> + goto lgr_put;
> +
> if (lgr->is_smcd) {
> if (!list_empty(&lgr->list))
> smc_ism_unset_conn(conn);
> @@ -1138,6 +1154,8 @@ void smc_conn_free(struct smc_connection *conn)
>
> if (!lgr->conns_num)
> smc_lgr_schedule_free_work(lgr);
> +lgr_put:
> + smc_lgr_put(lgr); /* lgr_hold in smc_lgr_register_conn() */
> }
>
> /* unregister a link from a buf_desc */
> @@ -1209,6 +1227,7 @@ void smcr_link_clear(struct smc_link *lnk, bool log)
> smc_ib_destroy_queue_pair(lnk);
> smc_ib_dealloc_protection_domain(lnk);
> smc_wr_free_link_mem(lnk);
> + smc_lgr_put(lnk->lgr); /* lgr_hold in smcr_link_init() */
> smc_ibdev_cnt_dec(lnk);
> put_device(&lnk->smcibdev->ibdev->dev);
> smcibdev = lnk->smcibdev;
> @@ -1283,6 +1302,15 @@ static void smc_lgr_free_bufs(struct smc_link_group *lgr)
> __smc_lgr_free_bufs(lgr, true);
> }
>
> +/* won't be freed until no one accesses to lgr anymore */
> +static void __smc_lgr_free(struct smc_link_group *lgr)
> +{
> + smc_lgr_free_bufs(lgr);
> + if (!lgr->is_smcd)
> + smc_wr_free_lgr_mem(lgr);
> + kfree(lgr);
> +}
> +
> /* remove a link group */
> static void smc_lgr_free(struct smc_link_group *lgr)
> {
> @@ -1298,7 +1326,6 @@ static void smc_lgr_free(struct smc_link_group *lgr)
> smc_llc_lgr_clear(lgr);
> }
>
> - smc_lgr_free_bufs(lgr);
> destroy_workqueue(lgr->tx_wq);
> if (lgr->is_smcd) {
> smc_ism_put_vlan(lgr->smcd, lgr->vlan_id);
> @@ -1306,11 +1333,21 @@ static void smc_lgr_free(struct smc_link_group *lgr)
> if (!atomic_dec_return(&lgr->smcd->lgr_cnt))
> wake_up(&lgr->smcd->lgrs_deleted);
> } else {
> - smc_wr_free_lgr_mem(lgr);
> if (!atomic_dec_return(&lgr_cnt))
> wake_up(&lgrs_deleted);
> }
> - kfree(lgr);
> + smc_lgr_put(lgr); /* theoretically last lgr_put */
> +}
> +
> +void smc_lgr_hold(struct smc_link_group *lgr)
> +{
> + refcount_inc(&lgr->refcnt);
> +}
> +
> +void smc_lgr_put(struct smc_link_group *lgr)
> +{
> + if (refcount_dec_and_test(&lgr->refcnt))
> + __smc_lgr_free(lgr);
> }
>
> static void smc_sk_wake_ups(struct smc_sock *smc)
> diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
> index 73d0c35..edbd401 100644
> --- a/net/smc/smc_core.h
> +++ b/net/smc/smc_core.h
> @@ -249,6 +249,7 @@ struct smc_link_group {
> u8 terminating : 1;/* lgr is terminating */
> u8 freeing : 1; /* lgr is being freed */
>
> + refcount_t refcnt; /* lgr reference count */
> bool is_smcd; /* SMC-R or SMC-D */
> u8 smc_version;
> u8 negotiated_eid[SMC_MAX_EID_LEN];
> @@ -470,6 +471,8 @@ static inline void smc_set_pci_values(struct pci_dev *pci_dev,
>
> void smc_lgr_cleanup_early(struct smc_link_group *lgr);
> void smc_lgr_terminate_sched(struct smc_link_group *lgr);
> +void smc_lgr_hold(struct smc_link_group *lgr);
> +void smc_lgr_put(struct smc_link_group *lgr);
> void smcr_port_add(struct smc_ib_device *smcibdev, u8 ibport);
> void smcr_port_err(struct smc_ib_device *smcibdev, u8 ibport);
> void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid,
--
Karsten
On 2022/1/10 8:25 pm, Karsten Graul wrote:
> On 10/01/2022 10:26, Wen Gu wrote:
>> We encountered some crashes caused by the race between the access
>> and the termination of link groups.
>>
>> @@ -1120,8 +1122,22 @@ void smc_conn_free(struct smc_connection *conn)
>> {
>> struct smc_link_group *lgr = conn->lgr;
>>
>> - if (!lgr)
>> + if (!lgr || conn->freed)
>> + /* The connection has never been registered in a
>> + * link group, or has already been freed.
>> + *
>> + * Check to ensure that the refcnt of link group
>> + * won't be put incorrectly.
>
> I would delete the second sentence here, its obvious enough.
>
>> + */
>> return;
>> +
>> + conn->freed = 1;
>> + if (!conn->alert_token_local)
>> + /* The connection was registered in a link group
>> + * defore, but now it is unregistered from it.
>
> 'before' ... But would maybe the following be more exact:
>
> 'Connection already unregistered from link group.'
>
>
> We still review the patches...
>
Thanks for your detailed and patient review. The comments will
be improved as you suggested.
Thanks,
Wen Gu
On 10/01/2022 10:26, Wen Gu wrote:
> We encountered some crashes caused by the race between the access
> and the termination of link groups.
>
>
> +/* won't be freed until no one accesses to lgr anymore */
> +static void __smc_lgr_free(struct smc_link_group *lgr)
> +{
> + smc_lgr_free_bufs(lgr);
> + if (!lgr->is_smcd)
> + smc_wr_free_lgr_mem(lgr);
> + kfree(lgr);
> +}
> +
> /* remove a link group */
> static void smc_lgr_free(struct smc_link_group *lgr)
> {
> @@ -1298,7 +1326,6 @@ static void smc_lgr_free(struct smc_link_group *lgr)
> smc_llc_lgr_clear(lgr);
> }
>
> - smc_lgr_free_bufs(lgr);
> destroy_workqueue(lgr->tx_wq);
> if (lgr->is_smcd) {
> smc_ism_put_vlan(lgr->smcd, lgr->vlan_id);
> @@ -1306,11 +1333,21 @@ static void smc_lgr_free(struct smc_link_group *lgr)
> if (!atomic_dec_return(&lgr->smcd->lgr_cnt))
> wake_up(&lgr->smcd->lgrs_deleted);
> } else {
> - smc_wr_free_lgr_mem(lgr);
> if (!atomic_dec_return(&lgr_cnt))
> wake_up(&lgrs_deleted);
These waiters (seaparate ones for smcd and smcr) are used to wait for all lgrs
to be deleted when a module unload or reboot was triggered, so it must only be
woken up when the lgr is actually freed.
On 10/01/2022 10:26, Wen Gu wrote:
> @@ -1226,15 +1245,23 @@ void smcr_link_clear(struct smc_link *lnk, bool log)
> smc_wr_free_link(lnk);
> smc_ib_destroy_queue_pair(lnk);
> smc_ib_dealloc_protection_domain(lnk);
> - smc_wr_free_link_mem(lnk);
> - smc_lgr_put(lnk->lgr); /* lgr_hold in smcr_link_init() */
> smc_ibdev_cnt_dec(lnk);
> put_device(&lnk->smcibdev->ibdev->dev);
> smcibdev = lnk->smcibdev;
> - memset(lnk, 0, sizeof(struct smc_link));
> - lnk->state = SMC_LNK_UNUSED;
> if (!atomic_dec_return(&smcibdev->lnk_cnt))
> wake_up(&smcibdev->lnks_deleted);
Same here, waiter should not be woken up until the link memory is actually freed.
> + smcr_link_put(lnk); /* theoretically last link_put */
> +}
> +
> +void smcr_link_hold(struct smc_link *lnk)
> +{
> + refcount_inc(&lnk->refcnt);
> +}
> +
> +void smcr_link_put(struct smc_link *lnk)
> +{
> + if (refcount_dec_and_test(&lnk->refcnt))
> + __smcr_link_clear(lnk);
> }
Thanks for your review.
On 2022/1/11 4:23 pm, Karsten Graul wrote:
> On 10/01/2022 10:26, Wen Gu wrote:
>> We encountered some crashes caused by the race between the access
>> and the termination of link groups.
>>
>
> These waiters (seaparate ones for smcd and smcr) are used to wait for all lgrs
> to be deleted when a module unload or reboot was triggered, so it must only be
> woken up when the lgr is actually freed.
Thanks for your reminding, I will move the wake-up code to __smc_lgr_free().
And maybe the vlan put and device put of smcd are also need to be moved
to __smc_lgr_free()?, because it also seems to be more suitable to put these
resources when lgr is actually freed. What do you think?
Thanks,
Wen Gu
On 11/01/2022 16:36, Wen Gu wrote:
> Thanks for your review.
>
> On 2022/1/11 4:23 pm, Karsten Graul wrote:
>> On 10/01/2022 10:26, Wen Gu wrote:
>>> We encountered some crashes caused by the race between the access
>>> and the termination of link groups.
>>>
>>
>> These waiters (seaparate ones for smcd and smcr) are used to wait for all lgrs
>> to be deleted when a module unload or reboot was triggered, so it must only be
>> woken up when the lgr is actually freed.
>
> Thanks for your reminding, I will move the wake-up code to __smc_lgr_free().
>
> And maybe the vlan put and device put of smcd are also need to be moved
> to __smc_lgr_free()?, because it also seems to be more suitable to put these
> resources when lgr is actually freed. What do you think?
Keep the calls to smc_ism_put_vlan() and put_device() in smc_lgr_free(),
thats okay for SMC-D.
Thanks for your review.
On 2022/1/11 4:40 pm, Karsten Graul wrote:
> On 10/01/2022 10:26, Wen Gu wrote:
>> @@ -1226,15 +1245,23 @@ void smcr_link_clear(struct smc_link *lnk, bool log)
>> smc_wr_free_link(lnk);
>> smc_ib_destroy_queue_pair(lnk);
>> smc_ib_dealloc_protection_domain(lnk);
>> - smc_wr_free_link_mem(lnk);
>> - smc_lgr_put(lnk->lgr); /* lgr_hold in smcr_link_init() */
>> smc_ibdev_cnt_dec(lnk);
>> put_device(&lnk->smcibdev->ibdev->dev);
>> smcibdev = lnk->smcibdev;
>> - memset(lnk, 0, sizeof(struct smc_link));
>> - lnk->state = SMC_LNK_UNUSED;
>> if (!atomic_dec_return(&smcibdev->lnk_cnt))
>> wake_up(&smcibdev->lnks_deleted);
>
> Same here, waiter should not be woken up until the link memory is actually freed.
>
OK, I will correct this as well.
And similarly I want to move smc_ibdev_cnt_dec() and put_device() to
__smcr_link_clear() as well to ensure that put link related resources
only when link is actually cleared. What do you think?
Thanks,
Wen Gu
On 2022/1/11 11:46 pm, Karsten Graul wrote:
> On 11/01/2022 16:36, Wen Gu wrote:
>> Thanks for your review.
>>
>> On 2022/1/11 4:23 pm, Karsten Graul wrote:
>>> On 10/01/2022 10:26, Wen Gu wrote:
>>>> We encountered some crashes caused by the race between the access
>>>> and the termination of link groups.
>>>>
>>>
>>> These waiters (seaparate ones for smcd and smcr) are used to wait for all lgrs
>>> to be deleted when a module unload or reboot was triggered, so it must only be
>>> woken up when the lgr is actually freed.
>>
>> Thanks for your reminding, I will move the wake-up code to __smc_lgr_free().
>>
>> And maybe the vlan put and device put of smcd are also need to be moved
>> to __smc_lgr_free()?, because it also seems to be more suitable to put these
>> resources when lgr is actually freed. What do you think?
>
> Keep the calls to smc_ism_put_vlan() and put_device() in smc_lgr_free(),
> thats okay for SMC-D.
OK.
Thanks,
Wen Gu
On 11/01/2022 16:49, Wen Gu wrote:
> Thanks for your review.
>
> On 2022/1/11 4:40 pm, Karsten Graul wrote:
>> On 10/01/2022 10:26, Wen Gu wrote:
>>> @@ -1226,15 +1245,23 @@ void smcr_link_clear(struct smc_link *lnk, bool log)
>>> smc_wr_free_link(lnk);
>>> smc_ib_destroy_queue_pair(lnk);
>>> smc_ib_dealloc_protection_domain(lnk);
>>> - smc_wr_free_link_mem(lnk);
>>> - smc_lgr_put(lnk->lgr); /* lgr_hold in smcr_link_init() */
>>> smc_ibdev_cnt_dec(lnk);
>>> put_device(&lnk->smcibdev->ibdev->dev);
>>> smcibdev = lnk->smcibdev;
>>> - memset(lnk, 0, sizeof(struct smc_link));
>>> - lnk->state = SMC_LNK_UNUSED;
>>> if (!atomic_dec_return(&smcibdev->lnk_cnt))
>>> wake_up(&smcibdev->lnks_deleted);
>>
>> Same here, waiter should not be woken up until the link memory is actually freed.
>>
>
> OK, I will correct this as well.
>
> And similarly I want to move smc_ibdev_cnt_dec() and put_device() to
> __smcr_link_clear() as well to ensure that put link related resources
> only when link is actually cleared. What do you think?
I think that's a good idea, yes.
On 2022/1/12 00:02, Karsten Graul wrote:
> On 11/01/2022 16:49, Wen Gu wrote:
>>
>> OK, I will correct this as well.
>>
>> And similarly I want to move smc_ibdev_cnt_dec() and put_device() to
>> __smcr_link_clear() as well to ensure that put link related resources
>> only when link is actually cleared. What do you think?
>
> I think that's a good idea, yes.
Thank you.
Not in a hurry, just want to ask should I send a v2 with these changes
or continue to wait for subsequent review of v1?
Thanks,
Wen Gu
On 11/01/2022 17:44, Wen Gu wrote:
>
>
> On 2022/1/12 00:02, Karsten Graul wrote:
>> On 11/01/2022 16:49, Wen Gu wrote:
>>>
>>> OK, I will correct this as well.
>>>
>>> And similarly I want to move smc_ibdev_cnt_dec() and put_device() to
>>> __smcr_link_clear() as well to ensure that put link related resources
>>> only when link is actually cleared. What do you think?
>>
>> I think that's a good idea, yes.
>
> Thank you.
>
> Not in a hurry, just want to ask should I send a v2 with these changes
> or continue to wait for subsequent review of v1?
>
Yeah I think its time for a v2, thank you!