Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756554AbcCBWEA (ORCPT ); Wed, 2 Mar 2016 17:04:00 -0500
Received: from smtp1.ccs.ornl.gov ([160.91.199.38]:59318 "EHLO smtp1.ccs.ornl.gov" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1756056AbcCBWCy (ORCPT ); Wed, 2 Mar 2016 17:02:54 -0500
From: James Simmons
To: Greg Kroah-Hartman, devel@driverdev.osuosl.org, Andreas Dilger, Oleg Drokin
Cc: Linux Kernel Mailing List, Lustre Development List, Liang Zhen
Subject: [PATCH 23/27] staging: lustre: take extra refcount in kiblnd_connreq_done
Date: Wed, 2 Mar 2016 17:02:06 -0500
Message-Id: <1456956130-6110-24-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.7.1
In-Reply-To: <1456956130-6110-1-git-send-email-jsimmons@infradead.org>
References: <1456956130-6110-1-git-send-email-jsimmons@infradead.org>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2474
Lines: 71

From: Liang Zhen

The refcount taken by the cmid is not reliable after
kiblnd_connreq_done() releases the glock, because the connection is
then visible to other threads. Another thread can find and close the
connection right after the glock is released, and if
kiblnd_cm_callback() is called for RDMA_CM_EVENT_DISCONNECTED, it can
release the connection refcount taken by the cmid. The connection could
therefore be destroyed before kiblnd_connreq_done() finishes its
operations on it.

Take an extra refcount in kiblnd_connreq_done() before releasing the
glock and drop it at the end of the function. The addref/decref pair in
kiblnd_check_sends() is removed.

Signed-off-by: Liang Zhen
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7210
Reviewed-on: http://review.whamcloud.com/17527
Reviewed-by: Doug Oucharek
Reviewed-by: James Simmons
Tested-by: James Simmons
Reviewed-by: Oleg Drokin
---
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index fb3873a..11e12ae 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -939,8 +939,6 @@ kiblnd_check_sends(kib_conn_t *conn)
 		kiblnd_queue_tx_locked(tx, conn);
 	}
 
-	kiblnd_conn_addref(conn); /* 1 ref for me.... (see b21911) */
-
 	for (;;) {
 		int credit;
 
@@ -966,8 +964,6 @@ kiblnd_check_sends(kib_conn_t *conn)
 	}
 
 	spin_unlock(&conn->ibc_lock);
-
-	kiblnd_conn_decref(conn); /* ...until here */
 }
 
 static void
@@ -2132,6 +2128,16 @@ kiblnd_connreq_done(kib_conn_t *conn, int status)
 		return;
 	}
 
+	/**
+	 * refcount taken by cmid is not reliable after I released the glock
+	 * because this connection is visible to other threads now, another
+	 * thread can find and close this connection right after I released
+	 * the glock, if kiblnd_cm_callback for RDMA_CM_EVENT_DISCONNECTED is
+	 * called, it can release the connection refcount taken by cmid.
+	 * It means the connection could be destroyed before I finish my
+	 * operations on it.
+	 */
+	kiblnd_conn_addref(conn);
 	write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags);
 
 	/* Schedule blocked txs */
@@ -2147,6 +2153,8 @@ kiblnd_connreq_done(kib_conn_t *conn, int status)
 
 	/* schedule blocked rxs */
 	kiblnd_handle_early_rxs(conn);
+
+	kiblnd_conn_decref(conn);
 }
 
 static void
-- 
1.7.1
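
For readers who want to see the pattern in isolation, here is a minimal
user-space sketch of "take your own reference before dropping the lock".
It is not the o2iblnd code: struct conn, conn_addref(), conn_decref(),
peer_disconnect(), connreq_done() and run_demo() are hypothetical
stand-ins, a pthread mutex stands in for the glock, and the concurrent
disconnect is simulated by a direct call so that the outcome is
deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for kib_conn_t; not the real Lustre structure. */
struct conn {
	atomic_int refcount;
	bool *destroyed;	/* set to true when the last ref is dropped */
};

/* Stand-in for the global lock ("glock") in the commit message. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

static void conn_addref(struct conn *c)
{
	/* A reference may only be taken on an object that is still alive. */
	assert(atomic_load(&c->refcount) > 0);
	atomic_fetch_add(&c->refcount, 1);
}

static void conn_decref(struct conn *c)
{
	/* atomic_fetch_sub() returns the old value: 1 means last reference. */
	if (atomic_fetch_sub(&c->refcount, 1) == 1) {
		*c->destroyed = true;
		free(c);
	}
}

/* Models another thread handling a disconnect: drops the cmid's ref. */
static void peer_disconnect(struct conn *c)
{
	conn_decref(c);
}

/*
 * Models the tail of a connreq_done-style function. Returns true if the
 * connection was still alive while the post-unlock work ran.
 */
static bool connreq_done(struct conn *c, const bool *destroyed)
{
	bool alive_during_work;

	pthread_mutex_lock(&global_lock);
	/* ... the connection becomes visible to other threads here ... */
	conn_addref(c);			/* our own ref, taken under the lock */
	pthread_mutex_unlock(&global_lock);

	peer_disconnect(c);		/* simulated race: cmid ref goes away */

	/* Post-unlock work (scheduling txs, handling early rxs) goes here. */
	alive_during_work = !*destroyed;

	conn_decref(c);			/* drop our ref; may free the conn */
	return alive_during_work;
}

/* Returns 0 if the conn outlived the work and was freed afterwards. */
int run_demo(void)
{
	bool destroyed = false;
	struct conn *c = malloc(sizeof(*c));

	if (!c)
		return -1;
	atomic_init(&c->refcount, 1);	/* the reference held by the "cmid" */
	c->destroyed = &destroyed;

	bool alive = connreq_done(c, &destroyed);

	return (alive && destroyed) ? 0 : 1;
}
```

Without the conn_addref() call under the lock, peer_disconnect() would
drop the only reference and the rest of connreq_done() would operate on
freed memory, which is the failure mode the commit message describes.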