Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756216AbcCBWC7 (ORCPT ); Wed, 2 Mar 2016 17:02:59 -0500
Received: from smtp1.ccs.ornl.gov ([160.91.199.38]:59318 "EHLO
	smtp1.ccs.ornl.gov" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756107AbcCBWC4 (ORCPT );
	Wed, 2 Mar 2016 17:02:56 -0500
From: James Simmons
To: Greg Kroah-Hartman, devel@driverdev.osuosl.org,
	Andreas Dilger, Oleg Drokin
Cc: Linux Kernel Mailing List, Lustre Development List, Doug Oucharek
Subject: [PATCH 24/27] staging: lustre: Change connect peer failed cleanup order
Date: Wed, 2 Mar 2016 17:02:07 -0500
Message-Id: <1456956130-6110-25-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.7.1
In-Reply-To: <1456956130-6110-1-git-send-email-jsimmons@infradead.org>
References: <1456956130-6110-1-git-send-email-jsimmons@infradead.org>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1730
Lines: 45

From: Doug Oucharek

A race condition has been found where connd is cleaning up failed
connections and the peer ref counter goes to zero while the connecting
counter is still > 0.

One possible race is when we retry a connection by calling
kiblnd_connect_peer(), which itself fails, decrements the peer ref
counter, and gets swapped out before it can decrement the connecting
counter. connd then swaps in and cleans up the connection, where it
sees a peer ref counter of 1 and a connecting counter of 1. This
triggers the assert seen in LU-7210 when connd decrements the peer
counter.

The solution: be sure to decrement the connecting counter before
decrementing the peer counter in the peer connect failure path.

Signed-off-by: Doug Oucharek
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7210
Reviewed-on: http://review.whamcloud.com/17004
Reviewed-by: James Simmons
Reviewed-by: Amir Shehata
Reviewed-by: Oleg Drokin
---
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 11e12ae..9428166 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1299,8 +1299,10 @@ kiblnd_connect_peer(kib_peer_t *peer)
 	return;
 
  failed2:
+	kiblnd_peer_connect_failed(peer, 1, rc);
 	kiblnd_peer_decref(peer); /* cmid's ref */
 	rdma_destroy_id(cmid);
+	return;
  failed:
 	kiblnd_peer_connect_failed(peer, 1, rc);
 }
-- 
1.7.1
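
The ordering argument in the commit message can be illustrated with a small, single-threaded model. This is only a sketch under stated assumptions, not the o2iblnd code: `struct model_peer` and every `model_*` function are hypothetical names invented for illustration, the other thread's cleanup is simulated by an inline call rather than real preemption, and the assertion reported in LU-7210 is modeled as a `false` return instead of a crash.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the peer: only the two counters discussed. */
struct model_peer {
	int refcount;    /* models the peer ref counter */
	int connecting;  /* models the connecting counter */
	bool destroyed;  /* set when the last reference is dropped cleanly */
};

/* Models the connect-failure bookkeeping: drop the connecting count. */
static void model_connect_failed(struct model_peer *peer)
{
	assert(peer->connecting > 0);
	peer->connecting--;
}

/*
 * Models dropping one reference. Returns false when the last reference
 * goes away while a connection attempt is still counted, which is the
 * state the commit message says triggers the assert.
 */
static bool model_decref(struct model_peer *peer)
{
	assert(peer->refcount > 0);
	if (--peer->refcount == 0) {
		if (peer->connecting != 0)
			return false;
		peer->destroyed = true;
	}
	return true;
}

/* Old order: ref dropped first, other thread runs, counter dropped last. */
static bool model_old_order_interleaved(struct model_peer *peer)
{
	bool ok = model_decref(peer);   /* failure path drops cmid's ref */
	ok = model_decref(peer) && ok;  /* simulated cleanup drops last ref */
	model_connect_failed(peer);     /* too late: peer already torn down */
	return ok;
}

/* New order: connecting counter dropped before any reference is released. */
static bool model_new_order_interleaved(struct model_peer *peer)
{
	model_connect_failed(peer);     /* connecting reaches 0 first */
	bool ok = model_decref(peer);   /* failure path drops cmid's ref */
	ok = model_decref(peer) && ok;  /* simulated cleanup drops last ref */
	return ok;
}
```

Starting both models from a reference count of 2 and a connecting count of 1, the old order reports the violation while the new order tears the peer down cleanly, which is the property the patch relies on.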