Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758408Ab2KVW3q (ORCPT ); Thu, 22 Nov 2012 17:29:46 -0500 Received: from mail.kernel.org ([198.145.19.201]:49021 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753631Ab2KVSdz (ORCPT ); Thu, 22 Nov 2012 13:33:55 -0500 From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: Greg Kroah-Hartman , alan@lxorguk.ukuu.org.uk, Alex Elder , Sage Weil Subject: [ 165/171] rbd: reset BACKOFF if unable to re-queue Date: Wed, 21 Nov 2012 16:41:51 -0800 Message-Id: <20121122004049.899615915@linuxfoundation.org> X-Mailer: git-send-email 1.8.0.197.g5a90748 In-Reply-To: <20121122004033.298367941@linuxfoundation.org> References: <20121122004033.298367941@linuxfoundation.org> User-Agent: quilt/0.60-2.1.2 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2268 Lines: 64 3.4-stable review patch. If anyone has any objections, please let me know. ------------------ From: Alex Elder (cherry picked from commit 588377d6199034c36d335e7df5818b731fea072c) If ceph_fault() is unable to queue work after a delay, it sets the BACKOFF connection flag so con_work() will attempt to do so. In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't result in newly-queued work, it simply ignores this condition and proceeds as if no backoff delay were desired. There are two problems with this--one of which is a bug. The first problem is simply that the intended behavior is to back off, and if we aren't able queue the work item to run after a delay we're not doing that. The only reason queue_delayed_work() won't queue work is if the provided work item is already queued. In the messenger, this means that con_work() is already scheduled to be run again. So if we simply set the BACKOFF flag again when this occurs, we know the next con_work() call will again attempt to hold off activity on the connection until after the delay. The second problem--the bug--is a leak of a reference count. If queue_delayed_work() returns 0 in con_work(), con->ops->put() drops the connection reference held on entry to con_work(). However, processing is (was) allowed to continue, and at the end of the function a second con->ops->put() is called. This patch fixes both problems. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman --- net/ceph/messenger.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -2296,10 +2296,11 @@ restart: mutex_unlock(&con->mutex); return; } else { - con->ops->put(con); dout("con_work %p FAILED to back off %lu\n", con, con->delay); + set_bit(CON_FLAG_BACKOFF, &con->flags); } + goto done; } if (con->state == CON_STATE_STANDBY) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/