From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Changwei Ge, Joseph Qi, Mark Fasheh, Joel Becker, Junxiao Bi, Andrew Morton, Linus Torvalds, Sasha Levin
Subject: [PATCH 4.19 072/106] ocfs2: wait for recovering done after direct unlock request
Date: Sun, 6 Oct 2019 19:21:18 +0200
Message-Id: <20191006171153.744922590@linuxfoundation.org>
In-Reply-To: <20191006171124.641144086@linuxfoundation.org>
References: <20191006171124.641144086@linuxfoundation.org>

From: Changwei Ge

[ Upstream commit 0a3775e4f883912944481cf2ef36eb6383a9cc74 ]

There is a scenario that causes ocfs2 umount to hang when multiple hosts
are rebooting at the same time.

NODE1                           NODE2               NODE3
send unlock request to NODE2
                                dies
                                                    become recovery master
                                                    recover NODE2
find NODE2 dead
mark resource RECOVERING
directly remove lock from grant list
calculate usage but RECOVERING marked
**miss the window of purging
clear RECOVERING

To reproduce this issue, crash a host and then umount ocfs2 from another
node.

To solve this, just let the unlock path wait for recovery to finish.

Link: http://lkml.kernel.org/r/1550124866-20367-1-git-send-email-gechangwei@live.cn
Signed-off-by: Changwei Ge
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 fs/ocfs2/dlm/dlmunlock.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmunlock.c b/fs/ocfs2/dlm/dlmunlock.c
index 63d701cd1e2e7..c8e9b7031d9ad 100644
--- a/fs/ocfs2/dlm/dlmunlock.c
+++ b/fs/ocfs2/dlm/dlmunlock.c
@@ -105,7 +105,8 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
 	enum dlm_status status;
 	int actions = 0;
 	int in_use;
-        u8 owner;
+	u8 owner;
+	int recovery_wait = 0;
 
 	mlog(0, "master_node = %d, valblk = %d\n", master_node,
 	     flags & LKM_VALBLK);
@@ -208,9 +209,12 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
 		}
 		if (flags & LKM_CANCEL)
 			lock->cancel_pending = 0;
-		else
-			lock->unlock_pending = 0;
-
+		else {
+			if (!lock->unlock_pending)
+				recovery_wait = 1;
+			else
+				lock->unlock_pending = 0;
+		}
 	}
 
 	/* get an extra ref on lock.  if we are just switching
@@ -244,6 +248,17 @@ leave:
 	spin_unlock(&res->spinlock);
 	wake_up(&res->wq);
 
+	if (recovery_wait) {
+		spin_lock(&res->spinlock);
+		/* Unlock request will directly succeed after owner dies,
+		 * and the lock is already removed from grant list. We have to
+		 * wait for RECOVERING done or we miss the chance to purge it
+		 * since the removement is much faster than RECOVERING proc.
+		 */
+		__dlm_wait_on_lockres_flags(res, DLM_LOCK_RES_RECOVERING);
+		spin_unlock(&res->spinlock);
+	}
+
 	/* let the caller's final dlm_lock_put handle the actual kfree */
 	if (actions & DLM_UNLOCK_FREE_LOCK) {
 		/* this should always be coupled with list removal */
-- 
2.20.1
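
For readers who want to reason about the ordering in the changelog without a cluster at hand, the NODE1 column can be replayed with a toy, single-threaded userspace model. This is only a sketch and not kernel code: `toy_res`, `TOY_RES_RECOVERING`, `toy_calc_usage()`, `toy_finish_recovery()` and `toy_unlock_replay()` are hypothetical names invented for this illustration. The assumption that the usage calculation declines to queue a resource for purging while it is marked RECOVERING is read from the "calculate usage but RECOVERING marked" step above, and "waiting" is modelled simply by letting recovery finish before the usage calculation runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for DLM_LOCK_RES_RECOVERING. */
#define TOY_RES_RECOVERING 0x1u

/* Hypothetical, heavily reduced model of a lock resource on NODE1. */
struct toy_res {
	unsigned int state;	/* flag bits, only RECOVERING is modelled */
	int granted;		/* number of locks on the grant list */
	bool on_purge_list;	/* set once the resource is queued for purging */
};

/* Assumed rule: an unused resource is queued for purging only if it is
 * not being recovered at the moment usage is calculated.
 */
static void toy_calc_usage(struct toy_res *res)
{
	if (res->granted == 0 && !(res->state & TOY_RES_RECOVERING))
		res->on_purge_list = true;
}

/* Recovery master has finished with the resource: clear RECOVERING. */
static void toy_finish_recovery(struct toy_res *res)
{
	res->state &= ~TOY_RES_RECOVERING;
}

/* Replay the NODE1 column of the changelog. Returns true if the resource
 * ended up on the purge list, false if the purge window was missed.
 */
bool toy_unlock_replay(bool wait_for_recovery)
{
	struct toy_res res = {
		.state = TOY_RES_RECOVERING,	/* owner found dead */
		.granted = 1,
		.on_purge_list = false,
	};

	/* Unlock succeeds directly: lock leaves the grant list. */
	res.granted--;
	assert(res.granted == 0);

	/* With the wait, recovery completes before usage is calculated. */
	if (wait_for_recovery)
		toy_finish_recovery(&res);

	toy_calc_usage(&res);

	/* Without the wait, RECOVERING is cleared too late to matter. */
	if (!wait_for_recovery)
		toy_finish_recovery(&res);

	return res.on_purge_list;
}
```

In this model `toy_unlock_replay(false)` leaves the resource off the purge list, which corresponds to the missed purge window in the changelog, while `toy_unlock_replay(true)` queues it. The real code blocks in `__dlm_wait_on_lockres_flags()` rather than reordering calls, so the model only captures the resulting order of events.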