From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Changwei Ge, Joseph Qi, Mark Fasheh, Joel Becker, Junxiao Bi,
    Andrew Morton, Linus Torvalds, Sasha Levin
Subject: [PATCH AUTOSEL 4.19 28/33] ocfs2: wait for recovering done after direct unlock request
Date: Sun, 29 Sep 2019 13:34:16 -0400
Message-Id: <20190929173424.9361-28-sashal@kernel.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190929173424.9361-1-sashal@kernel.org>
References: <20190929173424.9361-1-sashal@kernel.org>
MIME-Version: 1.0
X-stable: review
X-Patchwork-Hint: Ignore
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Changwei Ge

[ Upstream commit 0a3775e4f883912944481cf2ef36eb6383a9cc74 ]

There is a scenario causing ocfs2 umount hang when multiple hosts are
rebooting at the same time.

NODE1                                NODE2                NODE3
send unlock request to NODE2
                                     dies
                                                          become recovery master
                                                          recover NODE2
find NODE2 dead
mark resource RECOVERING
directly remove lock from grant list
calculate usage but RECOVERING marked
**miss the window of purging
clear RECOVERING

To reproduce this issue, crash a host and then umount ocfs2 from another
node.

To solve this, just let unlock progress wait for recovery done.

Link: http://lkml.kernel.org/r/1550124866-20367-1-git-send-email-gechangwei@live.cn
Signed-off-by: Changwei Ge
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 fs/ocfs2/dlm/dlmunlock.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmunlock.c b/fs/ocfs2/dlm/dlmunlock.c
index 63d701cd1e2e7..c8e9b7031d9ad 100644
--- a/fs/ocfs2/dlm/dlmunlock.c
+++ b/fs/ocfs2/dlm/dlmunlock.c
@@ -105,7 +105,8 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
 	enum dlm_status status;
 	int actions = 0;
 	int in_use;
-	u8 owner;
+	u8 owner;
+	int recovery_wait = 0;
 
 	mlog(0, "master_node = %d, valblk = %d\n", master_node,
 	     flags & LKM_VALBLK);
@@ -208,9 +209,12 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
 		}
 		if (flags & LKM_CANCEL)
 			lock->cancel_pending = 0;
-		else
-			lock->unlock_pending = 0;
-
+		else {
+			if (!lock->unlock_pending)
+				recovery_wait = 1;
+			else
+				lock->unlock_pending = 0;
+		}
 	}
 
 	/* get an extra ref on lock. if we are just switching
@@ -244,6 +248,17 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
 	spin_unlock(&res->spinlock);
 	wake_up(&res->wq);
 
+	if (recovery_wait) {
+		spin_lock(&res->spinlock);
+		/* Unlock request will directly succeed after owner dies,
+		 * and the lock is already removed from grant list. We have to
+		 * wait for RECOVERING done or we miss the chance to purge it
+		 * since the removement is much faster than RECOVERING proc.
+		 */
+		__dlm_wait_on_lockres_flags(res, DLM_LOCK_RES_RECOVERING);
+		spin_unlock(&res->spinlock);
+	}
+
 	/* let the caller's final dlm_lock_put handle the actual kfree */
 	if (actions & DLM_UNLOCK_FREE_LOCK) {
 		/* this should always be coupled with list removal */
-- 
2.20.1