Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752524AbcDSASp (ORCPT ); Mon, 18 Apr 2016 20:18:45 -0400 Received: from szxga02-in.huawei.com ([119.145.14.65]:57061 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751918AbcDSASo (ORCPT ); Mon, 18 Apr 2016 20:18:44 -0400 Subject: Re: [PATCH 3.4 61/92] ocfs2/dlm: fix deadlock when dispatch assert master To: Joseph Qi , References: <1460976338-5631-1-git-send-email-lizf@kernel.org> <1460976397-5688-61-git-send-email-lizf@kernel.org> <5714C536.4070308@huawei.com> CC: , , Joel Becker , Mark Fasheh , Junxiao Bi , Andrew Morton , "Linus Torvalds" From: Zefan Li Message-ID: <57157951.2090105@huawei.com> Date: Tue, 19 Apr 2016 08:18:25 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: <5714C536.4070308@huawei.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.19.236] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3698 Lines: 114 On 2016/4/18 19:29, Joseph Qi wrote: > Hi Zefan, > In dlm_master_request_handler, it has missed the following change : > - dlm_put(dlm); > + if (!dispatched) > + dlm_put(dlm); Oops..I guess I forgot to refresh the patch after adjusting the context. Thanks for the review! > > Thanks, > Joseph > > On 2016/4/18 18:46, lizf@kernel.org wrote: >> From: Joseph Qi >> >> 3.4.112-rc1 review patch. If anyone has any objections, please let me know. >> >> ------------------ >> >> >> commit 012572d4fc2e4ddd5c8ec8614d51414ec6cae02a upstream. >> >> The order of the following three spinlocks should be: >> dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock >> >> But dlm_dispatch_assert_master() is called while holding >> dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls >> dlm_grab() which will take dlm_domain_lock. >> >> Once another thread (for example, dlm_query_join_handler) has already >> taken dlm_domain_lock, and tries to take dlm_ctxt->spinlock deadlock >> happens. >> >> Signed-off-by: Joseph Qi >> Cc: Joel Becker >> Cc: Mark Fasheh >> Cc: "Junxiao Bi" >> Signed-off-by: Andrew Morton >> Signed-off-by: Linus Torvalds >> [lizf: Backported to 3.4: adjust context] >> Signed-off-by: Zefan Li >> --- >> fs/ocfs2/dlm/dlmmaster.c | 4 +++- >> fs/ocfs2/dlm/dlmrecovery.c | 6 +++++- >> 2 files changed, 8 insertions(+), 2 deletions(-) >> >> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c >> index 7ba6ac1..751efa8 100644 >> --- a/fs/ocfs2/dlm/dlmmaster.c >> +++ b/fs/ocfs2/dlm/dlmmaster.c >> @@ -1411,6 +1411,7 @@ int dlm_master_request_handler(struct o2net_msg *msg, u32 len, void *data, >> int found, ret; >> int set_maybe; >> int dispatch_assert = 0; >> + int dispatched = 0; >> >> if (!dlm_grab(dlm)) >> return DLM_MASTER_RESP_NO; >> @@ -1617,6 +1618,8 @@ send_response: >> mlog(ML_ERROR, "failed to dispatch assert master work\n"); >> response = DLM_MASTER_RESP_ERROR; >> dlm_lockres_put(res); >> + } else { >> + dispatched = 1; >> } >> } else { >> if (res) >> @@ -2041,7 +2044,6 @@ int dlm_dispatch_assert_master(struct dlm_ctxt *dlm, >> >> >> /* queue up work for dlm_assert_master_worker */ >> - dlm_grab(dlm); /* get an extra ref for the work item */ >> dlm_init_work_item(dlm, item, dlm_assert_master_worker, NULL); >> item->u.am.lockres = res; /* already have a ref */ >> /* can optionally ignore node numbers higher than this node */ >> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c >> index d15b071..0e5013e 100644 >> --- a/fs/ocfs2/dlm/dlmrecovery.c >> +++ b/fs/ocfs2/dlm/dlmrecovery.c >> @@ -1689,6 +1689,7 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data, >> unsigned int hash; >> int master = DLM_LOCK_RES_OWNER_UNKNOWN; >> u32 flags = DLM_ASSERT_MASTER_REQUERY; >> + int dispatched = 0; >> >> if (!dlm_grab(dlm)) { >> /* since the domain has gone away on this >> @@ -1710,6 +1711,8 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data, >> mlog_errno(-ENOMEM); >> /* retry!? */ >> BUG(); >> + } else { >> + dispatched = 1; >> } >> } else /* put.. incase we are not the master */ >> dlm_lockres_put(res); >> @@ -1717,7 +1720,8 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data, >> } >> spin_unlock(&dlm->spinlock); >> >> - dlm_put(dlm); >> + if (!dispatched) >> + dlm_put(dlm); >> return master; >> } >> >> > > > . >