From: Sasha Levin
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Sagi Grimberg, Christoph Hellwig, Sasha Levin
Subject: [PATCH AUTOSEL 4.19 032/146] nvmet-rdma: use a private workqueue for delete
Date: Wed, 31 Oct 2018 19:03:47 -0400
Message-Id: <20181031230541.28822-32-sashal@kernel.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181031230541.28822-1-sashal@kernel.org>
References: <20181031230541.28822-1-sashal@kernel.org>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sagi Grimberg

[ Upstream commit 2acf70ade79d26b97611a8df52eb22aa33814cd4 ]

Queue deletion is done asynchronous when the last reference on the queue
is dropped. Thus, in order to make sure we don't over allocate under a
connect/disconnect storm, we let queue deletion complete before making
forward progress.

However, given that we flush the system_wq from rdma_cm context which
runs from a workqueue context, we can have a circular locking complaint
[1]. Fix that by using a private workqueue for queue deletion.

[1]:
======================================================
WARNING: possible circular locking dependency detected
4.19.0-rc4-dbg+ #3 Not tainted
------------------------------------------------------
kworker/5:0/39 is trying to acquire lock:
00000000a10b6db9 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x6f/0x440 [rdma_cm]

but task is already holding lock:
00000000331b4e2c ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3ed/0xa20

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 ((work_completion)(&queue->release_work)){+.+.}:
       process_one_work+0x474/0xa20
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30

-> #2 ((wq_completion)"events"){+.+.}:
       flush_workqueue+0xf3/0x970
       nvmet_rdma_cm_handler+0x133d/0x1734 [nvmet_rdma]
       cma_ib_req_handler+0x72f/0xf90 [rdma_cm]
       cm_process_work+0x2e/0x110 [ib_cm]
       cm_req_handler+0x135b/0x1c30 [ib_cm]
       cm_work_handler+0x2b7/0x38cd [ib_cm]
       process_one_work+0x4ae/0xa20
nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: disconnected (10): status 0 id 0000000040357082
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30
nvme nvme0: Reconnecting in 10 seconds...

-> #1 (&id_priv->handler_mutex/1){+.+.}:
       __mutex_lock+0xfe/0xbe0
       mutex_lock_nested+0x1b/0x20
       cma_ib_req_handler+0x6aa/0xf90 [rdma_cm]
       cm_process_work+0x2e/0x110 [ib_cm]
       cm_req_handler+0x135b/0x1c30 [ib_cm]
       cm_work_handler+0x2b7/0x38cd [ib_cm]
       process_one_work+0x4ae/0xa20
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30

-> #0 (&id_priv->handler_mutex){+.+.}:
       lock_acquire+0xc5/0x200
       __mutex_lock+0xfe/0xbe0
       mutex_lock_nested+0x1b/0x20
       rdma_destroy_id+0x6f/0x440 [rdma_cm]
       nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
       process_one_work+0x4ae/0xa20
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30

Fixes: 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
Reported-by: Bart Van Assche
Signed-off-by: Sagi Grimberg
Tested-by: Bart Van Assche
Signed-off-by: Christoph Hellwig
Signed-off-by: Sasha Levin
---
 drivers/nvme/target/rdma.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index bfc4da660bb4..5becca88ccbe 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -122,6 +122,7 @@ struct nvmet_rdma_device {
 	int inline_page_count;
 };
 
+struct workqueue_struct *nvmet_rdma_delete_wq;
 static bool nvmet_rdma_use_srq;
 module_param_named(use_srq, nvmet_rdma_use_srq, bool, 0444);
 MODULE_PARM_DESC(use_srq, "Use shared receive queue.");
@@ -1267,12 +1268,12 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
 
 	if (queue->host_qid == 0) {
 		/* Let inflight controller teardown complete */
-		flush_scheduled_work();
+		flush_workqueue(nvmet_rdma_delete_wq);
 	}
 
 	ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
 	if (ret) {
-		schedule_work(&queue->release_work);
+		queue_work(nvmet_rdma_delete_wq, &queue->release_work);
 		/* Destroying rdma_cm id is not needed here */
 		return 0;
 	}
@@ -1337,7 +1338,7 @@ static void __nvmet_rdma_queue_disconnect(struct nvmet_rdma_queue *queue)
 
 	if (disconnect) {
 		rdma_disconnect(queue->cm_id);
-		schedule_work(&queue->release_work);
+		queue_work(nvmet_rdma_delete_wq, &queue->release_work);
 	}
 }
 
@@ -1367,7 +1368,7 @@ static void nvmet_rdma_queue_connect_fail(struct rdma_cm_id *cm_id,
 	mutex_unlock(&nvmet_rdma_queue_mutex);
 
 	pr_err("failed to connect queue %d\n", queue->idx);
-	schedule_work(&queue->release_work);
+	queue_work(nvmet_rdma_delete_wq, &queue->release_work);
 }
 
 /**
@@ -1649,8 +1650,17 @@ static int __init nvmet_rdma_init(void)
 	if (ret)
 		goto err_ib_client;
 
+	nvmet_rdma_delete_wq = alloc_workqueue("nvmet-rdma-delete-wq",
+			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
+	if (!nvmet_rdma_delete_wq) {
+		ret = -ENOMEM;
+		goto err_unreg_transport;
+	}
+
 	return 0;
 
+err_unreg_transport:
+	nvmet_unregister_transport(&nvmet_rdma_ops);
 err_ib_client:
 	ib_unregister_client(&nvmet_rdma_ib_client);
 	return ret;
@@ -1658,6 +1668,7 @@ static int __init nvmet_rdma_init(void)
 
 static void __exit nvmet_rdma_exit(void)
 {
+	destroy_workqueue(nvmet_rdma_delete_wq);
 	nvmet_unregister_transport(&nvmet_rdma_ops);
 	ib_unregister_client(&nvmet_rdma_ib_client);
 	WARN_ON_ONCE(!list_empty(&nvmet_rdma_queue_list));
-- 
2.17.1