From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Dexuan Cui,
    Stephen Hemminger, "K. Y. Srinivasan", Haiyang Zhang,
    "David S. Miller"
Subject: [PATCH 4.14 017/115] hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()
Date: Thu, 13 Sep 2018 15:30:37 +0200
Message-Id: <20180913131824.597582634@linuxfoundation.org>
In-Reply-To: <20180913131823.327472833@linuxfoundation.org>
References: <20180913131823.327472833@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dexuan Cui

[ Upstream commit e04e7a7bbd4bbabef4e1a58367e5fc9b2edc3b10 ]

This patch fixes the race between netvsc_probe() and
rndis_set_subchannel(), which can cause a deadlock.

These are the 3 related paths which show the deadlock:

path #1:
    Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus]
    Call Trace:
     schedule
     schedule_preempt_disabled
     __mutex_lock
     __device_attach
     bus_probe_device
     device_add
     vmbus_device_register
     vmbus_onoffer
     vmbus_onmessage_work
     process_one_work
     worker_thread
     kthread
     ret_from_fork

path #2:
     schedule
     schedule_preempt_disabled
     __mutex_lock
     netvsc_probe
     vmbus_probe
     really_probe
     __driver_attach
     bus_for_each_dev
     driver_attach_async
     async_run_entry_fn
     process_one_work
     worker_thread
     kthread
     ret_from_fork

path #3:
    Workqueue: events netvsc_subchan_work [hv_netvsc]
    Call Trace:
     schedule
     rndis_set_subchannel
     netvsc_subchan_work
     process_one_work
     worker_thread
     kthread
     ret_from_fork

Before path #1 finishes, path #2 can start to run, because just before
the "bus_probe_device(dev);" in device_add() in path #1, there is a line
"kobject_uevent(&dev->kobj, KOBJ_ADD);", so systemd-udevd can
immediately try to load hv_netvsc and hence path #2 can start to run.

Next, path #2 offloads the subchannel's initialization to a workqueue,
i.e. path #3, so we can end up in a deadlock situation like this:

Path #2 gets the device lock, and is trying to get the rtnl lock;
Path #3 gets the rtnl lock and is waiting for all the subchannel
messages to be processed;
Path #1 is trying to get the device lock, but since #2 is not releasing
the device lock, path #1 has to sleep; since the VMBus messages are
processed one by one, this means the sub-channel messages can't be
processed, so #3 has to sleep with the rtnl lock held, and finally #2
has to sleep... Now all the 3 paths are sleeping and we hit the deadlock.

With the patch, we can make sure #2 gets both the device lock and the
rtnl lock together, gets its job done, and releases the locks, so #1
and #3 will not be blocked for ever.

Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
Signed-off-by: Dexuan Cui
Cc: Stephen Hemminger
Cc: K. Y. Srinivasan
Cc: Haiyang Zhang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 drivers/net/hyperv/netvsc_drv.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2044,6 +2044,16 @@ static int netvsc_probe(struct hv_device
 
 	memcpy(net->dev_addr, device_info.mac_adr, ETH_ALEN);
 
+	/* We must get rtnl lock before scheduling nvdev->subchan_work,
+	 * otherwise netvsc_subchan_work() can get rtnl lock first and wait
+	 * all subchannels to show up, but that may not happen because
+	 * netvsc_probe() can't get rtnl lock and as a result vmbus_onoffer()
+	 * -> ... -> device_add() -> ... -> __device_attach() can't get
+	 * the device lock, so all the subchannels can't be processed --
+	 * finally netvsc_subchan_work() hangs for ever.
+	 */
+	rtnl_lock();
+
 	if (nvdev->num_chn > 1)
 		schedule_work(&nvdev->subchan_work);
 
@@ -2062,7 +2072,6 @@ static int netvsc_probe(struct hv_device
 	else
 		net->max_mtu = ETH_DATA_LEN;
 
-	rtnl_lock();
 	ret = register_netdevice(net);
 	if (ret != 0) {
 		pr_err("Unable to register netdev.\n");
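
For reviewers who want to sanity-check the reasoning in the changelog, the
three paths can be modelled as a small wait-for graph. What follows is a
user-space sketch only, not kernel code and not part of the patch: the names
wait_graph, has_deadlock(), graph_before_patch() and graph_after_patch() are
hypothetical, and the edges are my reading of the changelog (an edge from
path A to path B means A cannot make progress until B does). A cycle in the
graph corresponds to the hang described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPATHS 3	/* index 0 = path #1, 1 = path #2, 2 = path #3 */

/* waits_for[a][b] is true when path a cannot proceed until path b does. */
struct wait_graph {
	bool waits_for[NPATHS][NPATHS];
};

enum visit_state { UNVISITED = 0, IN_PROGRESS, DONE };

/* Depth-first search; reaching an IN_PROGRESS node again means a cycle. */
static bool dfs_finds_cycle(const struct wait_graph *g, int node,
			    enum visit_state *state)
{
	assert(node >= 0 && node < NPATHS);
	state[node] = IN_PROGRESS;
	for (int next = 0; next < NPATHS; next++) {
		if (!g->waits_for[node][next])
			continue;
		if (state[next] == IN_PROGRESS)
			return true;
		if (state[next] == UNVISITED &&
		    dfs_finds_cycle(g, next, state))
			return true;
	}
	state[node] = DONE;
	return false;
}

/* A deadlock in this model is any cycle in the wait-for graph. */
bool has_deadlock(const struct wait_graph *g)
{
	enum visit_state state[NPATHS] = { UNVISITED };

	for (int i = 0; i < NPATHS; i++)
		if (state[i] == UNVISITED && dfs_finds_cycle(g, i, state))
			return true;
	return false;
}

/* Assumed model of the ordering before the patch. */
struct wait_graph graph_before_patch(void)
{
	struct wait_graph g;

	memset(&g, 0, sizeof(g));
	g.waits_for[0][1] = true; /* #1 wants the device lock held by #2 */
	g.waits_for[1][2] = true; /* #2 wants the rtnl lock held by #3 */
	g.waits_for[2][0] = true; /* #3 needs messages queued behind #1 */
	return g;
}

/* Assumed model after the patch: #2 already holds both locks. */
struct wait_graph graph_after_patch(void)
{
	struct wait_graph g;

	memset(&g, 0, sizeof(g));
	g.waits_for[0][1] = true; /* #1 still waits for the device lock */
	g.waits_for[2][1] = true; /* #3 waits for rtnl, now held by #2 */
	return g;
}
```

Calling has_deadlock() on the graph from graph_before_patch() reports a
cycle, while the graph from graph_after_patch() has none, because path #2 no
longer waits on anything and the other two paths only wait on #2.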