From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Michael Kelley, Vitaly Kuznetsov, Wei Liu, Sasha Levin
Subject: [PATCH 4.14 177/323] Drivers: hv: vmbus: Increase wait time for VMbus unload
Date: Thu, 20 May 2021 11:21:09 +0200
Message-Id: <20210520092126.180009798@linuxfoundation.org>
In-Reply-To: <20210520092120.115153432@linuxfoundation.org>
References: <20210520092120.115153432@linuxfoundation.org>

From: Michael Kelley

[ Upstream commit 77db0ec8b7764cb9b09b78066ebfd47b2c0c1909 ]

When running in Azure, disks may be connected to a Linux VM with
read/write caching enabled. If a VM panics and issues a VMbus UNLOAD
request to Hyper-V, the response is delayed until all dirty data in the
disk cache is flushed. In extreme cases, this flushing can take tens of
seconds, depending on the disk speed and the amount of dirty data. If
kdump is configured for the VM, the current 10-second timeout in
vmbus_wait_for_unload() may be exceeded, and the UNLOAD complete message
may arrive well after the kdump kernel is already running, causing
problems. Note that no problem occurs if kdump is not enabled, because
Hyper-V waits for the cache flush before doing a reboot through the
BIOS/UEFI code.

Fix this problem by increasing the timeout in vmbus_wait_for_unload()
to 100 seconds. Also output periodic messages so that anyone watching
the serial console won't think the VM is completely hung.
Fixes: 911e1987efc8 ("Drivers: hv: vmbus: Add timeout to vmbus_wait_for_unload")
Signed-off-by: Michael Kelley
Reviewed-by: Vitaly Kuznetsov
Link: https://lore.kernel.org/r/1618894089-126662-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu
Signed-off-by: Sasha Levin
---
 drivers/hv/channel_mgmt.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 3891d3c2cc00..bd79d958f7d6 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -768,6 +768,12 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
 	free_cpumask_var(available_mask);
 }
 
+#define UNLOAD_DELAY_UNIT_MS	10		/* 10 milliseconds */
+#define UNLOAD_WAIT_MS		(100*1000)	/* 100 seconds */
+#define UNLOAD_WAIT_LOOPS	(UNLOAD_WAIT_MS/UNLOAD_DELAY_UNIT_MS)
+#define UNLOAD_MSG_MS		(5*1000)	/* Every 5 seconds */
+#define UNLOAD_MSG_LOOPS	(UNLOAD_MSG_MS/UNLOAD_DELAY_UNIT_MS)
+
 static void vmbus_wait_for_unload(void)
 {
 	int cpu;
@@ -785,12 +791,17 @@ static void vmbus_wait_for_unload(void)
 	 * vmbus_connection.unload_event. If not, the last thing we can do is
 	 * read message pages for all CPUs directly.
 	 *
-	 * Wait no more than 10 seconds so that the panic path can't get
-	 * hung forever in case the response message isn't seen.
+	 * Wait up to 100 seconds since an Azure host must writeback any dirty
+	 * data in its disk cache before the VMbus UNLOAD request will
+	 * complete. This flushing has been empirically observed to take up
+	 * to 50 seconds in cases with a lot of dirty data, so allow additional
+	 * leeway and for inaccuracies in mdelay(). But eventually time out so
+	 * that the panic path can't get hung forever in case the response
+	 * message isn't seen.
 	 */
-	for (i = 0; i < 1000; i++) {
+	for (i = 1; i <= UNLOAD_WAIT_LOOPS; i++) {
 		if (completion_done(&vmbus_connection.unload_event))
-			break;
+			goto completed;
 
 		for_each_online_cpu(cpu) {
 			struct hv_per_cpu_context *hv_cpu
@@ -813,9 +824,18 @@ static void vmbus_wait_for_unload(void)
 			vmbus_signal_eom(msg, message_type);
 		}
 
-		mdelay(10);
+		/*
+		 * Give a notice periodically so someone watching the
+		 * serial output won't think it is completely hung.
+		 */
+		if (!(i % UNLOAD_MSG_LOOPS))
+			pr_notice("Waiting for VMBus UNLOAD to complete\n");
+
+		mdelay(UNLOAD_DELAY_UNIT_MS);
 	}
+	pr_err("Continuing even though VMBus UNLOAD did not complete\n");
 
+completed:
 	/*
 	 * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
 	 * maybe-pending messages on all CPUs to be able to receive new
-- 
2.30.2
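
For readers who want to see the timing arithmetic of the new wait loop in
isolation: it is a bounded poll (100 s timeout / 10 ms poll interval =
10,000 iterations) with a progress notice every 500 iterations (5 s). The
standalone C sketch below is only an illustration, not kernel code:
unload_completed() and sleep_ms() are hypothetical stand-ins for
completion_done(&vmbus_connection.unload_event) and mdelay(), and the
12-second simulated response time is invented for the example.

#include <stdbool.h>
#include <stdio.h>

#define UNLOAD_DELAY_UNIT_MS 10            /* poll interval: 10 ms */
#define UNLOAD_WAIT_MS       (100 * 1000)  /* overall timeout: 100 seconds */
#define UNLOAD_WAIT_LOOPS    (UNLOAD_WAIT_MS / UNLOAD_DELAY_UNIT_MS)
#define UNLOAD_MSG_MS        (5 * 1000)    /* progress notice every 5 seconds */
#define UNLOAD_MSG_LOOPS     (UNLOAD_MSG_MS / UNLOAD_DELAY_UNIT_MS)

static unsigned int elapsed_ms;            /* simulated clock */

/* Hypothetical stand-in for completion_done(): response arrives at 12 s. */
static bool unload_completed(void)
{
        return elapsed_ms >= 12 * 1000;
}

/* Hypothetical stand-in for mdelay(): just advance the simulated clock. */
static void sleep_ms(unsigned int ms)
{
        elapsed_ms += ms;
}

int main(void)
{
        int i;

        for (i = 1; i <= UNLOAD_WAIT_LOOPS; i++) {
                if (unload_completed())
                        goto completed;

                /* Periodic notice so a console watcher doesn't assume a hang. */
                if (!(i % UNLOAD_MSG_LOOPS))
                        printf("Waiting for VMBus UNLOAD to complete\n");

                sleep_ms(UNLOAD_DELAY_UNIT_MS);
        }
        printf("Continuing even though VMBus UNLOAD did not complete\n");
        return 1;

completed:
        printf("UNLOAD response seen after roughly %u ms\n", elapsed_ms);
        return 0;
}

Run as-is, the sketch prints two "Waiting for VMBus UNLOAD to complete"
notices (at about 5 s and 10 s of simulated time) before completing at
12 s, which mirrors the console behavior the commit message describes;
if the response never arrives, the loop gives up after 100 s instead of
hanging the panic path.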