To: alexng@microsoft.com
Cc: kys@microsoft.com, linux-kernel@vger.kernel.org, devel@linuxdriverproject.org, olaf@aepfle.de, apw@canonical.com, vkuznets@redhat.com, gregkh@linuxfoundation.org
From: Michael Gissing
Subject: [PATCH] Tools: hv: recover after hv_vss_daemon freeze times out
Message-ID: <7a8b552a-d1e0-89e2-5f49-7b4fd2011c70@faulpeltz.net>
Date: Thu, 13 Oct 2016 23:26:59 +0200

If a FIFREEZE operation run by the hv_vss_daemon takes longer than the
VSS_USERSPACE_TIMEOUT set in the hv_snapshot module, the daemon's
subsequent write to the hv_vss device fails. Instead of exiting after
this write failure, try to recover by reopening the hv_vss device and
performing the initial handshake again.

Exiting causes all subsequent VSS operations sent by the Hyper-V host
to fail until the daemon is restarted.

Signed-off-by: Michael Gissing
---
 tools/hv/hv_vss_daemon.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/hv/hv_vss_daemon.c b/tools/hv/hv_vss_daemon.c
index 5d51d6f..0ecbdab 100644
--- a/tools/hv/hv_vss_daemon.c
+++ b/tools/hv/hv_vss_daemon.c
@@ -176,6 +176,7 @@ int main(int argc, char *argv[])
 	openlog("Hyper-V VSS", 0, LOG_USER);
 	syslog(LOG_INFO, "VSS starting; pid is:%d", getpid());
 
+recover:
 	vss_fd = open("/dev/vmbus/hv_vss", O_RDWR);
 	if (vss_fd < 0) {
 		syslog(LOG_ERR, "open /dev/vmbus/hv_vss failed; error: %d %s",
@@ -196,6 +197,7 @@ int main(int argc, char *argv[])
 	}
 
 	pfd.fd = vss_fd;
+	in_handshake = 1;
 
 	while (1) {
 		pfd.events = POLLIN;
@@ -258,7 +260,14 @@ int main(int argc, char *argv[])
 		if (len != sizeof(struct hv_vss_msg)) {
 			syslog(LOG_ERR, "write failed; error: %d %s", errno,
 			       strerror(errno));
-			exit(EXIT_FAILURE);
+			/*
+			 * try to recover from possible timeout by THAWing
+			 * and restarting the message loop
+			 */
+			vss_operate(VSS_OP_THAW);
+			close(vss_fd);
+			syslog(LOG_INFO, "trying to recover VSS connection");
+			goto recover;
 		}
 	}
 
-- 
2.7.4
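
For readers who want to see the control flow in isolation, the recovery
path can be sketched as a small stand-alone simulation. This is only an
illustration under stated assumptions, not the daemon's code: struct
fake_dev, fake_open(), fake_handshake(), fake_write_reply() and serve()
are hypothetical stand-ins for the real hv_vss device and message loop,
and the max_recoveries bound is an addition of this sketch that the
patch itself does not have.

```c
#include <stdio.h>

/* Hypothetical stand-in for /dev/vmbus/hv_vss; not the real device API. */
struct fake_dev {
	int opens;      /* number of times the device was opened */
	int handshakes; /* number of registration handshakes performed */
	int writes;     /* number of reply writes attempted */
	int fail_on;    /* 1-based reply write that fails (0 = none) */
};

static void fake_open(struct fake_dev *d)
{
	d->opens++;
}

static void fake_handshake(struct fake_dev *d)
{
	d->handshakes++;
}

/* Returns 0 on success, -1 to simulate a reply rejected after a timeout. */
static int fake_write_reply(struct fake_dev *d)
{
	d->writes++;
	return d->writes == d->fail_on ? -1 : 0;
}

/*
 * Serve `requests` requests. Returns the number of requests answered
 * successfully, or -1 if more than `max_recoveries` recoveries were
 * needed (a bound added by this sketch only).
 */
static int serve(struct fake_dev *d, int requests, int max_recoveries)
{
	int answered = 0, recoveries = 0, i = 0;

recover:
	/* Reopen the device and redo the initial handshake. */
	fake_open(d);
	fake_handshake(d);

	while (i < requests) {
		i++;
		if (fake_write_reply(d) < 0) {
			/* The patch thaws and closes the fd at this point. */
			if (++recoveries > max_recoveries)
				return -1;
			fprintf(stderr, "trying to recover connection\n");
			goto recover;
		}
		answered++;
	}
	return answered;
}
```

With a device whose second reply write fails, serve() reopens the
device once and still answers the remaining request, where the old
behaviour corresponds to giving up at the first failed write.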