From: KY Srinivasan
To: Michael Gissing, "Alex Ng (LIS)"
CC: "linux-kernel@vger.kernel.org", "devel@linuxdriverproject.org",
    "olaf@aepfle.de", "apw@canonical.com", "vkuznets@redhat.com",
    "gregkh@linuxfoundation.org"
Subject: RE: [PATCH] Tools: hv: recover after hv_vss_daemon freeze times out
Date: Fri, 28 Oct 2016 18:21:29 +0000
References: <7a8b552a-d1e0-89e2-5f49-7b4fd2011c70@faulpeltz.net>
In-Reply-To: <7a8b552a-d1e0-89e2-5f49-7b4fd2011c70@faulpeltz.net>
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: Michael Gissing [mailto:mg@faulpeltz.net]
> Sent: Thursday, October 13, 2016 2:27 PM
> To: Alex Ng (LIS)
> Cc: KY Srinivasan;
> linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; olaf@aepfle.de; apw@canonical.com;
> vkuznets@redhat.com; gregkh@linuxfoundation.org
> Subject: [PATCH] Tools: hv: recover after hv_vss_daemon freeze times out
>
>
> If a FIFREEZE operation run by the hv_vss_daemon takes longer than the
> VSS_USERSPACE_TIMEOUT set in the hv_snapshot module, instead of exiting
> after a write failure, try to recover by reopening the hv_vss device and
> performing the initial handshake again. Exiting causes all subsequent VSS
> operations sent by the Hyper-V host to fail until the daemon is restarted.
>
> Signed-off-by: Michael Gissing
>
> ---
>  tools/hv/hv_vss_daemon.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/tools/hv/hv_vss_daemon.c b/tools/hv/hv_vss_daemon.c
> index 5d51d6f..0ecbdab 100644
> --- a/tools/hv/hv_vss_daemon.c
> +++ b/tools/hv/hv_vss_daemon.c
> @@ -176,6 +176,7 @@ int main(int argc, char *argv[])
>  	openlog("Hyper-V VSS", 0, LOG_USER);
>  	syslog(LOG_INFO, "VSS starting; pid is:%d", getpid());
>
> +recover:
>  	vss_fd = open("/dev/vmbus/hv_vss", O_RDWR);
>  	if (vss_fd < 0) {
>  		syslog(LOG_ERR, "open /dev/vmbus/hv_vss failed; error: %d %s",
> @@ -196,6 +197,7 @@ int main(int argc, char *argv[])
>  	}
>
>  	pfd.fd = vss_fd;
> +	in_handshake = 1;
>
>  	while (1) {
>  		pfd.events = POLLIN;
> @@ -258,7 +260,14 @@ int main(int argc, char *argv[])
>  		if (len != sizeof(struct hv_vss_msg)) {
>  			syslog(LOG_ERR, "write failed; error: %d %s", errno,
>  			       strerror(errno));
> -			exit(EXIT_FAILURE);
> +			/*
> +			 * try to recover from possible timeout by THAWing
> +			 * and restarting the message loop
> +			 */
> +			vss_operate(VSS_OP_THAW);
> +			close(vss_fd);
> +			syslog(LOG_INFO, "trying to recover VSS connection");
> +			goto recover;
>  		}
>  	}

I agree with issuing a THAW command when we time out in the kernel, as this
would leave the file system in a sane state. That said, I am not sure why we
need to close the fd and reinitialize everything in the daemon.
What if we just ignored the write error and went back to waiting for new
commands from the host?

Regards,

K. Y

>
> --
> 2.7.4
>
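
A minimal sketch of the alternative raised in the reply, under stated
assumptions: on a failed reply write, thaw and keep waiting on the same
descriptor instead of exiting or reopening. The names fake_dev, fake_write,
fake_thaw, and serve are hypothetical stand-ins so the control flow can run
without /dev/vmbus/hv_vss; they are not part of hv_vss_daemon.c. Whether the
hv_snapshot driver accepts further replies on the same fd after a timeout is
the open question in this thread, and the sketch does not settle it.

```c
/* Hypothetical stand-in for the hv_vss device; not the real driver API. */
struct fake_dev {
	int fail_write_on;	/* index of the request whose reply write fails, -1 for none */
	int thaw_calls;		/* number of times the thaw path ran */
	int replies_ok;		/* number of replies written successfully */
};

/* Simulated reply write: fails only for the configured request index. */
int fake_write(struct fake_dev *d, int req_idx)
{
	return req_idx == d->fail_write_on ? -1 : 0;
}

/* Simulated THAW: only counts invocations. */
void fake_thaw(struct fake_dev *d)
{
	d->thaw_calls++;
}

/*
 * Handle nreq host requests in order. A failed reply write triggers a thaw
 * and the loop continues on the same device: no exit, no reopen, no new
 * handshake. Returns the number of replies that were written successfully.
 */
int serve(struct fake_dev *d, int nreq)
{
	for (int i = 0; i < nreq; i++) {
		if (fake_write(d, i) < 0) {
			fake_thaw(d);	/* leave the file system in a sane state */
			continue;	/* go back to waiting for the next command */
		}
		d->replies_ok++;
	}
	return d->replies_ok;
}
```

With three requests and a failure on the second one, serve() thaws once and
still answers the other two, which is the behaviour the reply asks about.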