Date: Wed, 29 May 2019 03:04:46 +0800
From: Yao Liu
To: Josef Bacik
Cc: Jens Axboe, linux-block, nbd, linux-kernel
Subject: Re: [PATCH 1/3] nbd: fix connection timed out error after reconnecting to server
Message-ID: <20190528190446.GA21513@192-168-150-246.7~>
References: <1558691036-16281-1-git-send-email-yotta.liu@ucloud.cn>
 <20190524130740.zfypc2j3q5e3gryr@MacBook-Pro-91.local.dhcp.thefacebook.com>
 <20190527180743.GA20702@192-168-150-246.7~>
 <20190528165758.zxfrv6fum4vwcv4e@MacBook-Pro-91.local>
In-Reply-To: <20190528165758.zxfrv6fum4vwcv4e@MacBook-Pro-91.local>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 28, 2019 at 12:57:59PM -0400, Josef Bacik wrote:
> On Tue, May 28, 2019 at 02:07:43AM +0800, Yao Liu wrote:
> > On Fri, May 24, 2019 at 09:07:42AM -0400, Josef Bacik wrote:
> > > On Fri, May 24, 2019 at 05:43:54PM +0800, Yao Liu wrote:
> > > > Some I/O requests that have been sent successfully but have not yet been
> > > > replied to won't be resubmitted after reconnecting because of a server
> > > > restart, so we add a list to track them.
> > > >
> > > > Signed-off-by: Yao Liu
> > >
> > > Nack, this is what the timeout stuff is supposed to handle. The commands will
> > > timeout and we'll resubmit them if we have alive sockets. Thanks,
> > >
> > > Josef
> > >
> >
> > On the one hand, if num_connections == 1 and the only sock is dead,
> > and we then do nbd_genl_reconfigure to reconnect within dead_conn_timeout,
> > nbd_xmit_timeout will not resubmit commands that have been sent
> > successfully but have not yet been replied to.
> > The log is as follows:
> >
> > [270551.108746] block nbd0: Receive control failed (result -104)
> > [270551.108747] block nbd0: Send control failed (result -32)
> > [270551.108750] block nbd0: Request send failed, requeueing
> > [270551.116207] block nbd0: Attempted send on invalid socket
> > [270556.119584] block nbd0: reconnected socket
> > [270581.161751] block nbd0: Connection timed out
> > [270581.165038] block nbd0: shutting down sockets
> > [270581.165041] print_req_error: I/O error, dev nbd0, sector 5123224 flags 8801
> > [270581.165149] print_req_error: I/O error, dev nbd0, sector 5123232 flags 8801
> > [270581.165580] block nbd0: Connection timed out
> > [270581.165587] print_req_error: I/O error, dev nbd0, sector 844680 flags 8801
> > [270581.166184] print_req_error: I/O error, dev nbd0, sector 5123240 flags 8801
> > [270581.166554] block nbd0: Connection timed out
> > [270581.166576] print_req_error: I/O error, dev nbd0, sector 844688 flags 8801
> > [270581.167124] print_req_error: I/O error, dev nbd0, sector 5123248 flags 8801
> > [270581.167590] block nbd0: Connection timed out
> > [270581.167597] print_req_error: I/O error, dev nbd0, sector 844696 flags 8801
> > [270581.168021] print_req_error: I/O error, dev nbd0, sector 5123256 flags 8801
> > [270581.168487] block nbd0: Connection timed out
> > [270581.168493] print_req_error: I/O error, dev nbd0, sector 844704 flags 8801
> > [270581.170183] print_req_error: I/O error, dev nbd0, sector 5123264 flags 8801
> > [270581.170540] block nbd0: Connection timed out
> > [270581.173333] block nbd0: Connection timed out
> > [270581.173728] block nbd0: Connection timed out
> > [270581.174135] block nbd0: Connection timed out
> >
> > On the other hand, if we wait for nbd_xmit_timeout to handle resubmission,
> > the I/O requests will see a big delay.
> > For example, if the timeout is 30s,
> > and it takes only 2s from the sock going dead to nbd_genl_reconfigure
> > returning OK, the I/O requests will still only be handled by
> > nbd_xmit_timeout after 30s.
>
> We have to wait for the full timeout anyway to know that the socket went down,
> so it'll be re-submitted right away and then we'll wait on the new connection.
>
> Now we could definitely have requests that were submitted well after the first
> thing that failed, so their timeout would be longer than simply retrying them,
> but we have no idea of knowing which ones timed out and which ones didn't. This
> way lies pain, because we have to match up tags with handles. This is why we
> rely on the generic timeout infrastructure, so everything is handled correctly
> without ending up with duplicate submissions/replies. Thanks,
>
> Josef
>

But as I mentioned before, if num_connections == 1, nbd_xmit_timeout won't
re-submit the commands and an I/O error will occur. Should we change the
condition

	if (config->num_connections > 1)

to

	if (config->num_connections >= 1)

?
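
To make the question concrete, here is a minimal userspace sketch of the
decision being discussed. It is not the driver code: struct toy_config,
toy_xmit_timeout() and the min_conns parameter are made-up names for
illustration, and the model ignores socket liveness, locking and everything
else the real timeout handler has to deal with. It only illustrates the
behaviour described above: with a single connection, a "> 1" check ends in an
I/O error, while ">= 1" would requeue the command.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-device config; not the kernel struct. */
struct toy_config {
	int num_connections;	/* sockets configured for the device */
};

enum toy_action {
	TOY_REQUEUE,	/* hand the request back so the submit path retries it */
	TOY_FAIL,	/* complete the request with an I/O error */
};

/*
 * Toy version of the decision made when a command times out.
 * min_conns is the threshold under discussion (an assumption of this model):
 *   2 models "num_connections > 1", 1 models "num_connections >= 1".
 */
static enum toy_action toy_xmit_timeout(const struct toy_config *cfg,
					int min_conns)
{
	return cfg->num_connections >= min_conns ? TOY_REQUEUE : TOY_FAIL;
}

static const char *toy_action_name(enum toy_action a)
{
	return a == TOY_REQUEUE ? "requeue" : "fail with I/O error";
}

/* Walks through the single- and multi-connection cases for both thresholds. */
static void toy_demo(void)
{
	const struct toy_config one = { .num_connections = 1 };
	const struct toy_config two = { .num_connections = 2 };

	/* Multiple connections: requeued under either condition. */
	assert(toy_xmit_timeout(&two, 2) == TOY_REQUEUE);
	assert(toy_xmit_timeout(&two, 1) == TOY_REQUEUE);

	/* Single connection: only the relaxed condition requeues. */
	assert(toy_xmit_timeout(&one, 2) == TOY_FAIL);
	assert(toy_xmit_timeout(&one, 1) == TOY_REQUEUE);

	printf("1 conn, '> 1':  %s\n",
	       toy_action_name(toy_xmit_timeout(&one, 2)));
	printf("1 conn, '>= 1': %s\n",
	       toy_action_name(toy_xmit_timeout(&one, 1)));
}
```

Calling toy_demo() from a main() prints the two single-connection outcomes.
Whether requeueing is actually safe with one connection depends on the
reconnect having completed, which this sketch deliberately leaves out.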