Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp4739878ybi; Tue, 28 May 2019 01:35:57 -0700 (PDT) X-Google-Smtp-Source: APXvYqz9wZniP3TunROhXFrAq9mK/qZLRJqNKuEYp0YwfDHclGCG2sgqIuun41PgFIaDCcLK1zhx X-Received: by 2002:aa7:942f:: with SMTP id y15mr83512057pfo.121.1559032556887; Tue, 28 May 2019 01:35:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1559032556; cv=none; d=google.com; s=arc-20160816; b=DaotGpw08ZvEKw7fTsSpcpS1POZrShieKQQ5+V9d9QIG2MdH+KH0GlgCzIFz0nSW8w N3Rmw/fmLNFHhH+/XMKrk2n+nvcfoMq+7n4lXZQraEjuiGxcP3InDn+Cl/SCmrtxBDFf n2YuX877YReda9NctlznJJKSGzrWszw4/qcybYKVtu3ZKlMevISVTXFDSeqY1/SjFYBI I+ZJr9W/NG/exDrfNzpxJgWydY9cSiBr6oJiobaGaRdaZ5mF6PyP3f4N7wWpnxZFzyjw zlxMOUaaeL75cdpvyGR0Zpjesm86KVkeLCyDVyr1SAKT+CrFCQ3YF2h7rli4HQn+17VD 7zGA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=pnS3EhtWYjP3IdRzeC2Lx8MPMuwK0msE/gzgpovnRYI=; b=A9R1k0tAtkZ008yuxcRPsk/fhel7hww9gvG8Ik+7v1qPhUrSYC/H3oSElwnBYEw0dp EWjtQMrsau4LsQRrWf4O7S7VKfHHFibeIEOGZ4ZtoDIdy00ml3b4UQ2xY7BrSwz1EzOx 4H2Mj+ZwJsMt5XPq0Dw/VkVy2FHwbgc9M+1CBVyFrwcQ6O5YilEDAdzp3mZU8FeIVxZF Kl5RAOynDFn2aiiT4MmXn597OSiY3JnncHhyrThWC+gjWlh29MO1z3B33rfmXUicxszI aS0mDx2t8G4KsJ7vEpVMWWWGXRuZ5sinIU+FK47r561MfNMrxh9QpM6KJaF5zEPdFMo/ J+Dw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ucloud.cn Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 102si6400135plf.250.2019.05.28.01.35.41; Tue, 28 May 2019 01:35:56 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ucloud.cn Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726694AbfE1Iea (ORCPT + 99 others); Tue, 28 May 2019 04:34:30 -0400 Received: from m9784.mail.qiye.163.com ([220.181.97.84]:51996 "EHLO m9784.mail.qiye.163.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726557AbfE1Iea (ORCPT ); Tue, 28 May 2019 04:34:30 -0400 Received: from localhost (unknown [120.132.1.243]) by m9784.mail.qiye.163.com (Hmail) with ESMTPA id A3EFD419FD; Tue, 28 May 2019 16:34:22 +0800 (CST) Date: Tue, 28 May 2019 02:07:43 +0800 From: Yao Liu To: Josef Bacik Cc: Jens Axboe , linux-block , nbd , linux-kernel Subject: Re: Re: [PATCH 1/3] nbd: fix connection timed out error after reconnecting to server Message-ID: <20190527180743.GA20702@192-168-150-246.7~> References: <1558691036-16281-1-git-send-email-yotta.liu@ucloud.cn> <20190524130740.zfypc2j3q5e3gryr@MacBook-Pro-91.local.dhcp.thefacebook.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190524130740.zfypc2j3q5e3gryr@MacBook-Pro-91.local.dhcp.thefacebook.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-HM-Spam-Status: e1kIGBQJHllBWUtVTkhJQkJCQ0pITk5NTUxKWVdZKFlBSUI3V1ktWUFJV1 kJDhceCFlBWTU0KTY6NyQpLjc#WQY+ X-HM-Sender-Digest: e1kMHhlZQR0aFwgeV1kSHx4VD1lBWUc6NjY6Kzo5DjgwNiNJIS4dCDoN P01PCjpVSlVKTk5CS0hJT01JQ0JOVTMWGhIXVQIUDw8aVRcSDjsOGBcUDh9VGBVFWVdZEgtZQVlK SUtVSkhJVUpVSU9IWVdZCAFZQU9KTk43Bg++ X-HM-Tid: 0a6afd937d9b2086kuqya3efd419fd Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 24, 2019 at 09:07:42AM -0400, Josef Bacik wrote: > On Fri, May 24, 2019 at 05:43:54PM +0800, Yao Liu wrote: > > Some I/O requests that have been sent succussfully but have not yet been > > replied won't be resubmitted after reconnecting because of server restart, > > so we add a list to track them. > > > > Signed-off-by: Yao Liu > > Nack, this is what the timeout stuff is supposed to handle. The commands will > timeout and we'll resubmit them if we have alive sockets. Thanks, > > Josef > On the one hand, if num_connections == 1 and the only sock has dead, then we do nbd_genl_reconfigure to reconnect within dead_conn_timeout, nbd_xmit_timeout will not resubmit commands that have been sent succussfully but have not yet been replied. The log is as follows: [270551.108746] block nbd0: Receive control failed (result -104) [270551.108747] block nbd0: Send control failed (result -32) [270551.108750] block nbd0: Request send failed, requeueing [270551.116207] block nbd0: Attempted send on invalid socket [270556.119584] block nbd0: reconnected socket [270581.161751] block nbd0: Connection timed out [270581.165038] block nbd0: shutting down sockets [270581.165041] print_req_error: I/O error, dev nbd0, sector 5123224 flags 8801 [270581.165149] print_req_error: I/O error, dev nbd0, sector 5123232 flags 8801 [270581.165580] block nbd0: Connection timed out [270581.165587] print_req_error: I/O error, dev nbd0, sector 844680 flags 8801 [270581.166184] print_req_error: I/O error, dev nbd0, sector 5123240 flags 8801 [270581.166554] block nbd0: Connection timed out [270581.166576] print_req_error: I/O error, dev nbd0, sector 844688 flags 8801 [270581.167124] print_req_error: I/O error, dev nbd0, sector 5123248 flags 8801 [270581.167590] block nbd0: Connection timed out [270581.167597] print_req_error: I/O error, dev nbd0, sector 844696 flags 8801 [270581.168021] print_req_error: I/O error, dev nbd0, sector 5123256 flags 8801 [270581.168487] block nbd0: Connection timed out [270581.168493] print_req_error: I/O error, dev nbd0, sector 844704 flags 8801 [270581.170183] print_req_error: I/O error, dev nbd0, sector 5123264 flags 8801 [270581.170540] block nbd0: Connection timed out [270581.173333] block nbd0: Connection timed out [270581.173728] block nbd0: Connection timed out [270581.174135] block nbd0: Connection timed out On the other hand, if we wait nbd_xmit_timeout to handle resubmission, the I/O requests will have a big delay. For example, if timeout time is 30s, and from sock dead to nbd_genl_reconfigure returned OK we only spend 2s, the I/O requests will still be handled by nbd_xmit_timeout after 30s.