From: Josef Bacik
To: Adam Borowski
CC: "linux-block@vger.kernel.org", "nbd-general@lists.sourceforge.net", "linux-kernel@vger.kernel.org"
Subject: Re: nbd drops connection on most writes
Date: Fri, 21 Jul 2017 12:22:51 +0000
References: <20170721121439.ofwm3lfuzqjsvjok@angband.pl>
In-Reply-To: <20170721121439.ofwm3lfuzqjsvjok@angband.pl>
X-Mailing-List: linux-kernel@vger.kernel.org

Oh shit the default timeout is 0 if you don't set it in the client. Use the timeout option with nbd client and it should fix it for you. I'll send something up to make this a sane default. Thanks,

Josef

Sent from my iPhone

> On Jul 21, 2017, at 8:15 AM, Adam Borowski wrote:
>
> Hi!
> I'm afraid that 4.13-rc1 nbd aborts connection on writes for me:
>
> [ 251.938384] block nbd0: Send data failed (result -11)
> [ 251.943484] block nbd0: Request send failed trying another connection
> [ 251.950034] block nbd0: Receive control failed (result -32)
> [ 251.955676] block nbd0: Attempted send on invalid socket
> [ 251.961022] print_req_error: I/O error, dev nbd0, sector 2206344
> [ 251.961025] block nbd0: shutting down sockets
>
> Not all kinds of writes trigger the problem. For example, you can dd to the
> nbd block device, likewise badblocks -w succeeds without a hitch. Yet at
> least btrfs and swap disconnect nearly immediately. Reads seem to work: for
> example, btrfs can usually mount and scrub successfully, yet minor writes
> that happen on a filesystem mounted rw even without explicit user-level
> writes cause a disconnect in a short time. "Real" writes to the filesystem
> trigger it apparently outright. Likewise, to use swap you need to write to
> it first, thus it fails quickly.
>
> Reproduced on arm64 (Pine64) first. As this SoC just switched from an
> out-of-tree ethernet driver to a completely different new one (dwmac-sun8i),
> and such a switch can't be bisected, I assumed that's the culprit and did
> not complain while in -next.
>
> However, turns out the same happens on a bog-standard amd64, both on bare
> metal and in qemu.
>
> In all of these cases, the server is an amd64 Debian stretch, kernel
> 4.9.30-2+deb9u2, nbd-server 1:3.15.2-3.
>
> Bisect blames dc88e34d "nbd: set sk->sk_sndtimeo for our sockets", and
> indeed, reverting that patch makes everything fine again.
>
>
> Bisect log:
> # bad: [63a86362130f4c17eaa57f3ef5171ec43111a54e] Merge tag 'pm-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
> # good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
> git bisect start 'linus/master' 'v4.12'
> # bad: [55a7b2125cf4739a8478d2d7223310ae7393408c] Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
> git bisect bad 55a7b2125cf4739a8478d2d7223310ae7393408c
> # bad: [1849f800fba32cd5a0b647f824f11426b85310d8] Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> git bisect bad 1849f800fba32cd5a0b647f824f11426b85310d8
> # bad: [cbcd4f08aa637b74f575268770da86a00fabde6d] Merge tag 'staging-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> git bisect bad cbcd4f08aa637b74f575268770da86a00fabde6d
> # bad: [1b044f1cfc65a7d90b209dfabd57e16d98b58c5b] Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad 1b044f1cfc65a7d90b209dfabd57e16d98b58c5b
> # bad: [892ad5acca0b2ddb514fae63fa4686bf726d2471] Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad 892ad5acca0b2ddb514fae63fa4686bf726d2471
> # bad: [e442cbf910c71fba5926cf757dd7f8fcce22fc5f] pktcdvd: remove the call to blk_queue_bounce
> git bisect bad e442cbf910c71fba5926cf757dd7f8fcce22fc5f
> # bad: [d86c4d8ef31b3d99c681c859cb4e936dafc2d7a4] nvme: move reset workqueue handling to common code
> git bisect bad d86c4d8ef31b3d99c681c859cb4e936dafc2d7a4
> # bad: [fdd050b5b3c96813ae6756ed68157d32ba31b9f2] Merge branch 'uuid-types' of bombadil.infradead.org:public_git/uuid into nvme-base
> git bisect bad fdd050b5b3c96813ae6756ed68157d32ba31b9f2
> # bad: [a104c9f22c7d073d4ae308ca36383ce5cc4631cc] nvme-rdma: fix merge error
> git bisect bad a104c9f22c7d073d4ae308ca36383ce5cc4631cc
> # good: [b040ad9cf6a169cc000a5324fcada695dfa1f4b3] loop: fix error handling regression
> git bisect good b040ad9cf6a169cc000a5324fcada695dfa1f4b3
> # bad: [36ffc6c1c0e67acdacb53348350d0a37206dbadf] block_dev: propagate bio_iov_iter_get_pages error in __blkdev_direct_IO
> git bisect bad 36ffc6c1c0e67acdacb53348350d0a37206dbadf
> # bad: [f729b66fca43d850d564b264c2033980c00a14b0] gfs2: remove the unused sd_log_error field
> git bisect bad f729b66fca43d850d564b264c2033980c00a14b0
> # bad: [401741547f95c0883fe143ac446d92c772937556] nvme-lightnvm: use blk_execute_rq in nvme_nvm_submit_user_cmd
> git bisect bad 401741547f95c0883fe143ac446d92c772937556
> # bad: [dc88e34d69d87c370deaa9d613dac8e3a0411f59] nbd: set sk->sk_sndtimeo for our sockets
> git bisect bad dc88e34d69d87c370deaa9d613dac8e3a0411f59
> # first bad commit: [dc88e34d69d87c370deaa9d613dac8e3a0411f59] nbd: set sk->sk_sndtimeo for our sockets
>
>
> Meow!
> --
> ⢀⣴⠾⠻⢶⣦⠀
> ⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
> ⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
> ⠈⠳⣄⠀⠀⠀⠀ A master species delegates.
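
For readers following the quoted log: the -11 in "Send data failed (result -11)" is -EAGAIN on Linux, which fits a send that gives up immediately instead of waiting for buffer space. The sketch below is only a user-space analogy in Python, not the nbd driver's code. It assumes (this is not stated in the thread) that a zero send timeout on the kernel side behaves much like a non-blocking send. The function name fill_until_eagain and its parameters are made up for illustration.

```python
import errno
import socket


def fill_until_eagain(chunk_size=65536, max_chunks=10_000):
    """Write to a non-blocking socket until its send buffer is full.

    Returns (bytes_sent, err), where err is the errno reported by the
    send that could not make progress, or 0 if the loop ran out first.
    """
    # The peer socket never reads, so the sender's buffer eventually fills.
    sender, peer = socket.socketpair()
    try:
        # Assumption: non-blocking mode stands in for a zero send timeout.
        sender.setblocking(False)
        chunk = b"\0" * chunk_size
        sent = 0
        for _ in range(max_chunks):
            try:
                sent += sender.send(chunk)
            except BlockingIOError as exc:
                # No room left and no willingness to wait: fail right away.
                return sent, exc.errno
        return sent, 0
    finally:
        sender.close()
        peer.close()


if __name__ == "__main__":
    sent, err = fill_until_eagain()
    name = errno.errorcode.get(err, "none")
    print(f"sent {sent} bytes before the send failed with errno {err} ({name})")
```

If the analogy holds, it would also fit the observation that some light or paced write patterns get through while bursts from btrfs or swap hit the full buffer quickly, though the thread itself does not confirm that explanation.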