Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753301AbdDDJiE (ORCPT ); Tue, 4 Apr 2017 05:38:04 -0400
Received: from mx2.suse.de ([195.135.220.15]:59806 "EHLO mx2.suse.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752572AbdDDJiC (ORCPT ); Tue, 4 Apr 2017 05:38:02 -0400
From: NeilBrown
To: Michael Wang, "linux-kernel@vger.kernel.org",
        linux-block@vger.kernel.org, linux-raid@vger.kernel.org
Date: Tue, 04 Apr 2017 19:37:17 +1000
Cc: Jens Axboe, Shaohua Li, Jinpu Wang
Subject: Re: [RFC PATCH] blk: reset 'bi_next' when bio is done inside request
In-Reply-To: <9be3ca00-d802-bf64-bcdc-1e76608147f0@profitbricks.com>
References: <9505ff12-7307-7dec-76b5-2a233a592634@profitbricks.com>
        <877f31kwti.fsf@notabene.neil.brown.name>
        <9be3ca00-d802-bf64-bcdc-1e76608147f0@profitbricks.com>
Message-ID: <871st8jyya.fsf@notabene.neil.brown.name>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
        micalg=pgp-sha256; protocol="application/pgp-signature"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4161
Lines: 123

--=-=-=
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

On Tue, Apr 04 2017, Michael Wang wrote:

> Hi, Neil
>
> On 04/03/2017 11:25 PM, NeilBrown wrote:
>> On Mon, Apr 03 2017, Michael Wang wrote:
>>
>>> blk_attempt_plug_merge() try to merge bio into request and chain them
>>> by 'bi_next', while after the bio is done inside request, we forgot to
>>> reset the 'bi_next'.
>>>
>>> This lead into BUG while removing all the underlying devices from md-raid1,
>>> the bio once go through:
>>>
>>>   md_do_sync()
>>>     sync_request()
>>>       generic_make_request()
>>
>> This is a read request from the "first" device.
>>
>>>         blk_queue_bio()
>>>           blk_attempt_plug_merge()
>>>             CHAINED HERE
>>>
>>> will keep chained and reused by:
>>>
>>>   raid1d()
>>>     sync_request_write()
>>>       generic_make_request()
>>
>> This is a write request to some other device, isn't it?
>>
>> If sync_request_write() is using a bio that has already been used, it
>> should call bio_reset() and fill in the details again.
>> However I don't see how that would happen.
>> Can you give specific details on the situation that triggers the bug?
>
> We have storage side mapping lv through scst to server, on server side
> we assemble them into multipath device, and then assemble these dm into
> two raid1.
>
> The test is firstly do mkfs.ext4 on raid1 then start fio on it, on storage
> side we unmap all the lv (could during mkfs or fio), then on server side
> we hit the BUG (reproducible).

So I assume the initial resync is still happening at this point?
And you unmap *all* the lv's so you expect IO to fail?
I can see that the code would behave strangely if you have a
bad-block-list configured (which is the default).
Do you have a bbl?  If you create the array without the bbl, does it
still crash?

>
> The path of bio was confirmed by add tracing, it is reused in sync_request_write()
> with 'bi_next' once chained inside blk_attempt_plug_merge().

I still don't see why it is re-used.
I assume you didn't explicitly ask for a check/repair (i.e. didn't write
to .../md/sync_action at all?).  In that case MD_RECOVERY_REQUESTED is
not set.
So sync_request() sends only one bio to generic_make_request():
   r1_bio->bios[r1_bio->read_disk];

then sync_request_write() *doesn't* send that bio again, but does send
all the others.

So where does it reuse a bio?

>
> We also tried to reset the bi_next inside sync_request_write() before
> generic_make_request() which also works.
>
> The testing was done with 4.4, but we found upstream also left bi_next
> chained after done in request, thus we post this RFC.
>
> Regarding raid1, we haven't found the place on path where the bio was
> reset... where does it supposed to be?

I'm not sure what you mean.
We only reset bios when they are being reused.
One place is in process_checks() where bio_reset() is called before
filling in all the details.

Maybe, in sync_request_write(), before

     wbio->bi_rw = WRITE;

add something like

     if (wbio->bi_next)
          printk("bi_next!= NULL i=%d read_disk=%d bi_end_io=%pf\n",
                 i, r1_bio->read_disk, wbio->bi_end_io);

that might help narrow down what is happening.

NeilBrown

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEG8Yp69OQ2HB7X0l6Oeye3VZigbkFAljjaU0ACgkQOeye3VZi
gbmzZBAAwqm8fRgBgGlGtBGzKrnyCP2J685kyaVVYtA0fxxLgjk7dAndNnQ/LUvh
ZHvlHO8GDU0vPdzyIJrBg842vuU0tv/ZklMnEYByFFexc0mLW7qcpIXXaKN0ErX+
QgYeykFCcY9yZo8K5EBXFn/jje8Kk3T3QZYNEhlOXrz+mbCZty2CwLRONj4QggxT
5jfCXNZYEPNVCqn8dKYabKFB1kHGELUwQiCnFDArZh3j18dlQHAhMrJeUHXr6JKh
jDNqIjBMP5mPZM4Kr8d5EuWEarBgfQ0xq6W3dscSSpIaUAuYnCYDaaBjxM5LqgIe
vF7597xaIkR4Rxn9Z/NFTTSnbQfx9ao0inROO84vvld+b4l0VEo9xWbb08OU2E+7
LChiT72ov/NujtvtXL3uiSztpdcJIsneQuWORPd8lindQMuHG+aw5+fMXSkkwoJ5
CC1ye+xsVGX4mf2vy0vZVw/DBgESWu5dqLlNyXomkIwxPyrPijP1VBrh2ajWa02v
uFedzersRqMaJvy6UeazR006maDsIqz7340mHxSbtkKLB/ewA0usoXSPNWpCdRJb
drAe7XV/ohpY+Ba4uDDmMQb8ac3aG+aT/nFHdKgjquuby3F+gONfly+KrAXAmOP/
9ytnUNcJCI7uVBvKWFeRGl/iSptAmYhJJFOX7Y8Vsjai55DPQnE=
=Ei52
-----END PGP SIGNATURE-----
--=-=-=--