Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753341AbcD1JAb (ORCPT ); Thu, 28 Apr 2016 05:00:31 -0400 Received: from metis.ext.4.pengutronix.de ([92.198.50.35]:48615 "EHLO metis.ext.4.pengutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750867AbcD1JA2 (ORCPT ); Thu, 28 Apr 2016 05:00:28 -0400 From: Markus Pargmann To: Ratna Manoj Cc: pbonzini@redhat.com, jack@suse.cz, Gou Rao , Vinod Jayaraman , nbd-general@lists.sourceforge.net, linux-kernel@vger.kernel.org Subject: Re: [PATCH] NBD: replace kill_bdev() with __invalidate_device() Date: Thu, 28 Apr 2016 11:00:20 +0200 Message-ID: <1809196.i4ZeAM11iZ@adelgunde> User-Agent: KMail/4.14.1 (Linux/4.4.0-0.bpo.1-amd64; KDE/4.14.2; x86_64; ; ) In-Reply-To: <571ADB31.2000609@gmail.com> References: <56E711D1.9090708@gmail.com> <11986900.ymBh9azTyz@adelgunde> <571ADB31.2000609@gmail.com> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart3487245.aFLYjWJGEk"; micalg="pgp-sha256"; protocol="application/pgp-signature" X-SA-Exim-Connect-IP: 2001:67c:670:100:a61f:72ff:fe68:75ba X-SA-Exim-Mail-From: mpa@pengutronix.de X-SA-Exim-Scanned: No (on metis.ext.pengutronix.de); SAEximRunCond expanded to false X-PTX-Original-Recipient: linux-kernel@vger.kernel.org Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7489 Lines: 237 --nextPart3487245.aFLYjWJGEk Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="us-ascii" Hi, On Saturday 23 April 2016 07:47:21 Ratna Manoj wrote: > Thanks for the review.=20 >=20 > Atleast for ext4 this crash happens on a sys_umount() call, timing of= > which is not in control of block driver. Block driver cannot force th= e > filesystems to be unmounted, and the file system does not expect=20 > buffers to get unmapped under it. Yes the block driver can't force a clean umount. >=20 > Ext4 can be fixed with the this patch: > http://www.spinics.net/lists/linux-ext4/msg51112.html=20 > It did not make to the kernel. It checks the state of the buffer head= > before committing. >=20 > When we consider diskett/CD as user space thread that called NBD_DO_I= T, > this problem is analogous to changing disk with another or the same > disk suddenly when the file system is still mounted.=20 >=20 > If we completely kill the block device we would loss some writes when= > same thread is reconnected. I am not so sure about your exact use-case here. If the NBD_DO_IT thread returns I am considering the connection and block device as dead and disconnected. Securing any data afterwards wit= h a new connection is potentially dangerous as it may be a different server. >=20 > if we do not completely kill or if we only invalidate clean buffers,=20= > we will have inconsistency on re-attach with a different thread > (analogous to replacing disk with different disk suddenly).=20 Yes exactly. That's why I suggested that NBD_DO_IT waits until all blockdevice users are gone. This would avoid any issues with writing/reading data to a wrong server. Best Regards, Markus >=20 > Ratna. =20 > =20 >=20 > On Wed, Apr 20, 2016 at 4:36 PM, Markus Pargmann = wrote: >=20 > > Hi, > > > > On Thursday 24 March 2016 07:04:10 Ratna Manoj wrote: > > > From: Ratna Manoj Bolla > > > > > > When a filesystem is mounted on a nbd device and on a disconnect,= because > > > of kill_bdev(), and resetting bdev size to zero, buffer_head mapp= ings are > > > getting destroyed under mounted filesystem. > > > > > > After a bdev size reset(i.e bdev->bd_inode->i_size =3D 0) on a di= sconnect, > > > followed by a sys_umount(), > > > generic_shutdown_super()->... > > > ->__sync_blockdev()->... > > > -> blkdev_writepages()->... > > > ->do_invalidatepage()->... > > > -> discard_buffer() is discarding superblock buffer_hea= d=20 > > assumed > > > to be in mapped state by ext4_commit_super(). > > > > > > > > > > > > Signed-off-by: Ratna Manoj Bolla > > > --- > > > This script reproduces both the kernel panic scenarios: > > > > > > $ qemu-img create -f qcow2 f.img 1G > > > $ mkfs.ext4 f.img > > > $ qemu-nbd -c /dev/nbd0 f.img > > > $ mount /dev/nbd0 dir > > > $ killall -KILL qemu-nbd > > > $ sleep 1 > > > $ ls dir > > > $ umount dir > > > > > > Bug reports: > > > http://www.kernelhub.org/?p=3D2&msg=3D361407 > > >=20 > > https://www.mail-archive.com/nbd-general@lists.sourceforge.net/msg0= 2388.html > > > > Thanks, please CC nbd-general@lists.sourceforge.net, > > linux-kernel@vger.kernel.org as well. > > > > So this patch simply does not cleanup the blockdevice to avoid any > > errors on the filesystem side. The userspace thread that called > > NBD_DO_IT will exit immediately before the filesystem decided to re= lease > > the blockdevice. The nbd driver assumes that the shutdown was done = and > > accepts new clients setting up sockets and so on. Couldn't this lea= d to > > a lot of problems? > > > > Currently NBD_DO_IT returns when it is save to use the NBD device a= gain. > > This patch changes this as the blockdevice may still be in use when= > > NBD_DO_IT returns. I think it would be better to delay NBD_DO_IT un= til > > everything is cleaned up and all filesystems are closed. > > > > Best Regards, > > > > Markus > > > > > > > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c > > > index f6b51d7..6e77b3a 100644 > > > --- a/drivers/block/nbd.c > > > +++ b/drivers/block/nbd.c > > > @@ -119,7 +119,8 @@ static const char *nbdcmd_to_ascii(int cmd) > > > > > > static int nbd_size_clear(struct nbd_device *nbd, struct block_d= evice=20 > > *bdev) > > > { > > > - bdev->bd_inode->i_size =3D 0; > > > + if (bdev->bd_openers <=3D 1) > > > + bdev->bd_inode->i_size =3D 0; > > > set_capacity(nbd->disk, 0); > > > kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE); > > > > > > @@ -678,6 +679,9 @@ static void nbd_reset(struct nbd_device *nbd)= > > > > > > static void nbd_bdev_reset(struct block_device *bdev) > > > { > > > + if (bdev->bd_openers > 1) > > > + return; > > > + > > > set_device_ro(bdev, false); > > > bdev->bd_inode->i_size =3D 0; > > > if (max_part > 0) { > > > @@ -735,7 +739,7 @@ static int __nbd_ioctl(struct block_device *b= dev,=20 > > struct nbd_device *nbd, > > > nbd_clear_que(nbd); > > > BUG_ON(!list_empty(&nbd->queue_head)); > > > BUG_ON(!list_empty(&nbd->waiting_queue)); > > > - kill_bdev(bdev); > > > + __invalidate_device(bdev, true); > > > return 0; > > > > > > case NBD_SET_SOCK: { > > > @@ -809,7 +813,7 @@ static int __nbd_ioctl(struct block_device *b= dev,=20 > > struct nbd_device *nbd, > > > > > > sock_shutdown(nbd); > > > nbd_clear_que(nbd); > > > - kill_bdev(bdev); > > > + __invalidate_device(bdev, true); > > > nbd_bdev_reset(bdev); > > > > > > if (nbd->disconnect) /* user requested, ignore sock= et=20 > > errors */ > > > > > > > > > > > > > -- > > Pengutronix e.K. | = | > > Industrial Linux Solutions | http://www.pengutronix= .de/ | > > Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917= =2D0 | > > Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917= =2D5555 | > > >=20 >=20 >=20 =2D-=20 Pengutronix e.K. | = | Industrial Linux Solutions | http://www.pengutronix.de/= | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 = | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-555= 5 | --nextPart3487245.aFLYjWJGEk Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part. Content-Transfer-Encoding: 7Bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJXIdEkAAoJEEpcgKtcEGQQ5HcP/jsRJZmqy+RRhMOIMOhwgUXk iQr0JDaKqHWS/kCjT2L5/QDUurTuQwUVtJCYCV0B+kRi6/Tu4diJmPeSaA1G2Zws UmmS4poRD71ZTYelb0ByDPLUeKH2SG/qj7QvVJ95ZIontFiPcOVEkXbk6UzIPs09 jQTTgDAipG/yP3L85ZW7NARj/a0DsVIF2FL7L9n4HGEdZQEd5CEAzgyQb2cJlELH YHK0EV2QMnVMeGB7cjmEdk36USSPmDsLKvBJC82Ny3rI8qUjcgYxr5g0LxwC5gtg X4cO9Say5WnCPLdc1qMaIbCHFEluvClsE6JzbutYHQUos6nNVbbFWn+2ImSh+3C1 w+/C5/XOl4wT+/yfebHYM/dqKnmOnNQa4vbV3GAaNOWMN5djVzibw9E595ix0RoG rYeRZi95z8jI4hZu8JFYpOkkTZXH5iwGgZUS7bS34FMsX/ZOsWiFWdcSYCrF9TAj 4vbNabnTjvpgpR2X2bEumq+ZnmHTmtOfh7DhuDpoe8Yw1cbNeXzBWjjg0jgLeTmu ZBJGaC+uTwKVToDr+eXmlg232c+Ho3UQzGUw2SYKufh2JnN2IHf3NzuwVFta67TK wyLjLHpoeUUbhOyNoE6Ux8pr9B1S11kJ2dXpea7Z8AYhAScCzX3CS65daE+qS5Fz +ijJ0hIFAzTOJhl4jIrv =yTPy -----END PGP SIGNATURE----- --nextPart3487245.aFLYjWJGEk--