Date: Tue, 3 Feb 2015 12:01:28 +0900
From: Minchan Kim
To: Sergey Senozhatsky
Cc: Andrew Morton, Jerome Marchand, Ganesh Mahendran, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] zram: fix umount-reset_store-mount race condition
Message-ID: <20150203030127.GA1541@blaptop>
In-Reply-To: <1422886120-16534-1-git-send-email-sergey.senozhatsky@gmail.com>

Hello Sergey,

On Mon, Feb 02, 2015 at 11:08:40PM +0900, Sergey Senozhatsky wrote:
> Ganesh Mahendran was the first one who proposed to use bdev->bd_mutex
> to avoid the ->bd_holders race condition:
>
>         CPU0                            CPU1
> umount /* zram->init_done is true */
> reset_store()
> bdev->bd_holders == 0                   mount
> ...                                     zram_make_request()
> zram_reset_device()
>
> However, his solution required a considerable amount of code movement,
> which we can avoid.
>
> Apart from using bdev->bd_mutex in reset_store(), this patch also
> simplifies zram_reset_device().
>
> zram_reset_device() has a bool parameter reset_capacity which tells
> it whether the disk capacity and the disk itself should be reset.
> There are two zram_reset_device() callers:
> -- zram_exit() passes reset_capacity=false
> -- reset_store() passes reset_capacity=true
>
> So we can move the reset_capacity-sensitive work out of
> zram_reset_device() and perform it unconditionally in reset_store().
> This also lets us drop the reset_capacity parameter from
> zram_reset_device() and pass only the zram pointer.
>
> Reported-by: Ganesh Mahendran
> Signed-off-by: Sergey Senozhatsky

Looks good to me, but as I told you, we also need to handle processes
that open the block device without FMODE_EXCL. I can trigger the
following warning with the simple script below. In addition, after
applying your patch I added an msleep(2000) below
set_capacity(zram->disk, 0) to make the race window huge (kudos to
Ganesh!).

#!/bin/bash

echo $((60<<30)) > /sys/block/zram0/disksize

# Check that your kernel has CONFIG_SCHED_AUTOGROUP=y.
# I have no idea why it doesn't happen without setsid and CONFIG_SCHED_AUTOGROUP=y.
# Please let me know if someone knows the reason.
setsid dd if=/dev/zero of=/dev/zram0 &
sleep 1
setsid echo 1 > /sys/block/zram0/reset

[ 30.683449] zram0: detected capacity change from 0 to 64424509440
[ 33.736869] Buffer I/O error on dev zram0, logical block 180823, lost async page write
[ 33.738814] Buffer I/O error on dev zram0, logical block 180824, lost async page write
[ 33.740654] Buffer I/O error on dev zram0, logical block 180825, lost async page write
[ 33.742551] Buffer I/O error on dev zram0, logical block 180826, lost async page write
[ 33.744153] Buffer I/O error on dev zram0, logical block 180827, lost async page write
[ 33.745807] Buffer I/O error on dev zram0, logical block 180828, lost async page write
[ 33.747419] Buffer I/O error on dev zram0, logical block 180829, lost async page write
[ 33.749060] Buffer I/O error on dev zram0, logical block 180830, lost async page write
[ 33.750687] Buffer I/O error on dev zram0, logical block 180831, lost async page write
[ 33.752286] Buffer I/O error on dev zram0, logical block 180832, lost async page write
[ 33.811590] ------------[ cut here ]------------
[ 33.812038] WARNING: CPU: 11 PID: 1996 at fs/block_dev.c:57 __blkdev_put+0x1d7/0x210()
[ 33.812817] Modules linked in:
[ 33.813142] CPU: 11 PID: 1996 Comm: dd Not tainted 3.19.0-rc6-next-20150202+ #1125
[ 33.813837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 33.814525]  ffffffff81801a2d ffff880061e77db8 ffffffff815b848e 0000000000000001
[ 33.815196]  0000000000000000 ffff880061e77df8 ffffffff8104de2a 0000000000000000
[ 33.815867]  ffff88005da287f0 ffff88005da28680 ffff88005da28770 ffff88005da28698
[ 33.816536] Call Trace:
[ 33.816817]  [] dump_stack+0x45/0x57
[ 33.817304]  [] warn_slowpath_common+0x8a/0xc0
[ 33.817829]  [] warn_slowpath_null+0x1a/0x20
[ 33.818331]  [] __blkdev_put+0x1d7/0x210
[ 33.818797]  [] blkdev_put+0x50/0x130
[ 33.819244]  [] blkdev_close+0x25/0x30
[ 33.819723]  [] __fput+0xdf/0x1e0
[ 33.820140]  [] ____fput+0xe/0x10
[ 33.820576]  [] task_work_run+0xa7/0xe0
[ 33.821151]  [] do_notify_resume+0x49/0x60
[ 33.821721]  [] int_signal+0x12/0x17
[ 33.822228] ---[ end trace 274fbbc5664827d2 ]---

The warning comes from bdev_write_inode in the blkdev_put path:

static void bdev_write_inode(struct inode *inode)
{
	spin_lock(&inode->i_lock);
	while (inode->i_state & I_DIRTY) {
		spin_unlock(&inode->i_lock);
		WARN_ON_ONCE(write_inode_now(inode, true));	<========= here
		spin_lock(&inode->i_lock);
	}
	spin_unlock(&inode->i_lock);
}

The reason is that the dd process hits I/O failures because the block
device suddenly disappears, so filemap_check_errors() called from
__writeback_single_inode() returns -EIO.

If we check bd_openers instead of bd_holders, we could address the
problem. When I looked at brd, it already uses bd_openers rather than
bd_holders, so although I'm not an expert on the block layer, that
seems to be the better choice.
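Just to illustrate what I mean, here is a rough, untested sketch (not a
tested patch): your locking stays exactly as in the hunk quoted below,
only the guard tests bd_openers instead of bd_holders. The buf parsing
and the dev_to_zram()/bdget_disk() setup are assumed to stay as in the
current driver:

static ssize_t reset_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t len)
{
	int ret;
	unsigned short do_reset;
	/* setup assumed unchanged from the current driver */
	struct zram *zram = dev_to_zram(dev);
	struct block_device *bdev = bdget_disk(zram->disk, 0);

	if (!bdev)
		return -ENOMEM;

	mutex_lock(&bdev->bd_mutex);
	/* Refuse to reset while *anyone* still has the device open */
	if (bdev->bd_openers) {
		ret = -EBUSY;
		goto out;
	}

	/* do_reset parsing assumed unchanged from the current driver */
	ret = kstrtou16(buf, 10, &do_reset);
	if (ret)
		goto out;

	if (!do_reset) {
		ret = -EINVAL;
		goto out;
	}

	/* Make sure all pending I/O is finished */
	fsync_bdev(bdev);
	zram_reset_device(zram);
	set_capacity(zram->disk, 0);

	mutex_unlock(&bdev->bd_mutex);
	revalidate_disk(zram->disk);
	bdput(bdev);

	return len;

out:
	mutex_unlock(&bdev->bd_mutex);
	bdput(bdev);
	return ret;
}

With that, the non-exclusive opener in the script above (the setsid dd
writer, which opens /dev/zram0 without FMODE_EXCL) would make the reset
fail with -EBUSY instead of racing with the writeback.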
> ---
>  drivers/block/zram/zram_drv.c | 23 +++++++++--------------
>  1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index aa5a4c5..a32069f 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -717,7 +717,7 @@ static void zram_bio_discard(struct zram *zram, u32 index,
>  	}
>  }
>
> -static void zram_reset_device(struct zram *zram, bool reset_capacity)
> +static void zram_reset_device(struct zram *zram)
>  {
>  	down_write(&zram->init_lock);
>
> @@ -736,18 +736,7 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  	memset(&zram->stats, 0, sizeof(zram->stats));
>
>  	zram->disksize = 0;
> -	if (reset_capacity)
> -		set_capacity(zram->disk, 0);
> -
>  	up_write(&zram->init_lock);
> -
> -	/*
> -	 * Revalidate disk out of the init_lock to avoid lockdep splat.
> -	 * It's okay because disk's capacity is protected by init_lock
> -	 * so that revalidate_disk always sees up-to-date capacity.
> -	 */
> -	if (reset_capacity)
> -		revalidate_disk(zram->disk);
>  }
>
>  static ssize_t disksize_store(struct device *dev,
> @@ -820,6 +809,7 @@ static ssize_t reset_store(struct device *dev,
>  	if (!bdev)
>  		return -ENOMEM;
>
> +	mutex_lock(&bdev->bd_mutex);
>  	/* Do not reset an active device! */
>  	if (bdev->bd_holders) {
>  		ret = -EBUSY;
> @@ -837,12 +827,17 @@ static ssize_t reset_store(struct device *dev,
>
>  	/* Make sure all pending I/O is finished */
>  	fsync_bdev(bdev);
> +	zram_reset_device(zram);
> +	set_capacity(zram->disk, 0);
> +
> +	mutex_unlock(&bdev->bd_mutex);
> +	revalidate_disk(zram->disk);
>  	bdput(bdev);
>
> -	zram_reset_device(zram, true);
>  	return len;
>
>  out:
> +	mutex_unlock(&bdev->bd_mutex);
>  	bdput(bdev);
>  	return ret;
>  }
> @@ -1188,7 +1183,7 @@ static void __exit zram_exit(void)
>  	 * Shouldn't access zram->disk after destroy_device
>  	 * because destroy_device already released zram->disk.
>  	 */
> -	zram_reset_device(zram, false);
> +	zram_reset_device(zram);
>  	}
>
>  	unregister_blkdev(zram_major, "zram");
> --
> 2.3.0.rc2

-- 
Kind regards,
Minchan Kim