Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754545AbcJEGrv (ORCPT ); Wed, 5 Oct 2016 02:47:51 -0400 Received: from mx1.redhat.com ([209.132.183.28]:54288 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752005AbcJEGrt (ORCPT ); Wed, 5 Oct 2016 02:47:49 -0400 Date: Wed, 5 Oct 2016 08:47:42 +0200 From: Mateusz Guzik To: Jan Kara Cc: Pierre Morel , viro@zeniv.linux.org.uk, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, farman@linux.vnet.ibm.com, cornelia.huck@de.ibm.com, Jens Axboe , Josef Bacik Subject: Re: [PATCH] fs/block_dev.c: return the right error in thaw_bdev() Message-ID: <20161005064740.yekp2jl3nler2won@mguzik> References: <1475571220-2522-1-git-send-email-pmorel@linux.vnet.ibm.com> <1475571220-2522-2-git-send-email-pmorel@linux.vnet.ibm.com> <20161004090615.GF17515@quack2.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20161004090615.GF17515@quack2.suse.cz> User-Agent: Mutt/1.6.0.1 (2016-04-01) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Wed, 05 Oct 2016 06:47:48 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4622 Lines: 128 On Tue, Oct 04, 2016 at 11:06:15AM +0200, Jan Kara wrote: > On Tue 04-10-16 10:53:40, Pierre Morel wrote: > > When triggering thaw-filesystems via magic sysrq, the system enters a > > loop in do_thaw_one(), as thaw_bdev() still returns success if > > bd_fsfreeze_count == 0. To fix this, let thaw_bdev() always return > > error (and simplify the code a bit at the same time). > > > > The patch looks good. > Now that I had a closer look, while the patch indeed gets rid of the infinite loop, the functionality itself does not work properly. Note I'm not familiar with this code, so chances are I got details wrong. I also don't know the original reasoning. The current state is that you can freeze by calling either freeze_super or freeze_bdev. The latter bumps bd_fsfreeze_count and calls freeze_super. freeze_super does NOT modify bd_fsfreeze_count. freeze_bdev is used by device mapper, xfs and e2fs. freeze_super is used by the FIFREEZE ioctl. This disparity leads to breakage. Quick look suggests device freezing has its users and sb is optionally present. So the fix would consist on making all freezers call the bdev variant. The j sysrq ends up not working in 2 ways. 1. It ends up calling do_thaw_one -> thaw_bdev. It will look at the bd_fsfreeze_count counter and error out. But if the user froze the fs with the ioctl, the counter is not modified and the fs in question ends up not being thawed. 2. (assuming 1 is fixed) Since freezing through freeze_bdev supports nesting, multiple invocations of the sysrq may be needed to actually thaw. Now, looking at *_bdev functions: > struct super_block *freeze_bdev(struct block_device *bdev) > { > struct super_block *sb; > int error = 0; > > mutex_lock(&bdev->bd_fsfreeze_mutex); > if (++bdev->bd_fsfreeze_count > 1) { No limit is put in place so in principle this will eventually turn negative. > /* > * We don't even need to grab a reference - the first call > * to freeze_bdev grab an active reference and only the last > * thaw_bdev drops it. > */ > sb = get_super(bdev); > if (sb) > drop_super(sb); > mutex_unlock(&bdev->bd_fsfreeze_mutex); The code assumes nobody thaws the fs from under the consumer. That is, code doing sb = freeze_bdev(b); ...; thaw_bdev(b, sb); risks use-after-free if the user thawed the fs. I did not check if current consumers hold the object in different ways and thus avoiding the bug. > return sb; > } > > sb = get_active_super(bdev); > if (!sb) > goto out; > if (sb->s_op->freeze_super) > error = sb->s_op->freeze_super(sb); > else > error = freeze_super(sb); This differs from the ioctl version, which has this check: > if (sb->s_op->freeze_fs == NULL && sb->s_op->freeze_super == NULL) > return -EOPNOTSUPP; For a filesystem with freeze_fs == NULL, freeze_super will return 0 and freeze_bdev will end up returnig 0. I don't know if this turns into a real problem. > int thaw_bdev(struct block_device *bdev, struct super_block *sb) [snip] > if (sb->s_op->thaw_super) > error = sb->s_op->thaw_super(sb); > else > error = thaw_super(sb); > if (error) { > bdev->bd_fsfreeze_count++; > mutex_unlock(&bdev->bd_fsfreeze_mutex); > return error; > } The fact that someone else can thaw makes it very fishy to pass super blocks around in the first place. In particular: kernel freeze -> user thaw -> mount -> user freeze -> kernel thaw Are we guaranteed that the sb returned for the kernel freeze is the same which was used for user freeze? That said, rough proposed fix is as follows: - store the pointer to the sb used for freezing in bdev - drop the sb argument from thaw_bdev - optionally return a referenced sb from freeze_bdev, otherwise just return NULL on success - convert all freeze consumers to use *_bdev variants - somewhat cosmetic, introduce nesting limit for freezes (say 2048 or whatever, just to have something) The ioctl used use freeze_bdev but this was changed with 18e9e5104fcd "Introduce freeze_super and thaw_super for the fsfreeze ioctl" which cites btrfs as the reason for freeze_super as opposed to freeze_bdev. CC'ing the author for comments. -- Mateusz Guzik