Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp665075ybi; Fri, 2 Aug 2019 02:14:24 -0700 (PDT) X-Google-Smtp-Source: APXvYqwuf7P9YXuSGTjqkr3dHdT9ElODbcWCH1jrOp/BvtjsUSjRaWRBXWoYD4/1ZSqcVHBh8xZ3 X-Received: by 2002:a63:d23:: with SMTP id c35mr122752249pgl.376.1564737264255; Fri, 02 Aug 2019 02:14:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1564737264; cv=none; d=google.com; s=arc-20160816; b=vQpaCCq0A4ZbZVSp8f43lMiHq+vftjr8nxdCg4vWXPPc8v5qHknW1GazKxF3ADavUG dV8ftvy4bcOpKRPyr6lQLQDrhvL96a8fpYCYYK4UO5ZE9AvnC/6EWr5eWtc/w944yk/E bAKWnkeVRng5k5A6IFcYwlUjvJ06dNUUGPBWf6vAQfd7VPCT7fzOV1tIf+jWbU2EkYiV 0dPFBxUURVt8MZagVRU2qXeGLrm2M4vqGAwKWIuSyCSzvC/aMzgWukGXRCLwkcxrmlb9 GNCQZyaGkiVyH+0t/y2Fula9FiAR87LcYrLH+/Yg+1HMFAeOv9UnP+91oTCx27YUvGNY /GbA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:cc:references:to :subject; bh=Ne4mAdftHzr2bODyL+jSqXgu3vXR0+CamfaLg5Ucqzw=; b=G4S95Z9DYdGjwaXVddtZujJMHubpIObxH/0rgmz/YGScOiWW3gOgZvAsu/u5RCv3/c 41fvga98o87Zzf/khDftuHwuVuHQmP6xtwSEHBc8Utuuz6xP0HxMBsV9Eig8uGVmBB2L RQqOVIjN8huzK+wuRj4W9orrUxiZfxPnsJV/0T/mSfbrpDoJCCB5W/wDRQ0U2PkjDfZk K3DyZb+INM86RIIkAim2iejw5d2+rsHQ/8C/MZaEGU3BktKcnvUcvYmpr0cniA+bA4RM fo+z0DolHvC6hfdq2Y1DG1nO9ZjXhaBFNDqlKXlPF2OuBabsWnEFLH90+hbKwukHOfCU duVQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g11si38559724pfi.282.2019.08.02.02.14.04; Fri, 02 Aug 2019 02:14:24 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733177AbfHBG2W (ORCPT + 99 others); Fri, 2 Aug 2019 02:28:22 -0400 Received: from szxga07-in.huawei.com ([45.249.212.35]:48536 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1732494AbfHBG2W (ORCPT ); Fri, 2 Aug 2019 02:28:22 -0400 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id 97EF2BC928EEF673CD80; Fri, 2 Aug 2019 14:28:11 +0800 (CST) Received: from [127.0.0.1] (10.177.244.145) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.439.0; Fri, 2 Aug 2019 14:28:08 +0800 Subject: Re: [PATCH] ext4: fix potential use after free in system zone via remount with noblock_validity To: Jan Kara References: <1563970268-33688-1-git-send-email-yi.zhang@huawei.com> <20190731140821.GF15806@quack2.suse.cz> CC: , , From: "zhangyi (F)" Message-ID: <54c1c269-c874-1ff8-8ca8-a87e65227957@huawei.com> Date: Fri, 2 Aug 2019 14:28:21 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <20190731140821.GF15806@quack2.suse.cz> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.244.145] X-CFilter-Loop: Reflected Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Thanks for your suggestions, I will look at the RCU method to solve this problem. Thanks, Yi. On 2019/7/31 22:08, Jan Kara Wrote: > On Wed 24-07-19 20:11:08, zhangyi (F) wrote: >> Remount process will release system zone which was allocated before if >> "noblock_validity" is specified. If we mount an ext4 file system to two >> mountpoints whit default mount options, and then remount one of them >> with "noblock_validity", it may trigger a use after free problem when >> someone accessing the other one. >> >> # mount /dev/sda foo >> # mount /dev/sda bar >> >> User access mountpoint "foo" | Remount mountpoint "bar" >> | >> ext4_map_blocks() | ext4_remount() >> check_block_validity() | ext4_setup_system_zone() >> ext4_data_block_valid() | ext4_release_system_zone() >> | free system_blks rb nodes >> access system_blks rb nodes | >> trigger use after free | >> >> This patch lock the system zone when accessing it to prevent it being >> released when doing a remount with "noblock_validity" mount option. >> >> Signed-off-by: zhangyi (F) >> Cc: stable@vger.kernel.org > > Thanks for the patch. It is a good catch. Some small comments below. > >> diff --git a/fs/ext4/block_validity.c b/fs/ext4/block_validity.c >> index 8e83741..d9c4792 100644 >> --- a/fs/ext4/block_validity.c >> +++ b/fs/ext4/block_validity.c >> @@ -191,7 +191,7 @@ int ext4_setup_system_zone(struct super_block *sb) >> >> if (!test_opt(sb, BLOCK_VALIDITY)) { >> if (sbi->system_blks.rb_node) >> - ext4_release_system_zone(sb); >> + ext4_release_system_zone_lock(sb); >> return 0; >> } >> if (sbi->system_blks.rb_node) >> @@ -239,6 +239,14 @@ void ext4_release_system_zone(struct super_block *sb) >> EXT4_SB(sb)->system_blks = RB_ROOT; >> } >> >> +/* Called when (re)mounting the filesystem without BLOCK_VALIDITY */ >> +void ext4_release_system_zone_lock(struct super_block *sb) >> +{ >> + spin_lock(&EXT4_SB(sb)->system_blks_lock); >> + ext4_release_system_zone(sb); >> + spin_unlock(&EXT4_SB(sb)->system_blks_lock); >> +} > > Is there any reason why ext4_release_system_zone() should not always take > the system_blks_lock lock? I understand it may not be necessary in all the > cases but it won't hurt either... > > Also ext4_setup_system_zone() should IMO use system_blks_lock to protect > modifications of the rbtree. It can get called during remount as well so > there can be racing ext4_data_block_valid() reading the rbtree at the same > time. > >> @@ -256,6 +264,13 @@ int ext4_data_block_valid(struct ext4_sb_info *sbi, ext4_fsblk_t start_blk, >> sbi->s_es->s_last_error_block = cpu_to_le64(start_blk); >> return 0; >> } >> + >> + /* >> + * Lock the system zone to prevent it being released concurrently >> + * when doing a remount with "noblock_validity" mount option. >> + */ >> + spin_lock(&sbi->system_blks_lock); >> + n = sbi->system_blks.rb_node; >> while (n) { >> entry = rb_entry(n, struct ext4_system_zone, node); >> if (start_blk + count - 1 < entry->start_blk) >> @@ -264,9 +279,11 @@ int ext4_data_block_valid(struct ext4_sb_info *sbi, ext4_fsblk_t start_blk, >> n = n->rb_right; >> else { >> sbi->s_es->s_last_error_block = cpu_to_le64(start_blk); >> + spin_unlock(&sbi->system_blks_lock); >> return 0; >> } >> } >> + spin_unlock(&sbi->system_blks_lock); >> return 1; >> } > > So this will not only serialize ext4_data_block_valid() against remounts > but also against each other. So I suspect that a read-heavy workload on > fast storage could contend on your new fs-wide spinlock. So I think it > would be better to have some other synchronization scheme to avoid the > race. > > If nothing else, rwlock_t would allow concurrent ext4_data_block_valid() > calls. It is still not ideal as the calls would be still bouncing around > the cacheline when updating the lock itself but better than nothing. > > Ideal (performance-wise) would be to use RCU scheme for this - > ext4_data_block_valid() would be RCU protected when reading the RB-tree, > teardown of the block validity information would clear > sbi->system_blks.rb_node and then defer actual freeing of the tree nodes to > RCU callback. Setup would first construct the rbtree and then just set > sbi->system_blks.rb_node to the root of the constructed tree. > > That being said I'm not *sure* this is going to be a performance issue > since ext4_map_blocks() are not that frequent and the lock hold times will > be very short (needs testing). So maybe rwlock_t is a reasonable compromise > between complexity and performance. > > Honza >