Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp6091041ybi; Wed, 31 Jul 2019 08:12:57 -0700 (PDT) X-Google-Smtp-Source: APXvYqz9XvVDGJRHWnYJps+M2vSxqttm2nkeIJ1fYJUB0aYaldrpv/M0RXms9JQyf5mYfA0nblZO X-Received: by 2002:a17:90a:cb81:: with SMTP id a1mr3363992pju.81.1564585977870; Wed, 31 Jul 2019 08:12:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1564585977; cv=none; d=google.com; s=arc-20160816; b=oeqzd1KUNKbth9a9Jnth1crB91I4AZ8mKMmlyUY3F3AvtVUp/BDR7OKOTX6MApLGQQ 0PT+DzcifzPOlhJpJJOJ6ryeANoXHnR0spwChCZF0vu6uUhb3FG5PmdD27zs+45UmjWL s4lbjsi93V9STsajUlTDvHtYfhyU9FoelibZsn279+rJGfxHGnjZS08CJoeEjHeAD+k2 0v8Ink8QS00p9hrKv4RnCW0EY7cwOB3Yosio/sV+lE0hRzJgWzryASjxo1pHAdxf2eAX jYRPFgh13IC+O1zHukvjaBVj/c4ifhqm+5qbBUubv2EUPhRYt6Tfy936Hdm+79MbwDwA pzSQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=tEQ4SwZ2zQ/PRWUgPsVBsWly8fPNgVMtWgMJtgBiOfo=; b=SJYf/r1QUIkEfYeEf/Aqh42zGUzuTeBk1lgRIScrYiGkmYmMB/KRK5sIGTXfg1OCAp yf/CBs/5ku59xEzk49oNYT28Im8koKGeTY0+RzxN0PxKk4uOSSQHKnSLR8d8SNf8Nl70 bptrN0wqCxKr8alNBQrwOHjeunZV/rEkZfy2BpUf3zLRRpMLfdGS+KOhVa+QuRVvcJIl qyW3cnxMMc4n9aAZBu3f25cnMqc3BKjzS5QBQzZodLaDBZf6Dy3aJuxaMVqEZaP/x3pT 30pi00/GrbKY3Oh+E6DFXRz4+8eIgQyGf4aG2wVgwb2x/xm4Y54oGWTp5WYwVNlQn+B4 iaMw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t6si33052332pfe.231.2019.07.31.08.12.40; Wed, 31 Jul 2019 08:12:57 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727309AbfGaOIn (ORCPT + 99 others); Wed, 31 Jul 2019 10:08:43 -0400 Received: from mx2.suse.de ([195.135.220.15]:47666 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726607AbfGaOIn (ORCPT ); Wed, 31 Jul 2019 10:08:43 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id B6EBBAC20; Wed, 31 Jul 2019 14:08:41 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 7839F1E434C; Wed, 31 Jul 2019 16:08:21 +0200 (CEST) Date: Wed, 31 Jul 2019 16:08:21 +0200 From: Jan Kara To: "zhangyi (F)" Cc: linux-ext4@vger.kernel.org, tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Subject: Re: [PATCH] ext4: fix potential use after free in system zone via remount with noblock_validity Message-ID: <20190731140821.GF15806@quack2.suse.cz> References: <1563970268-33688-1-git-send-email-yi.zhang@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1563970268-33688-1-git-send-email-yi.zhang@huawei.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Wed 24-07-19 20:11:08, zhangyi (F) wrote: > Remount process will release system zone which was allocated before if > "noblock_validity" is specified. If we mount an ext4 file system to two > mountpoints whit default mount options, and then remount one of them > with "noblock_validity", it may trigger a use after free problem when > someone accessing the other one. > > # mount /dev/sda foo > # mount /dev/sda bar > > User access mountpoint "foo" | Remount mountpoint "bar" > | > ext4_map_blocks() | ext4_remount() > check_block_validity() | ext4_setup_system_zone() > ext4_data_block_valid() | ext4_release_system_zone() > | free system_blks rb nodes > access system_blks rb nodes | > trigger use after free | > > This patch lock the system zone when accessing it to prevent it being > released when doing a remount with "noblock_validity" mount option. > > Signed-off-by: zhangyi (F) > Cc: stable@vger.kernel.org Thanks for the patch. It is a good catch. Some small comments below. > diff --git a/fs/ext4/block_validity.c b/fs/ext4/block_validity.c > index 8e83741..d9c4792 100644 > --- a/fs/ext4/block_validity.c > +++ b/fs/ext4/block_validity.c > @@ -191,7 +191,7 @@ int ext4_setup_system_zone(struct super_block *sb) > > if (!test_opt(sb, BLOCK_VALIDITY)) { > if (sbi->system_blks.rb_node) > - ext4_release_system_zone(sb); > + ext4_release_system_zone_lock(sb); > return 0; > } > if (sbi->system_blks.rb_node) > @@ -239,6 +239,14 @@ void ext4_release_system_zone(struct super_block *sb) > EXT4_SB(sb)->system_blks = RB_ROOT; > } > > +/* Called when (re)mounting the filesystem without BLOCK_VALIDITY */ > +void ext4_release_system_zone_lock(struct super_block *sb) > +{ > + spin_lock(&EXT4_SB(sb)->system_blks_lock); > + ext4_release_system_zone(sb); > + spin_unlock(&EXT4_SB(sb)->system_blks_lock); > +} Is there any reason why ext4_release_system_zone() should not always take the system_blks_lock lock? I understand it may not be necessary in all the cases but it won't hurt either... Also ext4_setup_system_zone() should IMO use system_blks_lock to protect modifications of the rbtree. It can get called during remount as well so there can be racing ext4_data_block_valid() reading the rbtree at the same time. > @@ -256,6 +264,13 @@ int ext4_data_block_valid(struct ext4_sb_info *sbi, ext4_fsblk_t start_blk, > sbi->s_es->s_last_error_block = cpu_to_le64(start_blk); > return 0; > } > + > + /* > + * Lock the system zone to prevent it being released concurrently > + * when doing a remount with "noblock_validity" mount option. > + */ > + spin_lock(&sbi->system_blks_lock); > + n = sbi->system_blks.rb_node; > while (n) { > entry = rb_entry(n, struct ext4_system_zone, node); > if (start_blk + count - 1 < entry->start_blk) > @@ -264,9 +279,11 @@ int ext4_data_block_valid(struct ext4_sb_info *sbi, ext4_fsblk_t start_blk, > n = n->rb_right; > else { > sbi->s_es->s_last_error_block = cpu_to_le64(start_blk); > + spin_unlock(&sbi->system_blks_lock); > return 0; > } > } > + spin_unlock(&sbi->system_blks_lock); > return 1; > } So this will not only serialize ext4_data_block_valid() against remounts but also against each other. So I suspect that a read-heavy workload on fast storage could contend on your new fs-wide spinlock. So I think it would be better to have some other synchronization scheme to avoid the race. If nothing else, rwlock_t would allow concurrent ext4_data_block_valid() calls. It is still not ideal as the calls would be still bouncing around the cacheline when updating the lock itself but better than nothing. Ideal (performance-wise) would be to use RCU scheme for this - ext4_data_block_valid() would be RCU protected when reading the RB-tree, teardown of the block validity information would clear sbi->system_blks.rb_node and then defer actual freeing of the tree nodes to RCU callback. Setup would first construct the rbtree and then just set sbi->system_blks.rb_node to the root of the constructed tree. That being said I'm not *sure* this is going to be a performance issue since ext4_map_blocks() are not that frequent and the lock hold times will be very short (needs testing). So maybe rwlock_t is a reasonable compromise between complexity and performance. Honza -- Jan Kara SUSE Labs, CR