Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751983AbbFZNAD (ORCPT ); Fri, 26 Jun 2015 09:00:03 -0400 Received: from cantor2.suse.de ([195.135.220.15]:58786 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751988AbbFZM7y (ORCPT ); Fri, 26 Jun 2015 08:59:54 -0400 Date: Fri, 26 Jun 2015 14:59:47 +0200 From: Jan Kara To: Dave Hansen Cc: jack@suse.cz, viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, paulmck@linux.vnet.ibm.com, tim.c.chen@linux.intel.com, ak@linux.intel.com, dave.hansen@linux.intel.com Subject: Re: [RFCv2][PATCH 2/7] fs: use RCU for free_super() vs. __sb_start_write() Message-ID: <20150626125947.GE6271@quack.suse.cz> References: <20150625001605.72553909@viggo.jf.intel.com> <20150625001605.50C9BE41@viggo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150625001605.50C9BE41@viggo.jf.intel.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6618 Lines: 195 On Wed 24-06-15 17:16:05, Dave Hansen wrote: > > From: Dave Hansen > > Currently, __sb_start_write() and freeze_super() can race with > each other. __sb_start_write() uses a smp_mb() to ensure that > freeze_super() can see its write to sb->s_writers.counter and > that it can see freeze_super()'s update to sb->s_writers.frozen. > This all seems to work fine. > > But, this smp_mb() makes __sb_start_write() the single hottest > function in the kernel if I sit in a loop and do tiny write()s to > tmpfs over and over. This is on a very small 2-core system, so > it will only get worse on larger systems. > > This _seems_ like an ideal case for RCU. __sb_start_write() is > the RCU read-side and is in a very fast, performance-sensitive > path. freeze_super() is the RCU writer and is in an extremely > rare non-performance-sensitive path. > > Instead of doing and smp_wmb() in __sb_start_write(), we do > rcu_read_lock(). This ensures that a CPU doing freeze_super() > can not proceed past its synchronize_rcu() until the grace > period has ended and the 's_writers.frozen = SB_FREEZE_WRITE' > is visible to __sb_start_write(). > > One question here: Does the work that __sb_start_write() does in > a previous grace period becomes visible to freeze_super() after > its call to synchronize_rcu()? It _seems_ like it should, but it > seems backwards to me since __sb_start_write() is the "reader" in > this case. I believe yes. Because all accesses (be it reads or writes) must finish before the current RCU period finishes. And synchronize_rcu() must make sure that any code (loads / stores) after it execute only after the RCU period has finished... The patch looks good to me. You can add: Reviewed-by: Jan Kara Honza > > This patch increases the number of writes/second that I can do > by 5.6%. > > Does anybody see any holes with this? > > Cc: Jan Kara > Cc: Alexander Viro > Cc: linux-fsdevel@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: Paul E. McKenney > Cc: Tim Chen > Cc: Andi Kleen > Signed-off-by: Dave Hansen > --- > > b/fs/super.c | 63 ++++++++++++++++++++++++++++++++++++----------------------- > 1 file changed, 39 insertions(+), 24 deletions(-) > > diff -puN fs/super.c~rcu-__sb_start_write fs/super.c > --- a/fs/super.c~rcu-__sb_start_write 2015-06-24 17:14:34.939125713 -0700 > +++ b/fs/super.c 2015-06-24 17:14:34.942125847 -0700 > @@ -1190,27 +1190,21 @@ static void acquire_freeze_lock(struct s > */ > int __sb_start_write(struct super_block *sb, int level, bool wait) > { > -retry: > - if (unlikely(sb->s_writers.frozen >= level)) { > + rcu_read_lock(); > + while (unlikely(sb->s_writers.frozen >= level)) { > + rcu_read_unlock(); > if (!wait) > return 0; > wait_event(sb->s_writers.wait_unfrozen, > sb->s_writers.frozen < level); > + rcu_read_lock(); > } > > #ifdef CONFIG_LOCKDEP > acquire_freeze_lock(sb, level, !wait, _RET_IP_); > #endif > percpu_counter_inc(&sb->s_writers.counter[level-1]); > - /* > - * Make sure counter is updated before we check for frozen. > - * freeze_super() first sets frozen and then checks the counter. > - */ > - smp_mb(); > - if (unlikely(sb->s_writers.frozen >= level)) { > - __sb_end_write(sb, level); > - goto retry; > - } > + rcu_read_unlock(); > return 1; > } > EXPORT_SYMBOL(__sb_start_write); > @@ -1254,6 +1248,29 @@ static void sb_wait_write(struct super_b > } while (writers); > } > > +static void __thaw_super(struct super_block *sb) > +{ > + sb->s_writers.frozen = SB_UNFROZEN; > + /* > + * RCU protects us against races where we are taking > + * s_writers.frozen in to a less permissive state. When > + * that happens, __sb_start_write() might not yet have > + * seen our write and might still increment > + * s_writers.counter. > + * > + * Here, however, we are transitioning to a _more_ > + * permissive state. The filesystem is frozen and no > + * writes to s_writers.counter are being permitted. > + * > + * A smp_wmb() is sufficient here because we just need > + * to ensure that new calls __sb_start_write() are > + * allowed, not that _concurrent_ calls have finished. > + */ > + smp_wmb(); > + wake_up(&sb->s_writers.wait_unfrozen); > + deactivate_locked_super(sb); > +} > + > /** > * freeze_super - lock the filesystem and force it into a consistent state > * @sb: the super to lock > @@ -1312,7 +1329,13 @@ int freeze_super(struct super_block *sb) > > /* From now on, no new normal writers can start */ > sb->s_writers.frozen = SB_FREEZE_WRITE; > - smp_wmb(); > + /* > + * After we synchronize_rcu(), we have ensured that everyone > + * who reads sb->s_writers.frozen under rcu_read_lock() can > + * now see our update. This pretty much means that > + * __sb_start_write() will not allow any new writers. > + */ > + synchronize_rcu(); > > /* Release s_umount to preserve sb_start_write -> s_umount ordering */ > up_write(&sb->s_umount); > @@ -1322,7 +1345,7 @@ int freeze_super(struct super_block *sb) > /* Now we go and block page faults... */ > down_write(&sb->s_umount); > sb->s_writers.frozen = SB_FREEZE_PAGEFAULT; > - smp_wmb(); > + synchronize_rcu(); > > sb_wait_write(sb, SB_FREEZE_PAGEFAULT); > > @@ -1331,7 +1354,7 @@ int freeze_super(struct super_block *sb) > > /* Now wait for internal filesystem counter */ > sb->s_writers.frozen = SB_FREEZE_FS; > - smp_wmb(); > + synchronize_rcu(); > sb_wait_write(sb, SB_FREEZE_FS); > > if (sb->s_op->freeze_fs) { > @@ -1339,11 +1362,7 @@ int freeze_super(struct super_block *sb) > if (ret) { > printk(KERN_ERR > "VFS:Filesystem freeze failed\n"); > - sb->s_writers.frozen = SB_UNFROZEN; > - smp_wmb(); > - wake_up(&sb->s_writers.wait_unfrozen); > - deactivate_locked_super(sb); > - return ret; > + __thaw_super(sb); > } > } > /* > @@ -1386,11 +1405,7 @@ int thaw_super(struct super_block *sb) > } > > out: > - sb->s_writers.frozen = SB_UNFROZEN; > - smp_wmb(); > - wake_up(&sb->s_writers.wait_unfrozen); > - deactivate_locked_super(sb); > - > + __thaw_super(sb); > return 0; > } > EXPORT_SYMBOL(thaw_super); > _ -- Jan Kara SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/