Date: Thu, 20 Oct 2005 13:50:14 -0700
From: Andrew Morton <akpm@osdl.org>
To: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [EXPERIMENT,RFC] FAT: Add "flush" option for hotplug devices
Message-Id: <20051020135014.2289fa01.akpm@osdl.org>
In-Reply-To: <871x2gf8f5.fsf@devron.myhome.or.jp>
References: <871x2gf8f5.fsf@devron.myhome.or.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4554
Lines: 166

OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
>
> Hi,
> 
> This adds new "flush" option on experiment for hotplug devices.
> 
> Current implementation of "flush" option does,
> 
> 	- synchronizing data pages at ->release() (last close(2))
> 	- if user's work seems to be done (fs is not active), all
> 	  metadata syncs by pdflush()

Seems like a sensible thing to do.

> This option would provide kind of sane progress, and dirty buffers is
> flushed more frequently (if fs is not active).

Your implementation doesn't really do this.  bdi_write_congested() only
returns true if the device is super-busy.  To determine whether it's "not
active" we'd need to peek at the queue's disk_stat accounting, or at the
queue's outstanding read/write requests.  We covered this a couple of weeks
ago in the context of Con's swap prefetch work.

>  This option doesn't
> provide any robustness (robustness is provided by other options), but
> probably the option is proper for hotplug devices.

Well...  It does a full fsync_super() - that's pretty robust.

> +EXPORT_SYMBOL(filemap_write_and_wait);

_GPL please.

> +EXPORT_SYMBOL(fsync_super);

Ditto

> +/*
> + * Copyright (C) 2005, OGAWA Hirofumi
> + * Released under GPL v2.
> + */
> +
> +#include <linux/fs.h>
> +#include <linux/blkdev.h>
> +#include <linux/writeback.h>
> +#include <linux/msdos_fs.h>
> +
> +int fat_sync_fdata(struct inode *inode, struct file *filp)
> +{
> +	int err = 0;
> +
> +	if (filp->f_mode & FMODE_WRITE) {
> +#if 1
> +		current->flags |= PF_SYNCWRITE;
> +		err = filemap_write_and_wait(inode->i_mapping);
> +		current->flags &= ~PF_SYNCWRITE;
> +#else
> +		down(&inode->i_sem);
> +#if 1
> +		err = generic_osync_inode(inode, inode->i_mapping, OSYNC_DATA);
> +#else
> +		err = filp->f_op->fsync(filp, filp->f_dentry, 1);
> +#endif
> +		up(&inode->i_sem);
> +#endif
> +	}
> +	return err;
> +}

Can't we just split up do_fsync() a bit and use that?

> +static void fat_pdflush_handler(unsigned long arg)
> +{
> +	struct super_block *sb = (struct super_block *)arg;
> +	fsync_super(sb);
> +}

It would be nice if /proc/sys/vm/dirty_writeback_centisecs was a per-fs
thing.   That's non-trivial.

> +static void fat_flush_timer(unsigned long data)
> +{
> +	struct super_block *sb = (struct super_block *)data;
> +	struct msdos_sb_info *sbi = MSDOS_SB(sb);
> +	struct backing_dev_info *bdi = blk_get_backing_dev_info(sb->s_bdev);
> +	unsigned long last_flush_jiff;
> +
> +	if (bdi_write_congested(bdi)) {

As indicated above, this won't be very effective.

> +		mod_timer(&sbi->flush_timer, jiffies + (HZ / 10));
> +		return;
> +	}
> +
> +	last_flush_jiff = sbi->last_flush_jiff;
> +
> +	if (!time_after_eq(jiffies, last_flush_jiff + (HZ / 2))) {
> +		mod_timer(&sbi->flush_timer, last_flush_jiff + (HZ / 2));
> +		return;
> +	}

What's the above doing?

> +	if (pdflush_operation(fat_pdflush_handler, (unsigned long)sb) < 0)
> +		mod_timer(&sbi->flush_timer, jiffies + HZ);
> +}
> +
> +void __fat_mark_flush(struct super_block *sb)
> +{
> +	struct msdos_sb_info *sbi = MSDOS_SB(sb);
> +
> +	sbi->last_flush_jiff = jiffies;
> +	/*
> +	 * make sure by smb_wmb() that dirty buffers before here is
> +	 * processed at the timer routine.
> +	 */
> +	smp_wmb();
> +
> +	if (!timer_pending(&sbi->flush_timer))
> +		mod_timer(&sbi->flush_timer, jiffies + HZ);
> +}
> +EXPORT_SYMBOL(__fat_mark_flush);

_GPL?

> +void fat_flush_stop(struct super_block *sb)
> +{
> +	del_timer_sync(&MSDOS_SB(sb)->flush_timer);
> +}

whoops, the pdflush_operation could still be in progress.

To avoid umount races I think the pdflush callback is going to need to take
sb_lock, increment s_count, take ->s_umount, test ->s_root.  Like, for
example, __sync_inodes.

> +void fat_flush_init(struct super_block *sb)
> +{
> +	struct msdos_sb_info *sbi = MSDOS_SB(sb);
> +	init_timer(&sbi->flush_timer);
> +	sbi->flush_timer.data = (unsigned long)sb;
> +	sbi->flush_timer.function = fat_flush_timer;

-mm has setup_timer()

> +
> +static inline void fat_mark_flush(struct super_block *sb)
> +{
> +	if (MSDOS_SB(sb)->options.flush)
> +		__fat_mark_flush(sb);
> +}

It'd be nice to make this a more generic thing, so other filesystems can
use it without copying lots of code.

> +		case Opt_flush:

MS_FLUSH?   I added MS_DIRSYNC a few years ago - it wasn't too complex.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/