2010-08-09 14:33:55

by Sage Weil

[permalink] [raw]
Subject: [PATCH] vfs: introduce FS_IOC_SYNCFS to sync a single super

Currently the only way to sync a single super_block (and not all of them
via sync(2)) is via the BLKFLSBUF ioctl on the block device. That also
invalidates the bdev mapping, which isn't usually desireable, and it
doesn't work for non-block file systems. The ability to sync a single
mount can be useful for both applications and administrators (e.g., when
other mounts on the system are hung).

Introduce a simple ioctl to sync the super associated with an open file.
Pass any error returned by sync_filesystem() back to the user.

Signed-off-by: Sage Weil <[email protected]>
---
fs/ioctl.c | 9 +++++++++
include/linux/fs.h | 1 +
2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 2d140a7..2aabb19 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -593,6 +593,15 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
case FS_IOC_FIEMAP:
return ioctl_fiemap(filp, arg);

+ case FS_IOC_SYNCFS:
+ {
+ struct super_block *sb = filp->f_dentry->d_sb;
+ down_read(&sb->s_umount);
+ error = sync_filesystem(sb);
+ up_read(&sb->s_umount);
+ break;
+ }
+
case FIGETBSZ:
{
struct inode *inode = filp->f_path.dentry->d_inode;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 68ca1b0..175d77f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -321,6 +321,7 @@ struct inodes_stat_t {
#define FS_IOC_GETVERSION _IOR('v', 1, long)
#define FS_IOC_SETVERSION _IOW('v', 2, long)
#define FS_IOC_FIEMAP _IOWR('f', 11, struct fiemap)
+#define FS_IOC_SYNCFS _IO('f', 12)
#define FS_IOC32_GETFLAGS _IOR('f', 1, int)
#define FS_IOC32_SETFLAGS _IOW('f', 2, int)
#define FS_IOC32_GETVERSION _IOR('v', 1, int)
--
1.7.0.4


2010-08-09 18:55:21

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] vfs: introduce FS_IOC_SYNCFS to sync a single super

On Monday 09 August 2010, Sage Weil wrote:
> Currently the only way to sync a single super_block (and not all of them
> via sync(2)) is via the BLKFLSBUF ioctl on the block device. That also
> invalidates the bdev mapping, which isn't usually desireable, and it
> doesn't work for non-block file systems. The ability to sync a single
> mount can be useful for both applications and administrators (e.g., when
> other mounts on the system are hung).

You need to add this to the compat_ioctl handling as well, otherwise
it won't work when you run a 32 bit process in a 64 bit kernel.

Adding it to the ioctl_pointer[] array is probably the easiest way.

Arnd

2010-08-27 00:13:36

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] vfs: introduce FS_IOC_SYNCFS to sync a single super

On Mon, 9 Aug 2010 07:33:57 -0700 (PDT)
Sage Weil <[email protected]> wrote:

> Currently the only way to sync a single super_block (and not all of them
> via sync(2)) is via the BLKFLSBUF ioctl on the block device. That also
> invalidates the bdev mapping, which isn't usually desireable

Actually you can do

mount -o remount /dev/whatever

and it will sync the fs and retain caches.

> and it
> doesn't work for non-block file systems.

And I guess remount will do that also.

> The ability to sync a single
> mount can be useful for both applications and administrators (e.g., when
> other mounts on the system are hung).
>
> Introduce a simple ioctl to sync the super associated with an open file.
> Pass any error returned by sync_filesystem() back to the user.
>

The changelog forgot to tell us why this is a useful thing to add.
What is the use-case?

> ---
> fs/ioctl.c | 9 +++++++++
> include/linux/fs.h | 1 +
> 2 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 2d140a7..2aabb19 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -593,6 +593,15 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
> case FS_IOC_FIEMAP:
> return ioctl_fiemap(filp, arg);
>
> + case FS_IOC_SYNCFS:
> + {
> + struct super_block *sb = filp->f_dentry->d_sb;
> + down_read(&sb->s_umount);
> + error = sync_filesystem(sb);
> + up_read(&sb->s_umount);
> + break;
> + }
> +

`mount -o remount' is surely a Linux-specific side-effect and there's
really no guarantee that Linux will always retain that side-effect.
OTOH FS_IOC_SYNCFS is linux-specific.

If we're going to add something like this then it will need to be
documented in manpages. Supposedly, a cc to [email protected]
will help make all that happen, but I'm not sure who if anyone is
answering the phone over there?

2010-11-27 22:32:33

by Jonathan Nieder

[permalink] [raw]
Subject: Re: [PATCH] vfs: introduce FS_IOC_SYNCFS to sync a single super

Hi,

Andrew Morton wrote:
> Sage Weil wrote:

>> The ability to sync a single
>> mount can be useful for both applications and administrators (e.g., when
>> other mounts on the system are hung).
>>
>> Introduce a simple ioctl to sync the super associated with an open file.
>> Pass any error returned by sync_filesystem() back to the user.
>
> The changelog forgot to tell us why this is a useful thing to add.
> What is the use-case?

Here's a use case.

dpkg, like most package managers, occasionally needs to drop in a whole
bunch of new versions of essential files in the file system. Since
ancient times, that has been done with the "rename trick":

open("/lib/libc.so.6.dpkg-tmp", ...
write(...
open("/lib/libm.so.6.dpkg-tmp", ...
write(...
...
/* done staging! now move into place. */
rename("/lib/libc.so.6.dpkg-tmp", "/lib/libc.so.6");
rename("/lib/libm.so.6.dpkg-tmp", "/lib/libm.so.6");
...

This way, each file has either the old content or the new content,
and we can back out upgrades for certain errors (e.g., disk full).

Great. Problem is, filesystems with delayed allocation like XFS,
ubifs, ext4, hfs+ don't cope so well with that[1]. We need to sync
the files at some point before the rename[2] to prevent zero-length
files and similar oddities. What system call to use?

- a storm of fsyncs causes inappropriate constraints on the order
of writes. The result is very slow and can result in unnecessary
wear.

- a sync() causes I/O on unrelated filesystems. The result can be
very slow and can result in unnecessary wear.

A nice compromise is to only sync the affected filesystems, using
something like this ioctl[3].

> If we're going to add something like this then it will need to be
> documented in manpages. Supposedly, a cc to [email protected]
> will help make all that happen, but I'm not sure who if anyone is
> answering the phone over there?

Michael, does the API look okay?

Hope that helps,
Jonathan

[1] Yes, even after v2.6.30-rc1~416^2~15 (ext4: Automatically allocate
delay allocated blocks on rename, 2009-02-23).
See https://bugzilla.kernel.org/show_bug.cgi?id=18632

[2] http://lists.debian.org/debian-dpkg/2010/11/msg00039.html
http://lists.debian.org/debian-devel/2010/11/msg00550.html

[3] http://lists.debian.org/debian-dpkg/2010/11/msg00069.html

2010-11-27 22:57:16

by Jonathan Nieder

[permalink] [raw]
Subject: Re: [PATCH] vfs: introduce FS_IOC_SYNCFS to sync a single super

Arnd Bergmann wrote:

> You need to add this to the compat_ioctl handling as well, otherwise
> it won't work when you run a 32 bit process in a 64 bit kernel.
>
> Adding it to the ioctl_pointer[] array is probably the easiest way.

Here's a patch for squashing (plus another tweak: renumbering the
ioctl to avoid a conflict with EXT4_IOC_ALLOC_DA_BLKS).

-- 8< --
Subject: FS_IOC_SYNCFS: finishing touches

This ioctl has the same ABI for 32-bit and 64-bit use; record
that in compat_ioctl.c.

Change ioctl number from _IO('f', 12) to _IO('f', 16) since the
former is taken by EXT4_IOC_ALLOC_DA_BLKS.

Signed-off-by: Jonathan Nieder <[email protected]>
---
Documentation/ioctl/ioctl-number.txt | 2 +-
fs/compat_ioctl.c | 1 +
fs/ext4/ext4.h | 1 +
include/linux/fs.h | 3 ++-
4 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 63ffd78..3644ba9 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -212,7 +212,7 @@ Code Seq#(hex) Include File Comments
'f' 00-1F linux/ext3_fs.h conflict!
'f' 00-0F fs/jfs/jfs_dinode.h conflict!
'f' 00-0F fs/ext4/ext4.h conflict!
-'f' 00-0F linux/fs.h conflict!
+'f' 00-1F linux/fs.h conflict!
'f' 00-0F fs/ocfs2/ocfs2_fs.h conflict!
'g' 00-0F linux/usb/gadgetfs.h
'g' 20-2F linux/usb/g_printer.h
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index a60579b..0989acf 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -876,6 +876,7 @@ COMPATIBLE_IOCTL(FIOASYNC)
COMPATIBLE_IOCTL(FIONBIO)
COMPATIBLE_IOCTL(FIONREAD) /* This is also TIOCINQ */
COMPATIBLE_IOCTL(FS_IOC_FIEMAP)
+COMPATIBLE_IOCTL(FS_IOC_SYNCFS)
/* 0x00 */
COMPATIBLE_IOCTL(FIBMAP)
COMPATIBLE_IOCTL(FIGETBSZ)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 6a5edea..178d70b 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -541,6 +541,7 @@ struct ext4_new_group_data {
/* note ioctl 11 reserved for filesystem-independent FIEMAP ioctl */
#define EXT4_IOC_ALLOC_DA_BLKS _IO('f', 12)
#define EXT4_IOC_MOVE_EXT _IOWR('f', 15, struct move_extent)
+ /* note ioctls 16- reserved for filesystem-independent ioctls */

#if defined(__KERNEL__) && defined(CONFIG_COMPAT)
/*
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 525ecdd..24c6d42 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -331,7 +331,8 @@ struct inodes_stat_t {
#define FS_IOC_GETVERSION _IOR('v', 1, long)
#define FS_IOC_SETVERSION _IOW('v', 2, long)
#define FS_IOC_FIEMAP _IOWR('f', 11, struct fiemap)
-#define FS_IOC_SYNCFS _IO('f', 12)
+ /* 12-15 are for filesystem-specific ioctls */
+#define FS_IOC_SYNCFS _IO('f', 16)
#define FS_IOC32_GETFLAGS _IOR('f', 1, int)
#define FS_IOC32_SETFLAGS _IOW('f', 2, int)
#define FS_IOC32_GETVERSION _IOR('v', 1, int)
--
1.7.2.3

2010-11-28 05:04:18

by Jonathan Nieder

[permalink] [raw]
Subject: Re: [PATCH] vfs: introduce FS_IOC_SYNCFS to sync a single super

Jonathan Nieder wrote:

> Since
> ancient times, that has been done with the "rename trick":
[...]
> Great. Problem is, filesystems with delayed allocation like XFS,
> ubifs, ext4, hfs+ don't cope so well with that[1].
[...]
> [1] Yes, even after v2.6.30-rc1~416^2~15 (ext4: Automatically allocate
> delay allocated blocks on rename, 2009-02-23).
> See https://bugzilla.kernel.org/show_bug.cgi?id=18632

Sorry, wrong link. The example meant was

https://bugzilla.kernel.org/show_bug.cgi?id=15910

"zero-length files and performance degradation", reported against 2.6.32:

| To reproduce it:
| * install a fresh Ubuntu Lucid system on an ext4 filesystem, or Debian with
| dpkg < 1.15.6 or Ubuntu Karmic
| * install a package, wait a few seconds and simulate a crash
| $ sudo apt-get install some-package; sleep 5; sudo echo b > /proc/sysrq-trigger
| * reboot
| $ ls -l /var/lib/dpkg/info/some-package.* will list empty maintainer's scripts.
| $ ls -l /var/cache/apt/archive/some-package.* will show the empty archive
| you've just downloaded
| At this stage, the package manager is unusable and the common user cannot
| update anything anymore.
|
| This behavior is observed with data=ordered and with or without the mount
| option auto_da_alloc.

Sorry for the confusion.
Jonathan