2014-03-05 14:31:54

by Lucas Nussbaum

[permalink] [raw]
Subject: Extremely slow remounts with concurrent I/O

TL;DR: we experience long temporary hangs when doing multiple mount -o
remount at the same time as other I/O on an ext4 filesystem.

Hi,

When starting hundreds of LXC containers simultaneously on a system, the
boot of some containers was hanging. We tracked this down to an
initscript's use of mount -o remount, which was hanging in D state.

We reproduced the problem outside of LXC, with the script available at
[0]. That script initiates 1000 mount -o remount, and performs some
writes using a big cp to the same filesystem during the remounts.

All tests are performed on a Debian jessie system, with Linux 3.12.

Note that when running the script several times in a row, the same data is
copied over and over again, probably resulting in almost no I/O thanks to
caches. That's on purpose (see below).

When things work fine, the script above takes about 100s on our test
system (with quite a lot of variation). When things go wrong, it takes up
to 600s.

When things go wrong, we see the initial I/O caused by cp, which quickly
finishes.

After that, many (often close to 1000) mount processes are hung in D state
for a *long* time. During that time, we see still some I/O on the block
device (using dstat). At [1], you will find a btrace dump of what it
happening.

At some point, some mount processes start finishing, and then slowly all
mount process terminate.

Just to give an idea of timings:
from t0 to t0+25s: cp running
from t0+25s to t0+267s: all mount processes are hung
from t0+269s to t0+408s: mount processes terminate

After doing:
echo 100 > /proc/sys/kernel/hung_task_warnings
echo 30 > /proc/sys/kernel/hung_task_timeout_secs
We get:
[ 5879.392070] INFO: task mount:8953 blocked for more than 30 seconds.
[ 5879.429616] Tainted: G I 3.12-1-amd64 #1
[ 5879.461434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5879.508354] mount D ffff88007f104380 0 8953 7954 0x00000000
[ 5879.550760] ffff88007f104040 0000000000000086 00000000000142c0 ffff880409281fd8
[ 5879.595415] 00000000000142c0 ffff880409281fd8 ffff880409281df8 ffff880409281d90
[ 5879.640054] ffff880409281df0 ffff88007f104040 0000000000000000 0000000000000020
[ 5879.684652] Call Trace:
[ 5879.699328] [<ffffffff8148d969>] ? schedule_timeout+0x1f9/0x2c0
[ 5879.735318] [<ffffffff81087a55>] ? check_preempt_curr+0x75/0x90
[ 5879.771305] [<ffffffff81087a7f>] ? ttwu_do_wakeup+0xf/0xd0
[ 5879.804683] [<ffffffff810898ba>] ? try_to_wake_up+0xca/0x260
[ 5879.839112] [<ffffffff8148fc50>] ? wait_for_completion+0xa0/0x110
[ 5879.876140] [<ffffffff81089a90>] ? wake_up_state+0x10/0x10
[ 5879.909517] [<ffffffff81198ebc>] ? sync_inodes_sb+0x9c/0x1a0
[ 5879.943942] [<ffffffff8119ee33>] ? sync_filesystem+0x53/0xa0
[ 5879.978385] [<ffffffff81176bba>] ? do_remount_sb+0x4a/0x1a0
[ 5880.012319] [<ffffffff81191d98>] ? do_mount+0x688/0xa70
[ 5880.044158] [<ffffffff8112c268>] ? memdup_user+0x38/0x70
[ 5880.076499] [<ffffffff811921fc>] ? SyS_mount+0x7c/0xc0
[ 5880.107791] [<ffffffff81498fb9>] ? system_call_fastpath+0x16/0x1b
[ 5880.697315] INFO: task mount:8957 blocked for more than 30 seconds.
[ 5880.734859] Tainted: G I 3.12-1-amd64 #1
[ 5880.766688] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5880.813608] mount D ffff8804062783c0 0 8957 7956 0x00000000
[ 5880.856012] ffff880406278080 0000000000000086 00000000000142c0 ffff88040263dfd8
[ 5880.900676] 00000000000142c0 ffff88040263dfd8 ffff880406278080 ffff88040263de68
[ 5880.945281] ffff88042d30f068 ffff88042d30f070 ffffffff00000000 ffff88042d30f078
[ 5880.989897] Call Trace:
[ 5881.004577] [<ffffffff81490cb5>] ? rwsem_down_write_failed+0x105/0x1d0
[ 5881.044214] [<ffffffff81271a73>] ? call_rwsem_down_write_failed+0x13/0x20
[ 5881.085403] [<ffffffff8148ed44>] ? down_write+0x24/0x26
[ 5881.117215] [<ffffffff811919ed>] ? do_mount+0x2dd/0xa70
[ 5881.149042] [<ffffffff8112c268>] ? memdup_user+0x38/0x70
[ 5881.181387] [<ffffffff811921fc>] ? SyS_mount+0x7c/0xc0
[ 5881.212684] [<ffffffff81498fb9>] ? system_call_fastpath+0x16/0x1b

Some other things we tried:
1) we tried to 'sync' after removing the files, and dropping the caches
(as shown in the commented lines in [0]). That makes the problem disappear
(or at least makes it less frequent). The overall script execution is
actually faster with the post-rm sync and dropping caches than without
them!

2) We tried switching to the noop scheduler (instead of cfq). The problem
could still be reproduced. A btrace dump with noop is available at [2].

3) We tried with ext3 instead of ext4. The problem could never be
reproduced.

4) We tried on different machines, and we could reproduce the problem.
However, on a machine with SSD drives, we were not able to reproduce the
problem.

Any ideas?

[0] http://blop.info/p/20140305-ext4-mount-remount.rb
[1] http://blop.info/p/20140305-ext4-mount-btrace.log
[2] http://blop.info/p/20140305-ext4-mount-btrace-noop.log.gz
--
Lucas Nussbaum and Emmanuel Jeanvoine


2014-03-06 13:56:47

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH, RFC] jbd2: don't write non-commit blocks synchronously

On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> TL;DR: we experience long temporary hangs when doing multiple mount -o
> remount at the same time as other I/O on an ext4 filesystem.

Hi Lukas,

Thanks for this report. Are you willing to try a kernel patch? If
so, could you try and see if this fixes your issue. From looking at
your block trace, I saw a large number of suspicious 4k writes from
the jbd2 layer.

- Ted

commit 137d7cea675fd7d8ff98b7e035fb6516dc4ab220
Author: Theodore Ts'o <[email protected]>
Date: Thu Mar 6 08:56:11 2014 -0500

jbd2: don't write non-commit blocks synchronously

We don't need to write the revoke blocks and descriptor blocks using
WRITE_SYNC, since when we issue the commit block, thos blocks will get
pushed out via REQ_FLUSH. This will allow the journal blocks to be
written in fewer i/o operations (otherwise we end up issuing a whole
series of 4k writes unnecessarily).

Signed-off-by: "Theodore Ts'o" <[email protected]>

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index cf2fc05..fb64629 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -554,7 +554,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)

blk_start_plug(&plug);
jbd2_journal_write_revoke_records(journal, commit_transaction,
- &log_bufs, WRITE_SYNC);
+ &log_bufs, WRITE);
blk_finish_plug(&plug);

jbd_debug(3, "JBD2: commit phase 2b\n");
@@ -739,7 +739,7 @@ start_journal_io:
clear_buffer_dirty(bh);
set_buffer_uptodate(bh);
bh->b_end_io = journal_end_buffer_io_sync;
- submit_bh(WRITE_SYNC, bh);
+ submit_bh(WRITE, bh);
}
cond_resched();
stats.run.rs_blocks_logged += bufs;


2014-03-06 17:28:12

by Lucas Nussbaum

[permalink] [raw]
Subject: Re: [PATCH, RFC] jbd2: don't write non-commit blocks synchronously

On 06/03/14 at 08:56 -0500, Theodore Ts'o wrote:
> On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> > TL;DR: we experience long temporary hangs when doing multiple mount -o
> > remount at the same time as other I/O on an ext4 filesystem.
>
> Hi Lukas,
>
> Thanks for this report. Are you willing to try a kernel patch? If
> so, could you try and see if this fixes your issue. From looking at
> your block trace, I saw a large number of suspicious 4k writes from
> the jbd2 layer.

Hi Ted,

This patch doesn't solve the problem. It seems that on average, it makes
the situation worse, but I'm not sure if this is just anecdotical
evidence or statistically valid.

Anything else I could try to gather more information?

Lucas

2014-03-06 18:27:39

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH, RFC] jbd2: don't write non-commit blocks synchronously

Hmm... OK, let me make sure I understand what is going on. So you
have a single file system which is mounted read/write, and you are
doing a huge number of copies into the file system, which is keeping
it busy. You are then running a huge number of "mount -o remount" on
that same file system, which should effectively be no-op's, since the
remount isn't actually change the read/only or read/write or any other
mount options. Is that right?

Why were you doing the remount in in your actual production workload,
anyway?

- Ted

2014-03-06 18:39:32

by Lucas Nussbaum

[permalink] [raw]
Subject: Re: [PATCH, RFC] jbd2: don't write non-commit blocks synchronously

On 06/03/14 at 18:28 +0100, Lucas Nussbaum wrote:
> On 06/03/14 at 08:56 -0500, Theodore Ts'o wrote:
> > On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> > > TL;DR: we experience long temporary hangs when doing multiple mount -o
> > > remount at the same time as other I/O on an ext4 filesystem.
> >
> > Hi Lukas,
> >
> > Thanks for this report. Are you willing to try a kernel patch? If
> > so, could you try and see if this fixes your issue. From looking at
> > your block trace, I saw a large number of suspicious 4k writes from
> > the jbd2 layer.
>
> Hi Ted,
>
> This patch doesn't solve the problem. It seems that on average, it makes
> the situation worse, but I'm not sure if this is just anecdotical
> evidence or statistically valid.
>
> Anything else I could try to gather more information?

Two other data points:
- same problem with a 2.6.29-2-amd64 kernel from Debian
- with that 2.6.29-2-amd64, disabling the journal on the ext4 filesytem
using 'tune2fs -O ^has_journal /dev/sda5' makes the problem go away
--
| Lucas Nussbaum Assistant professor @ Univ. de Lorraine |
| [email protected] LORIA / AlGorille |
| http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19 |

2014-03-06 18:45:43

by Lucas Nussbaum

[permalink] [raw]
Subject: Re: [PATCH, RFC] jbd2: don't write non-commit blocks synchronously

On 06/03/14 at 13:27 -0500, Theodore Ts'o wrote:
> Hmm... OK, let me make sure I understand what is going on. So you
> have a single file system which is mounted read/write,

Yes

> and you are
> doing a huge number of copies into the file system, which is keeping
> it busy.

On that minimal system, I'm just copying ~300 MB of data. I wouldn't
qualify it as huge.

> You are then running a huge number of "mount -o remount" on
> that same file system,

at the same time as the data copy, yes.

> which should effectively be no-op's, since the
> remount isn't actually change the read/only or read/write or any other
> mount options. Is that right?

Yes

> Why were you doing the remount in in your actual production workload,
> anyway?

We are booting a large number (hundreds) of LXC containers in order to
setup an experimental environment. Those LXC containers simply use
subdirectories on the ext4 filesystem as root directory.
What we saw is that the boot of LXC containers "deadlocked".

We later discovered that:
- this was caused by Debian's /etc/init.d/checkroot.sh that calls
mount -o remount,defaults,rw
- it was not a deadlock, but rather something looking like severe lock
contention. After a seemingly random amount of time (2 to to 15 mins),
the boot of LXC containers finishes.
- it was possible to reproduce the problem outside of LXC, using the
"write data and do lots of remounts at the same time" setup I
described earlier.
--
| Lucas Nussbaum Assistant professor @ Univ. de Lorraine |
| [email protected] LORIA / AlGorille |
| http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19 |

2014-03-08 16:08:22

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH, RFC] fs: only call sync_filesystem() when remounting read-only

On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> TL;DR: we experience long temporary hangs when doing multiple mount -o
> remount at the same time as other I/O on an ext4 filesystem.
>
> When starting hundreds of LXC containers simultaneously on a system, the
> boot of some containers was hanging. We tracked this down to an
> initscript's use of mount -o remount, which was hanging in D state.
>
> We reproduced the problem outside of LXC, with the script available at
> [0]. That script initiates 1000 mount -o remount, and performs some
> writes using a big cp to the same filesystem during the remounts....

+linux-fsdevel since the patch modifies fs/super.c

Lukas, can you try this patch? I'm pretty sure this is what's going
on. It turns out each "mount -o remount" is implying an fsync(), so
your test case is identical to copying a large file while having
thousand of processes calling syncfs() on the file system, with the
predictable results.

Folks on linux-fsdevel, any objections if I carry this patch in the
ext4 tree? I don't think it should cause problems for other file
systems, since any file system that tries to rely on the implied
syncfs() is going to be subject to races, but it might make such a
race condition bug much more visible...

- Ted

commit 8862c3c69acc205b59b00baed67e50446e2fd093
Author: Theodore Ts'o <[email protected]>
Date: Sat Mar 8 11:05:35 2014 -0500

fs: only call sync_filesystem() when remounting read-only

Currently "mount -o remount" always implies an syncfs() on the file
system. This can cause a problem if a workload calls "mount -o
remount" many, many times while concurrent I/O is happening:

http://article.gmane.org/gmane.comp.file-systems.ext4/42876

Whether it would ever be sane for a workload to call "mount -o
remount" gazillions of times when they are effectively no-ops, it
seems stupid for a remount to imply an fsync().

It's possible that there is some file system which is relying on the
implied fsync(), but that's arguably broken, since aside for the
remount read-only case, there's nothing that will prevent other writes
from sneaking in between the sync_filesystem() and the call to
sb->s_op->remount_fs().

Signed-off-by: "Theodore Ts'o" <[email protected]>

diff --git a/fs/super.c b/fs/super.c
index 80d5cf2..0fc87ac 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -717,10 +717,9 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
if (retval)
return retval;
}
+ sync_filesystem(sb);
}

- sync_filesystem(sb);

2014-03-10 12:08:33

by Lucas Nussbaum

[permalink] [raw]
Subject: Re: [PATCH, RFC] fs: only call sync_filesystem() when remounting read-only

On 08/03/14 at 11:08 -0500, Theodore Ts'o wrote:
> On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> > TL;DR: we experience long temporary hangs when doing multiple mount -o
> > remount at the same time as other I/O on an ext4 filesystem.
> >
> > When starting hundreds of LXC containers simultaneously on a system, the
> > boot of some containers was hanging. We tracked this down to an
> > initscript's use of mount -o remount, which was hanging in D state.
> >
> > We reproduced the problem outside of LXC, with the script available at
> > [0]. That script initiates 1000 mount -o remount, and performs some
> > writes using a big cp to the same filesystem during the remounts....
>
> +linux-fsdevel since the patch modifies fs/super.c
>
> Lukas, can you try this patch? I'm pretty sure this is what's going
> on. It turns out each "mount -o remount" is implying an fsync(), so
> your test case is identical to copying a large file while having
> thousand of processes calling syncfs() on the file system, with the
> predictable results.

Hi Ted,

I can confirm that:
1) the patch solves my problem
2) issuing 'sync' instead of 'mount -o remount' indeed exhibits the
problem again

However, I'm curious: why would such a workload (multiple syncfs()
initiated during a write) block for several minutes on an ext4
filesystem? I've just tried again on ext3, and it's not a problem in
that case.

Lucas

2014-03-10 12:15:14

by Lucas Nussbaum

[permalink] [raw]
Subject: Re: [PATCH, RFC] fs: only call sync_filesystem() when remounting read-only

Hi Ted,

That Cc: line:
Cc: [email protected], "[email protected] Emmanuel Jeanvoine" <[email protected]>
sounds wrong. You might want to re-send to linux-fsdevel@.

Thanks

Lucas

On 08/03/14 at 11:08 -0500, Theodore Ts'o wrote:
> On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> > TL;DR: we experience long temporary hangs when doing multiple mount -o
> > remount at the same time as other I/O on an ext4 filesystem.
> >
> > When starting hundreds of LXC containers simultaneously on a system, the
> > boot of some containers was hanging. We tracked this down to an
> > initscript's use of mount -o remount, which was hanging in D state.
> >
> > We reproduced the problem outside of LXC, with the script available at
> > [0]. That script initiates 1000 mount -o remount, and performs some
> > writes using a big cp to the same filesystem during the remounts....
>
> +linux-fsdevel since the patch modifies fs/super.c
>
> Lukas, can you try this patch? I'm pretty sure this is what's going
> on. It turns out each "mount -o remount" is implying an fsync(), so
> your test case is identical to copying a large file while having
> thousand of processes calling syncfs() on the file system, with the
> predictable results.
>
> Folks on linux-fsdevel, any objections if I carry this patch in the
> ext4 tree? I don't think it should cause problems for other file
> systems, since any file system that tries to rely on the implied
> syncfs() is going to be subject to races, but it might make such a
> race condition bug much more visible...
>
> - Ted
>
> commit 8862c3c69acc205b59b00baed67e50446e2fd093
> Author: Theodore Ts'o <[email protected]>
> Date: Sat Mar 8 11:05:35 2014 -0500
>
> fs: only call sync_filesystem() when remounting read-only
>
> Currently "mount -o remount" always implies an syncfs() on the file
> system. This can cause a problem if a workload calls "mount -o
> remount" many, many times while concurrent I/O is happening:
>
> http://article.gmane.org/gmane.comp.file-systems.ext4/42876
>
> Whether it would ever be sane for a workload to call "mount -o
> remount" gazillions of times when they are effectively no-ops, it
> seems stupid for a remount to imply an fsync().
>
> It's possible that there is some file system which is relying on the
> implied fsync(), but that's arguably broken, since aside for the
> remount read-only case, there's nothing that will prevent other writes
> from sneaking in between the sync_filesystem() and the call to
> sb->s_op->remount_fs().
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
>
> diff --git a/fs/super.c b/fs/super.c
> index 80d5cf2..0fc87ac 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -717,10 +717,9 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
> if (retval)
> return retval;
> }
> + sync_filesystem(sb);
> }
>
> - sync_filesystem(sb);
> -
> if (sb->s_op->remount_fs) {
> retval = sb->s_op->remount_fs(sb, &flags, data);
> if (retval) {
>

--
| Lucas Nussbaum Assistant professor @ Univ. de Lorraine |
| [email protected] LORIA / AlGorille |
| http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19 |

2014-03-10 14:41:28

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH, RFC] fs: only call sync_filesystem() when remounting read-only

On Mon, Mar 10, 2014 at 12:45:08PM +0100, Lucas Nussbaum wrote:
> > Lukas, can you try this patch? I'm pretty sure this is what's going
> > on. It turns out each "mount -o remount" is implying an fsync(), so
> > your test case is identical to copying a large file while having
> > thousand of processes calling syncfs() on the file system, with the
> > predictable results.
>
> Hi Ted,
>
> I can confirm that:
> 1) the patch solves my problem
> 2) issuing 'sync' instead of 'mount -o remount' indeed exhibits the
> problem again
>
> However, I'm curious: why would such a workload (multiple syncfs()
> initiated during a write) block for several minutes on an ext4
> filesystem? I've just tried again on ext3, and it's not a problem in
> that case.

The reason why is because ext3 is less careful than ext4.
ext3_sync_fs() simply tries to start a commit, and if there is already
a commit already started, it does nothing. So if you issue a
gazillion syncfs() calls, with ext3, it's a no-op.

For ext4, each syncfs() call will result in a SYNC_CACHE flushh being
sent to the disk:

/*
* Data writeback is possible w/o journal transaction, so barrier must
* being sent at the end of the function. But we can skip it if
* transaction_commit will do it for us.
*/
target = jbd2_get_latest_transaction(sbi->s_journal);
if (wait && sbi->s_journal->j_flags & JBD2_BARRIER &&
!jbd2_trans_will_send_data_barrier(sbi->s_journal, target))
needs_barrier = true;
.
.
.
if (needs_barrier) {
int err;
err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
if (!ret)
ret = err;
}

We can debate whether or not this care is necessary, and since
syncfs() isn't terribly reliable, we could add hacks so that if an
syncfs() had been issued in the last 100ms, we could make it be a
no-op, or some other horrible hack.

But given that these hacks are horrible, it's not clear that it's
worth it to do all of this just to something where userspace is doing
something really stupid, whether it is issuing thousands of syncfs()
or "mount -o remount" requests per second.

Cheers,

- Ted

2014-03-13 00:36:38

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH, RFC] fs: only call sync_filesystem() when remounting read-only

On Sat, Mar 08, 2014 at 11:08:18AM -0500, Theodore Ts'o wrote:
> On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> > TL;DR: we experience long temporary hangs when doing multiple mount -o
> > remount at the same time as other I/O on an ext4 filesystem.
> >
> > When starting hundreds of LXC containers simultaneously on a system, the
> > boot of some containers was hanging. We tracked this down to an
> > initscript's use of mount -o remount, which was hanging in D state.
> >
> > We reproduced the problem outside of LXC, with the script available at
> > [0]. That script initiates 1000 mount -o remount, and performs some
> > writes using a big cp to the same filesystem during the remounts....
>
> +linux-fsdevel since the patch modifies fs/super.c
>
> Lukas, can you try this patch? I'm pretty sure this is what's going
> on. It turns out each "mount -o remount" is implying an fsync(), so
> your test case is identical to copying a large file while having
> thousand of processes calling syncfs() on the file system, with the
> predictable results.
>
> Folks on linux-fsdevel, any objections if I carry this patch in the
> ext4 tree? I don't think it should cause problems for other file
> systems, since any file system that tries to rely on the implied
> syncfs() is going to be subject to races, but it might make such a
> race condition bug much more visible...

IMO, I think that you should be looking to fix ext4 syncfs issues,
not changing the VFS behaviour that might cause subtle and unnoticed
problems for other filesystems. We should not be moving data
inegrity operations without first auditing of all the filesystem
remount operations for issues.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-03-13 01:16:35

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH, RFC] fs: only call sync_filesystem() when remounting read-only

On Thu, Mar 13, 2014 at 11:36:35AM +1100, Dave Chinner wrote:
> > Folks on linux-fsdevel, any objections if I carry this patch in the
> > ext4 tree? I don't think it should cause problems for other file
> > systems, since any file system that tries to rely on the implied
> > syncfs() is going to be subject to races, but it might make such a
> > race condition bug much more visible...
>
> IMO, I think that you should be looking to fix ext4 syncfs issues,
> not changing the VFS behaviour that might cause subtle and unnoticed
> problems for other filesystems. We should not be moving data
> inegrity operations without first auditing of all the filesystem
> remount operations for issues.

The issue is that it's forcing a CACHE FLUSH if we don't need to force
a journal commit, since it's possible that data writes could have been
sent to the disk without modifying fs metadata that would require a
commit. So arguably what we're doing with ext4 is _correct_, where as
with ext3 we would simply not calling blkdev_issue_barrier() in that
situation.

The issue is that if userspace executes a no-op remount, there
shouldn't be a reason to call sync_filesystem() at all. But I'm also
not so sure that I should be that solicitous of a workload where
someone is calling thousands and thousands of no-op remounts.....

- Ted

2014-03-13 03:14:53

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH, RFC] fs: only call sync_filesystem() when remounting read-only

On Wed, Mar 12, 2014 at 09:16:29PM -0400, Theodore Ts'o wrote:
> > IMO, I think that you should be looking to fix ext4 syncfs issues,
> > not changing the VFS behaviour that might cause subtle and unnoticed
> > problems for other filesystems. We should not be moving data
> > inegrity operations without first auditing of all the filesystem
> > remount operations for issues.
>
> The issue is that it's forcing a CACHE FLUSH if we don't need to force
> a journal commit, since it's possible that data writes could have been
> sent to the disk without modifying fs metadata that would require a
> commit. So arguably what we're doing with ext4 is _correct_, where as
> with ext3 we would simply not calling blkdev_issue_barrier() in that
> situation.

Doing some more digging, ext4 is currently interpreting syncfs() as
requiring a data integrity sync. So we go through some extra work to
guarantee that we call blkdev_issue_barrier(), even if a journal
commit is not required.

This change was made by Dmitry last June:

commit 06a407f13daf9e48f0ef7189c7e54082b53940c7
Author: Dmitry Monakhov <[email protected]>
Date: Wed Jun 12 22:25:07 2013 -0400

ext4: fix data integrity for ext4_sync_fs

Inode's data or non journaled quota may be written w/o jounral so we
_must_ send a barrier at the end of ext4_sync_fs. But it can be
skipped if journal commit will do it for us.

Also fix data integrity for nojournal mode.

Signed-off-by: Dmitry Monakhov <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>

Both ext3 and xfs do *not* do this.

Looking more closely at the syncfs(2) manpage, it's not clear it
requires this:

sync() causes all buffered modifications to file metadata and
data to be written to the underlying filesystems.

syncfs() is like sync(), but synchronizes just the filesystem
containing file referred to by the open file descriptor fd.

Unlike the fsync(2) system call, it does *not* state that the data
flushed to the disk is guaranteed to be there after a crash, which I
suppose justifies ext3 and xfs's current behavior.

So the way I see it, we have three choices.

1) Nowhere in the remount system call is it stated that it has
***any*** data integrity implifications. If you are making the rw->ro
transition, sure, you'll need to flush out any pending changes. But there
doesn't seem to be any justification for requiring this this if the
remount is a no-op. So I think changing the remount code path as I
suggested is a valid option.

2) We could revert Dmitry's change from last June. This would make
ext4 work the same way as ext3 and xfs. Which I think is also
valid, since the syncfs(2) system call says nothing about
guaranteeing data being preserved after a crash, unlike fsync(2).

3) We could say that a workload that calls thousands of no-op remounts
to be stupid/busted/silly, and not do anything at all.


#1 requires core VFS changes, and Dave seems unhappy with it.

#2 requires rolling back an ext4 change, and I wonder if Dmitry had a
situation where he really needed syncfs(2) to have data integrity
guarantees.

#3 is the default if we can't come to consensus over what else to do.

Cheers,

- Ted

2014-03-13 06:04:44

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH, RFC] fs: only call sync_filesystem() when remounting read-only

On Wed, Mar 12, 2014 at 11:14:40PM -0400, Theodore Ts'o wrote:
> On Wed, Mar 12, 2014 at 09:16:29PM -0400, Theodore Ts'o wrote:
> > > IMO, I think that you should be looking to fix ext4 syncfs issues,
> > > not changing the VFS behaviour that might cause subtle and unnoticed
> > > problems for other filesystems. We should not be moving data
> > > inegrity operations without first auditing of all the filesystem
> > > remount operations for issues.
> >
> > The issue is that it's forcing a CACHE FLUSH if we don't need to force
> > a journal commit, since it's possible that data writes could have been
> > sent to the disk without modifying fs metadata that would require a
> > commit. So arguably what we're doing with ext4 is _correct_, where as
> > with ext3 we would simply not calling blkdev_issue_barrier() in that
> > situation.
>
> Doing some more digging, ext4 is currently interpreting syncfs() as
> requiring a data integrity sync. So we go through some extra work to
> guarantee that we call blkdev_issue_barrier(), even if a journal
> commit is not required.
>
> This change was made by Dmitry last June:
>
> commit 06a407f13daf9e48f0ef7189c7e54082b53940c7
> Author: Dmitry Monakhov <[email protected]>
> Date: Wed Jun 12 22:25:07 2013 -0400
>
> ext4: fix data integrity for ext4_sync_fs
>
> Inode's data or non journaled quota may be written w/o jounral so we
> _must_ send a barrier at the end of ext4_sync_fs. But it can be
> skipped if journal commit will do it for us.
>
> Also fix data integrity for nojournal mode.
>
> Signed-off-by: Dmitry Monakhov <[email protected]>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
>
> Both ext3 and xfs do *not* do this.

XFS most definitely considered sync_filesystem(sb) to be a data
integrity operation. xfs_fs_sync_fs() calls this:

xfs_log_force(mp, XFS_LOG_SYNC);

Which will issue a blocking journal commit which will uses
REQ_FLUSH|REQ_FUA for the journal writes. Hence if there was
anything dirty in the filesystem that sync_filesystem wrote to disk,
it will issue a cache flush just like ext4 does.

> Looking more closely at the syncfs(2) manpage, it's not clear it
> requires this:
>
> sync() causes all buffered modifications to file metadata and
> data to be written to the underlying filesystems.
>
> syncfs() is like sync(), but synchronizes just the filesystem
> containing file referred to by the open file descriptor fd.
>
> Unlike the fsync(2) system call, it does *not* state that the data
> flushed to the disk is guaranteed to be there after a crash, which I
> suppose justifies ext3 and xfs's current behavior.

sync() data integrity is provided by sync_inodes_sb() followed by
->sync_fs().

syncfs() data integrity is provided by sync_inodes_sb() followed by
->sync_fs().

They are functionally identical and so provide the same guarantees.
That is the intent of the syncfs syscall, and there are lots of
people out there using it for data integrity purposes.

> 1) Nowhere in the remount system call is it stated that it has
> ***any*** data integrity implifications. If you are making the rw->ro
> transition, sure, you'll need to flush out any pending changes. But there
> doesn't seem to be any justification for requiring this this if the
> remount is a no-op. So I think changing the remount code path as I
> suggested is a valid option.

What the man page says doesn't change the fact we need to audit all
the existing filesystems before such a change is made.

> 2) We could revert Dmitry's change from last June. This would make
> ext4 work the same way as ext3 and xfs. Which I think is also
> valid, since the syncfs(2) system call says nothing about
> guaranteeing data being preserved after a crash, unlike fsync(2).

It wouldmake ext4 the same as ext3. XFS definitely guarantees
data integrity through syncfs()....

> 3) We could say that a workload that calls thousands of no-op remounts
> to be stupid/busted/silly, and not do anything at all.

Sure, but the reporter indicated that if he replaced the remount
with sync then the problem still existed. IOWs, you haven't fixed
root cause of the problem, merely papered over a symptom.

> #1 requires core VFS changes, and Dave seems unhappy with it.

I'm unhappy with your approach to the change (i.e. no validation of
the assertions made about other filesystems), not about the actual
change.

So:

4) fix ext4 not to issue unnecessary cache flushes, or find and fix
whatever is actually causing sync to be slow (maybe some of these
issues: https://lwn.net/Articles/561569/).

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-03-13 07:19:30

by Dave Chinner

[permalink] [raw]
Subject: Re: Extremely slow remounts with concurrent I/O

On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> TL;DR: we experience long temporary hangs when doing multiple mount -o
> remount at the same time as other I/O on an ext4 filesystem.
>
> Hi,
>
> When starting hundreds of LXC containers simultaneously on a system, the
> boot of some containers was hanging. We tracked this down to an
> initscript's use of mount -o remount, which was hanging in D state.
>
> We reproduced the problem outside of LXC, with the script available at
> [0]. That script initiates 1000 mount -o remount, and performs some
> writes using a big cp to the same filesystem during the remounts.
....
> Some other things we tried:
> 1) we tried to 'sync' after removing the files, and dropping the caches
> (as shown in the commented lines in [0]). That makes the problem disappear
> (or at least makes it less frequent). The overall script execution is
> actually faster with the post-rm sync and dropping caches than without
> them!
>
> 2) We tried switching to the noop scheduler (instead of cfq). The problem
> could still be reproduced. A btrace dump with noop is available at [2].
>
> 3) We tried with ext3 instead of ext4. The problem could never be
> reproduced.
>
> 4) We tried on different machines, and we could reproduce the problem.
> However, on a machine with SSD drives, we were not able to reproduce the
> problem.
>
> Any ideas?

If this really is caused by sync on ext4 being slow while there are
concurrent writers, then perhaps:

http://marc.info/?l=linux-ext4&m=139388721931428&w=2

is a possible fix...

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-03-13 07:39:40

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH, RFC] fs: only call sync_filesystem() when remounting read-only

On Thu, Mar 13, 2014 at 11:36:35AM +1100, Dave Chinner wrote:
> IMO, I think that you should be looking to fix ext4 syncfs issues,
> not changing the VFS behaviour that might cause subtle and unnoticed
> problems for other filesystems. We should not be moving data
> inegrity operations without first auditing of all the filesystem
> remount operations for issues.

Requiring a sync on every remount that doesn't go read-only seems odd
to me, so removing it doesn't sound bad. However I agree that a proper
audit of filesystems should be done, e.g.:

patch 1:
move calls into the filesystems, explaining why filesystems not
implementing ->remount_fs should be safe
patch 2:
remove call from ext4, safe because of $FOO
patch b:
remove call from $fs, safe because of $BAR


2014-03-13 14:07:44

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH, RFC] fs: only call sync_filesystem() when remounting read-only

On Thu, Mar 13, 2014 at 05:04:44PM +1100, Dave Chinner wrote:
> XFS most definitely considered sync_filesystem(sb) to be a data
> integrity operation. xfs_fs_sync_fs() calls this:
>
> xfs_log_force(mp, XFS_LOG_SYNC);
>
> Which will issue a blocking journal commit which will uses
> REQ_FLUSH|REQ_FUA for the journal writes. Hence if there was
> anything dirty in the filesystem that sync_filesystem wrote to disk,
> it will issue a cache flush just like ext4 does.

Sorry, I didn't quite do my experimets right, apparetly.

So I did the following test:

dd if=/etc/motd of=/mnt/test conv=notrunc ; mount -o remount /mnt ; mount -o remount /mnt ; mount -o remount /mnt

This is what XFS does:

252,2 0 1 0.000000000 15453 Q W 88 + 8 [kworker/u16:1]
252,2 0 2 0.000230348 15453 C W 88 + 8 [0]
252,2 7 1 0.000334892 15544 Q FWFSM 5243112 + 8 [mount]
252,2 0 3 0.251131828 0 C WFSM 5243112 + 8 [0]
252,2 0 4 0.251495890 15087 Q WM 96 + 16 [xfsaild/dm-2]
252,2 0 5 0.251729470 0 C WM 96 + 16 [0]
252,2 4 1 0.263317936 11070 Q FWFSM 5243120 + 8 [kworker/4:2]
252,2 0 6 0.273394400 0 C WFSM 5243120 + 8 [0]
252,2 0 7 0.273678692 15087 Q WM 0 + 8 [xfsaild/dm-2]
252,2 0 8 0.273902550 0 C WM 0 + 8 [0]
252,2 0 9 0.295673237 0 C WFSM 5243128 + 8 [0]
252,2 0 10 0.296035803 15087 Q WM 0 + 8 [xfsaild/dm-2]
252,2 0 11 0.296266732 0 C WM 0 + 8 [0]
252,2 7 2 0.286271844 24012 Q FWFSM 5243128 + 8 [kworker/7:0]

... and this is what ext4 does:

252,4 7 8 10.973326622 15512 Q WM 78 + 2 [mount]
252,4 7 9 10.973576941 15512 Q FWS [mount]
252,4 2 1 10.973141488 15271 Q W 108 + 2 [kworker/u16:3]
252,4 0 24 10.973390538 0 C W 108 + 2 [0]
252,4 0 25 10.973548736 0 C WM 78 + 2 [0]
252,4 7 10 11.244052462 15513 Q FWS [mount]
252,4 0 26 11.231292967 0 C FWS 0 [0]
252,4 0 27 11.231452643 15512 Q WS 2 + 2 [mount]
252,4 0 28 11.231686794 0 C WS 2 + 2 [0]
252,4 7 11 11.266337812 15514 Q FWS [mount]
252,4 0 29 11.253022650 0 C FWS 0 [0]
252,4 0 30 11.253135113 15513 Q WS 2 + 2 [mount]
252,4 0 31 11.253376707 0 C WS 2 + 2 [0]
252,4 0 32 11.266640135 0 C FWS 0 [0]
252,4 0 33 11.266727461 15514 Q WS 2 + 2 [mount]
252,4 0 34 11.266954710 0 C WS 2 + 2 [0]

> > 1) Nowhere in the remount system call is it stated that it has
> > ***any*** data integrity implications. If you are making the rw->ro
> > transition, sure, you'll need to flush out any pending changes. But there
> > doesn't seem to be any justification for requiring this this if the
> > remount is a no-op. So I think changing the remount code path as I
> > suggested is a valid option.
>
> What the man page says doesn't change the fact we need to audit all
> the existing filesystems before such a change is made.

Fair enough, I'll do what Christoph suggested and move the
sync_filesystems() call into all of the existing file systems.

- Ted

2014-03-13 14:20:56

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH] fs: push sync_filesystem() down to the file system's remount_fs()

Previously, the no-op "mount -o mount /dev/xxx" operation when the
file system is already mounted read-write causes an implied,
unconditional syncfs(). This seems pretty stupid, and it's certainly
documented or guaraunteed to do this, nor is it particularly useful,
except in the case where the file system was mounted rw and is getting
remounted read-only.

However, it's possible that there might be some file systems that are
actually depending on this behavior. In most file systems, it's
probably fine to only call sync_filesystem() when transitioning from
read-write to read-only, and there are some file systems where this is
not needed at all (for example, for a pseudo-filesystem or something
like romfs).

Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: [email protected]
Cc: Christoph Hellwig <[email protected]>
Cc: Artem Bityutskiy <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Evgeniy Dushistov <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: OGAWA Hirofumi <[email protected]>
Cc: Anders Larsen <[email protected]>
Cc: Phillip Lougher <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mikulas Patocka <[email protected]>
Cc: Petr Vandrovec <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
fs/adfs/super.c | 1 +
fs/affs/super.c | 1 +
fs/befs/linuxvfs.c | 1 +
fs/btrfs/super.c | 1 +
fs/cifs/cifsfs.c | 1 +
fs/coda/inode.c | 1 +
fs/cramfs/inode.c | 1 +
fs/debugfs/inode.c | 1 +
fs/devpts/inode.c | 1 +
fs/efs/super.c | 1 +
fs/ext2/super.c | 1 +
fs/ext3/super.c | 2 ++
fs/ext4/super.c | 2 ++
fs/f2fs/super.c | 2 ++
fs/fat/inode.c | 2 ++
fs/freevxfs/vxfs_super.c | 1 +
fs/fuse/inode.c | 1 +
fs/gfs2/super.c | 2 ++
fs/hfs/super.c | 1 +
fs/hfsplus/super.c | 1 +
fs/hpfs/super.c | 2 ++
fs/isofs/inode.c | 1 +
fs/jffs2/super.c | 1 +
fs/jfs/super.c | 1 +
fs/minix/inode.c | 1 +
fs/ncpfs/inode.c | 1 +
fs/nfs/super.c | 2 ++
fs/nilfs2/super.c | 1 +
fs/ntfs/super.c | 2 ++
fs/ocfs2/super.c | 2 ++
fs/openpromfs/inode.c | 1 +
fs/proc/root.c | 2 ++
fs/pstore/inode.c | 1 +
fs/qnx4/inode.c | 1 +
fs/qnx6/inode.c | 1 +
fs/reiserfs/super.c | 1 +
fs/romfs/super.c | 1 +
fs/squashfs/super.c | 1 +
fs/super.c | 2 --
fs/sysv/inode.c | 1 +
fs/ubifs/super.c | 1 +
fs/udf/super.c | 1 +
fs/ufs/super.c | 1 +
fs/xfs/xfs_super.c | 1 +
44 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/fs/adfs/super.c b/fs/adfs/super.c
index 7b3003c..952aeb0 100644
--- a/fs/adfs/super.c
+++ b/fs/adfs/super.c
@@ -212,6 +212,7 @@ static int parse_options(struct super_block *sb, char *options)

static int adfs_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
*flags |= MS_NODIRATIME;
return parse_options(sb, data);
}
diff --git a/fs/affs/super.c b/fs/affs/super.c
index d098731..3074530 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -530,6 +530,7 @@ affs_remount(struct super_block *sb, int *flags, char *data)

pr_debug("AFFS: remount(flags=0x%x,opts=\"%s\")\n",*flags,data);

+ sync_filesystem(sb);
*flags |= MS_NODIRATIME;

memcpy(volume, sbi->s_volume, 32);
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 845d2d6..56d70c8 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -913,6 +913,7 @@ befs_fill_super(struct super_block *sb, void *data, int silent)
static int
befs_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
if (!(*flags & MS_RDONLY))
return -EINVAL;
return 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 97cc241..00cd0c5 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1381,6 +1381,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
unsigned int old_metadata_ratio = fs_info->metadata_ratio;
int ret;

+ sync_filesystem(sb);
btrfs_remount_prepare(fs_info);

ret = btrfs_parse_options(root, data);
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 849f613..4942c94 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -541,6 +541,7 @@ static int cifs_show_stats(struct seq_file *s, struct dentry *root)

static int cifs_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
*flags |= MS_NODIRATIME;
return 0;
}
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index 506de34..3f48000 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -96,6 +96,7 @@ void coda_destroy_inodecache(void)

static int coda_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
*flags |= MS_NOATIME;
return 0;
}
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 06610cf..a275911 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -244,6 +244,7 @@ static void cramfs_kill_sb(struct super_block *sb)

static int cramfs_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
*flags |= MS_RDONLY;
return 0;
}
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 9c0444c..02928a9 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -218,6 +218,7 @@ static int debugfs_remount(struct super_block *sb, int *flags, char *data)
int err;
struct debugfs_fs_info *fsi = sb->s_fs_info;

+ sync_filesystem(sb);
err = debugfs_parse_options(data, &fsi->mount_opts);
if (err)
goto fail;
diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index a726b9f..c710380 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -313,6 +313,7 @@ static int devpts_remount(struct super_block *sb, int *flags, char *data)
struct pts_fs_info *fsi = DEVPTS_SB(sb);
struct pts_mount_opts *opts = &fsi->mount_opts;

+ sync_filesystem(sb);
err = parse_mount_options(data, PARSE_REMOUNT, opts);

/*
diff --git a/fs/efs/super.c b/fs/efs/super.c
index 50215bb..103bbd8 100644
--- a/fs/efs/super.c
+++ b/fs/efs/super.c
@@ -114,6 +114,7 @@ static void destroy_inodecache(void)

static int efs_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
*flags |= MS_RDONLY;
return 0;
}
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 20d6697..d260115 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1254,6 +1254,7 @@ static int ext2_remount (struct super_block * sb, int * flags, char * data)
unsigned long old_sb_flags;
int err;

+ sync_filesystem(sb);
spin_lock(&sbi->s_lock);

/* Store the old options */
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 37fd31e..95c6c5a 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -2649,6 +2649,8 @@ static int ext3_remount (struct super_block * sb, int * flags, char * data)
int i;
#endif

+ sync_filesystem(sb);
+
/* Store the original options */
old_sb_flags = sb->s_flags;
old_opts.s_mount_opt = sbi->s_mount_opt;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index f5c13b8..aa3842f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4767,6 +4767,8 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
#endif
char *orig_data = kstrdup(data, GFP_KERNEL);

+ sync_filesystem(sb);
+
/* Store the original options */
old_sb_flags = sb->s_flags;
old_opts.s_mount_opt = sbi->s_mount_opt;
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 1a85f83..856bdf9 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -568,6 +568,8 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
struct f2fs_mount_info org_mount_opt;
int err, active_logs;

+ sync_filesystem(sb);
+
/*
* Save the old mount options in case we
* need to restore them.
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 854b578..343e477 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -635,6 +635,8 @@ static int fat_remount(struct super_block *sb, int *flags, char *data)
struct msdos_sb_info *sbi = MSDOS_SB(sb);
*flags |= MS_NODIRATIME | (sbi->options.isvfat ? 0 : MS_NOATIME);

+ sync_filesystem(sb);
+
/* make sure we update state on remount. */
new_rdonly = *flags & MS_RDONLY;
if (new_rdonly != (sb->s_flags & MS_RDONLY)) {
diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
index e37eb27..7ca8c75 100644
--- a/fs/freevxfs/vxfs_super.c
+++ b/fs/freevxfs/vxfs_super.c
@@ -124,6 +124,7 @@ vxfs_statfs(struct dentry *dentry, struct kstatfs *bufp)

static int vxfs_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
*flags |= MS_RDONLY;
return 0;
}
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index d468643..ecdb255d 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -135,6 +135,7 @@ static void fuse_evict_inode(struct inode *inode)

static int fuse_remount_fs(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
if (*flags & MS_MANDLOCK)
return -EINVAL;

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 60f60f6..4c6dd50 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1175,6 +1175,8 @@ static int gfs2_remount_fs(struct super_block *sb, int *flags, char *data)
struct gfs2_tune *gt = &sdp->sd_tune;
int error;

+ sync_filesystem(sb);
+
spin_lock(&gt->gt_spin);
args.ar_commit = gt->gt_logd_secs;
args.ar_quota_quantum = gt->gt_quota_quantum;
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index 2d2039e..eee7206 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -112,6 +112,7 @@ static int hfs_statfs(struct dentry *dentry, struct kstatfs *buf)

static int hfs_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
*flags |= MS_NODIRATIME;
if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
return 0;
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index 80875aa..8eb787b 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -323,6 +323,7 @@ static int hfsplus_statfs(struct dentry *dentry, struct kstatfs *buf)

static int hfsplus_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
return 0;
if (!(*flags & MS_RDONLY)) {
diff --git a/fs/hpfs/super.c b/fs/hpfs/super.c
index 4534ff6..fe3463a 100644
--- a/fs/hpfs/super.c
+++ b/fs/hpfs/super.c
@@ -421,6 +421,8 @@ static int hpfs_remount_fs(struct super_block *s, int *flags, char *data)
struct hpfs_sb_info *sbi = hpfs_sb(s);
char *new_opts = kstrdup(data, GFP_KERNEL);

+ sync_filesystem(s);
+
*flags |= MS_NOATIME;

hpfs_lock(s);
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index 4a9e10e..6af66ee 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -117,6 +117,7 @@ static void destroy_inodecache(void)

static int isofs_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
if (!(*flags & MS_RDONLY))
return -EROFS;
return 0;
diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
index 0defb1c..0918f0e 100644
--- a/fs/jffs2/super.c
+++ b/fs/jffs2/super.c
@@ -243,6 +243,7 @@ static int jffs2_remount_fs(struct super_block *sb, int *flags, char *data)
struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);
int err;

+ sync_filesystem(sb);
err = jffs2_parse_options(c, data);
if (err)
return -EINVAL;
diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index e2b7483..97f7fda 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -418,6 +418,7 @@ static int jfs_remount(struct super_block *sb, int *flags, char *data)
int flag = JFS_SBI(sb)->flag;
int ret;

+ sync_filesystem(sb);
if (!parse_options(data, sb, &newLVSize, &flag)) {
return -EINVAL;
}
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index 0332109..dcdc298 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -123,6 +123,7 @@ static int minix_remount (struct super_block * sb, int * flags, char * data)
struct minix_sb_info * sbi = minix_sb(sb);
struct minix_super_block * ms;

+ sync_filesystem(sb);
ms = sbi->s_ms;
if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
return 0;
diff --git a/fs/ncpfs/inode.c b/fs/ncpfs/inode.c
index 2cf2ebe..5f86e80 100644
--- a/fs/ncpfs/inode.c
+++ b/fs/ncpfs/inode.c
@@ -99,6 +99,7 @@ static void destroy_inodecache(void)

static int ncp_remount(struct super_block *sb, int *flags, char* data)
{
+ sync_filesystem(sb);
*flags |= MS_NODIRATIME;
return 0;
}
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 910ed90..2cb5694 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -2215,6 +2215,8 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data)
struct nfs4_mount_data *options4 = (struct nfs4_mount_data *)raw_data;
u32 nfsvers = nfss->nfs_client->rpc_ops->version;

+ sync_filesystem(sb);
+
/*
* Userspace mount programs that send binary options generally send
* them populated with default values. We have no way to know which
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 7ac2a12..8c532b2 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1129,6 +1129,7 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
unsigned long old_mount_opt;
int err;

+ sync_filesystem(sb);
old_sb_flags = sb->s_flags;
old_mount_opt = nilfs->ns_mount_opt;

diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c
index 82650d5..bd5610d 100644
--- a/fs/ntfs/super.c
+++ b/fs/ntfs/super.c
@@ -468,6 +468,8 @@ static int ntfs_remount(struct super_block *sb, int *flags, char *opt)

ntfs_debug("Entering with remount options string: %s", opt);

+ sync_filesystem(sb);
+
#ifndef NTFS_RW
/* For read-only compiled driver, enforce read-only flag. */
*flags |= MS_RDONLY;
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 49d84f8..5f9bf8f 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -631,6 +631,8 @@ static int ocfs2_remount(struct super_block *sb, int *flags, char *data)
struct ocfs2_super *osb = OCFS2_SB(sb);
u32 tmp;

+ sync_filesystem(sb);
+
if (!ocfs2_parse_options(sb, data, &parsed_options, 1) ||
!ocfs2_check_set_options(sb, &parsed_options)) {
ret = -EINVAL;
diff --git a/fs/openpromfs/inode.c b/fs/openpromfs/inode.c
index 8c0ceb8..15e4500 100644
--- a/fs/openpromfs/inode.c
+++ b/fs/openpromfs/inode.c
@@ -368,6 +368,7 @@ static struct inode *openprom_iget(struct super_block *sb, ino_t ino)

static int openprom_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
*flags |= MS_NOATIME;
return 0;
}
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 87dbcbe..ac823a8 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -92,6 +92,8 @@ static int proc_parse_options(char *options, struct pid_namespace *pid)
int proc_remount(struct super_block *sb, int *flags, char *data)
{
struct pid_namespace *pid = sb->s_fs_info;
+
+ sync_filesystem(sb);
return !proc_parse_options(data, pid);
}

diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index 1282384..192297b 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -249,6 +249,7 @@ static void parse_options(char *options)

static int pstore_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
parse_options(data);

return 0;
diff --git a/fs/qnx4/inode.c b/fs/qnx4/inode.c
index 8955881..c4bcb77 100644
--- a/fs/qnx4/inode.c
+++ b/fs/qnx4/inode.c
@@ -44,6 +44,7 @@ static int qnx4_remount(struct super_block *sb, int *flags, char *data)
{
struct qnx4_sb_info *qs;

+ sync_filesystem(sb);
qs = qnx4_sb(sb);
qs->Version = QNX4_VERSION;
*flags |= MS_RDONLY;
diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c
index 8d941ed..65cdaab 100644
--- a/fs/qnx6/inode.c
+++ b/fs/qnx6/inode.c
@@ -55,6 +55,7 @@ static int qnx6_show_options(struct seq_file *seq, struct dentry *root)

static int qnx6_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
*flags |= MS_RDONLY;
return 0;
}
diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 2c80335..abf2b76 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -1319,6 +1319,7 @@ static int reiserfs_remount(struct super_block *s, int *mount_flags, char *arg)
int i;
#endif

+ sync_filesystem(s);
reiserfs_write_lock(s);

#ifdef CONFIG_QUOTA
diff --git a/fs/romfs/super.c b/fs/romfs/super.c
index d841878..ef90e8b 100644
--- a/fs/romfs/super.c
+++ b/fs/romfs/super.c
@@ -432,6 +432,7 @@ static int romfs_statfs(struct dentry *dentry, struct kstatfs *buf)
*/
static int romfs_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
*flags |= MS_RDONLY;
return 0;
}
diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c
index 202df63..031c8d67 100644
--- a/fs/squashfs/super.c
+++ b/fs/squashfs/super.c
@@ -371,6 +371,7 @@ static int squashfs_statfs(struct dentry *dentry, struct kstatfs *buf)

static int squashfs_remount(struct super_block *sb, int *flags, char *data)
{
+ sync_filesystem(sb);
*flags |= MS_RDONLY;
return 0;
}
diff --git a/fs/super.c b/fs/super.c
index 80d5cf2..e9dc3c3 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -719,8 +719,6 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
}
}

- sync_filesystem(sb);
-
if (sb->s_op->remount_fs) {
retval = sb->s_op->remount_fs(sb, &flags, data);
if (retval) {
diff --git a/fs/sysv/inode.c b/fs/sysv/inode.c
index c327d4e..4742e58 100644
--- a/fs/sysv/inode.c
+++ b/fs/sysv/inode.c
@@ -60,6 +60,7 @@ static int sysv_remount(struct super_block *sb, int *flags, char *data)
{
struct sysv_sb_info *sbi = SYSV_SB(sb);

+ sync_filesystem(sb);
if (sbi->s_forced_ro)
*flags |= MS_RDONLY;
return 0;
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 5ded849..e1598ab 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -1827,6 +1827,7 @@ static int ubifs_remount_fs(struct super_block *sb, int *flags, char *data)
int err;
struct ubifs_info *c = sb->s_fs_info;

+ sync_filesystem(sb);
dbg_gen("old flags %#lx, new flags %#x", sb->s_flags, *flags);

err = ubifs_parse_options(c, data, 1);
diff --git a/fs/udf/super.c b/fs/udf/super.c
index 3306b9f..64f2b73 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -646,6 +646,7 @@ static int udf_remount_fs(struct super_block *sb, int *flags, char *options)
int error = 0;
struct logicalVolIntegrityDescImpUse *lvidiu = udf_sb_lvidiu(sb);

+ sync_filesystem(sb);
if (lvidiu) {
int write_rev = le16_to_cpu(lvidiu->minUDFWriteRev);
if (write_rev > UDF_MAX_WRITE_VERSION && !(*flags & MS_RDONLY))
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 329f2f5..b8c6791 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -1280,6 +1280,7 @@ static int ufs_remount (struct super_block *sb, int *mount_flags, char *data)
unsigned new_mount_opt, ufstype;
unsigned flags;

+ sync_filesystem(sb);
lock_ufs(sb);
mutex_lock(&UFS_SB(sb)->s_lock);
uspi = UFS_SB(sb)->s_uspi;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f317488..aaa3eca 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1197,6 +1197,7 @@ xfs_fs_remount(
char *p;
int error;

+ sync_filesystem(sb);
while ((p = strsep(&options, ",")) != NULL) {
int token;

--
1.9.0


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

2014-03-13 16:23:19

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] fs: push sync_filesystem() down to the file system's remount_fs()

On Thu 13-03-14 10:20:56, Ted Tso wrote:
> Previously, the no-op "mount -o mount /dev/xxx" operation when the
^^remount

> file system is already mounted read-write causes an implied,
> unconditional syncfs(). This seems pretty stupid, and it's certainly
> documented or guaraunteed to do this, nor is it particularly useful,
> except in the case where the file system was mounted rw and is getting
> remounted read-only.
>
> However, it's possible that there might be some file systems that are
> actually depending on this behavior. In most file systems, it's
> probably fine to only call sync_filesystem() when transitioning from
> read-write to read-only, and there are some file systems where this is
> not needed at all (for example, for a pseudo-filesystem or something
> like romfs).
Hum, I'd avoid this excercise at least for filesystem where
sync_filesystem() is obviously useless - proc, debugfs, pstore, devpts,
also always read-only filesystems such as isofs, qnx4, qnx6, befs, cramfs,
efs, freevxfs, romfs, squashfs. I think you can find a couple more which
clearly don't care about sync_filesystem() if you look a bit closer.

Honza
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> Cc: [email protected]
> Cc: Christoph Hellwig <[email protected]>
> Cc: Artem Bityutskiy <[email protected]>
> Cc: Adrian Hunter <[email protected]>
> Cc: Evgeniy Dushistov <dushistov-JGs/[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: OGAWA Hirofumi <[email protected]>
> Cc: Anders Larsen <al-V9/YLgxv/[email protected]>
> Cc: Phillip Lougher <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Mikulas Patocka <mikulas-TTVWCEgN8Z9G4ohzP4jBZS1Fcj925eT/@public.gmane.org>
> Cc: Petr Vandrovec <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: samba-technical-w/[email protected]
> Cc: codalist-/[email protected]
> Cc: [email protected]
> Cc: linux-f2fs-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> fs/adfs/super.c | 1 +
> fs/affs/super.c | 1 +
> fs/befs/linuxvfs.c | 1 +
> fs/btrfs/super.c | 1 +
> fs/cifs/cifsfs.c | 1 +
> fs/coda/inode.c | 1 +
> fs/cramfs/inode.c | 1 +
> fs/debugfs/inode.c | 1 +
> fs/devpts/inode.c | 1 +
> fs/efs/super.c | 1 +
> fs/ext2/super.c | 1 +
> fs/ext3/super.c | 2 ++
> fs/ext4/super.c | 2 ++
> fs/f2fs/super.c | 2 ++
> fs/fat/inode.c | 2 ++
> fs/freevxfs/vxfs_super.c | 1 +
> fs/fuse/inode.c | 1 +
> fs/gfs2/super.c | 2 ++
> fs/hfs/super.c | 1 +
> fs/hfsplus/super.c | 1 +
> fs/hpfs/super.c | 2 ++
> fs/isofs/inode.c | 1 +
> fs/jffs2/super.c | 1 +
> fs/jfs/super.c | 1 +
> fs/minix/inode.c | 1 +
> fs/ncpfs/inode.c | 1 +
> fs/nfs/super.c | 2 ++
> fs/nilfs2/super.c | 1 +
> fs/ntfs/super.c | 2 ++
> fs/ocfs2/super.c | 2 ++
> fs/openpromfs/inode.c | 1 +
> fs/proc/root.c | 2 ++
> fs/pstore/inode.c | 1 +
> fs/qnx4/inode.c | 1 +
> fs/qnx6/inode.c | 1 +
> fs/reiserfs/super.c | 1 +
> fs/romfs/super.c | 1 +
> fs/squashfs/super.c | 1 +
> fs/super.c | 2 --
> fs/sysv/inode.c | 1 +
> fs/ubifs/super.c | 1 +
> fs/udf/super.c | 1 +
> fs/ufs/super.c | 1 +
> fs/xfs/xfs_super.c | 1 +
> 44 files changed, 53 insertions(+), 2 deletions(-)
>
> diff --git a/fs/adfs/super.c b/fs/adfs/super.c
> index 7b3003c..952aeb0 100644
> --- a/fs/adfs/super.c
> +++ b/fs/adfs/super.c
> @@ -212,6 +212,7 @@ static int parse_options(struct super_block *sb, char *options)
>
> static int adfs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_NODIRATIME;
> return parse_options(sb, data);
> }
> diff --git a/fs/affs/super.c b/fs/affs/super.c
> index d098731..3074530 100644
> --- a/fs/affs/super.c
> +++ b/fs/affs/super.c
> @@ -530,6 +530,7 @@ affs_remount(struct super_block *sb, int *flags, char *data)
>
> pr_debug("AFFS: remount(flags=0x%x,opts=\"%s\")\n",*flags,data);
>
> + sync_filesystem(sb);
> *flags |= MS_NODIRATIME;
>
> memcpy(volume, sbi->s_volume, 32);
> diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
> index 845d2d6..56d70c8 100644
> --- a/fs/befs/linuxvfs.c
> +++ b/fs/befs/linuxvfs.c
> @@ -913,6 +913,7 @@ befs_fill_super(struct super_block *sb, void *data, int silent)
> static int
> befs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> if (!(*flags & MS_RDONLY))
> return -EINVAL;
> return 0;
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 97cc241..00cd0c5 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1381,6 +1381,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
> unsigned int old_metadata_ratio = fs_info->metadata_ratio;
> int ret;
>
> + sync_filesystem(sb);
> btrfs_remount_prepare(fs_info);
>
> ret = btrfs_parse_options(root, data);
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 849f613..4942c94 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -541,6 +541,7 @@ static int cifs_show_stats(struct seq_file *s, struct dentry *root)
>
> static int cifs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_NODIRATIME;
> return 0;
> }
> diff --git a/fs/coda/inode.c b/fs/coda/inode.c
> index 506de34..3f48000 100644
> --- a/fs/coda/inode.c
> +++ b/fs/coda/inode.c
> @@ -96,6 +96,7 @@ void coda_destroy_inodecache(void)
>
> static int coda_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_NOATIME;
> return 0;
> }
> diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> index 06610cf..a275911 100644
> --- a/fs/cramfs/inode.c
> +++ b/fs/cramfs/inode.c
> @@ -244,6 +244,7 @@ static void cramfs_kill_sb(struct super_block *sb)
>
> static int cramfs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_RDONLY;
> return 0;
> }
> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
> index 9c0444c..02928a9 100644
> --- a/fs/debugfs/inode.c
> +++ b/fs/debugfs/inode.c
> @@ -218,6 +218,7 @@ static int debugfs_remount(struct super_block *sb, int *flags, char *data)
> int err;
> struct debugfs_fs_info *fsi = sb->s_fs_info;
>
> + sync_filesystem(sb);
> err = debugfs_parse_options(data, &fsi->mount_opts);
> if (err)
> goto fail;
> diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
> index a726b9f..c710380 100644
> --- a/fs/devpts/inode.c
> +++ b/fs/devpts/inode.c
> @@ -313,6 +313,7 @@ static int devpts_remount(struct super_block *sb, int *flags, char *data)
> struct pts_fs_info *fsi = DEVPTS_SB(sb);
> struct pts_mount_opts *opts = &fsi->mount_opts;
>
> + sync_filesystem(sb);
> err = parse_mount_options(data, PARSE_REMOUNT, opts);
>
> /*
> diff --git a/fs/efs/super.c b/fs/efs/super.c
> index 50215bb..103bbd8 100644
> --- a/fs/efs/super.c
> +++ b/fs/efs/super.c
> @@ -114,6 +114,7 @@ static void destroy_inodecache(void)
>
> static int efs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_RDONLY;
> return 0;
> }
> diff --git a/fs/ext2/super.c b/fs/ext2/super.c
> index 20d6697..d260115 100644
> --- a/fs/ext2/super.c
> +++ b/fs/ext2/super.c
> @@ -1254,6 +1254,7 @@ static int ext2_remount (struct super_block * sb, int * flags, char * data)
> unsigned long old_sb_flags;
> int err;
>
> + sync_filesystem(sb);
> spin_lock(&sbi->s_lock);
>
> /* Store the old options */
> diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> index 37fd31e..95c6c5a 100644
> --- a/fs/ext3/super.c
> +++ b/fs/ext3/super.c
> @@ -2649,6 +2649,8 @@ static int ext3_remount (struct super_block * sb, int * flags, char * data)
> int i;
> #endif
>
> + sync_filesystem(sb);
> +
> /* Store the original options */
> old_sb_flags = sb->s_flags;
> old_opts.s_mount_opt = sbi->s_mount_opt;
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index f5c13b8..aa3842f 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4767,6 +4767,8 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
> #endif
> char *orig_data = kstrdup(data, GFP_KERNEL);
>
> + sync_filesystem(sb);
> +
> /* Store the original options */
> old_sb_flags = sb->s_flags;
> old_opts.s_mount_opt = sbi->s_mount_opt;
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 1a85f83..856bdf9 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -568,6 +568,8 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
> struct f2fs_mount_info org_mount_opt;
> int err, active_logs;
>
> + sync_filesystem(sb);
> +
> /*
> * Save the old mount options in case we
> * need to restore them.
> diff --git a/fs/fat/inode.c b/fs/fat/inode.c
> index 854b578..343e477 100644
> --- a/fs/fat/inode.c
> +++ b/fs/fat/inode.c
> @@ -635,6 +635,8 @@ static int fat_remount(struct super_block *sb, int *flags, char *data)
> struct msdos_sb_info *sbi = MSDOS_SB(sb);
> *flags |= MS_NODIRATIME | (sbi->options.isvfat ? 0 : MS_NOATIME);
>
> + sync_filesystem(sb);
> +
> /* make sure we update state on remount. */
> new_rdonly = *flags & MS_RDONLY;
> if (new_rdonly != (sb->s_flags & MS_RDONLY)) {
> diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
> index e37eb27..7ca8c75 100644
> --- a/fs/freevxfs/vxfs_super.c
> +++ b/fs/freevxfs/vxfs_super.c
> @@ -124,6 +124,7 @@ vxfs_statfs(struct dentry *dentry, struct kstatfs *bufp)
>
> static int vxfs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_RDONLY;
> return 0;
> }
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index d468643..ecdb255d 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -135,6 +135,7 @@ static void fuse_evict_inode(struct inode *inode)
>
> static int fuse_remount_fs(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> if (*flags & MS_MANDLOCK)
> return -EINVAL;
>
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 60f60f6..4c6dd50 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1175,6 +1175,8 @@ static int gfs2_remount_fs(struct super_block *sb, int *flags, char *data)
> struct gfs2_tune *gt = &sdp->sd_tune;
> int error;
>
> + sync_filesystem(sb);
> +
> spin_lock(&gt->gt_spin);
> args.ar_commit = gt->gt_logd_secs;
> args.ar_quota_quantum = gt->gt_quota_quantum;
> diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> index 2d2039e..eee7206 100644
> --- a/fs/hfs/super.c
> +++ b/fs/hfs/super.c
> @@ -112,6 +112,7 @@ static int hfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>
> static int hfs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_NODIRATIME;
> if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
> return 0;
> diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
> index 80875aa..8eb787b 100644
> --- a/fs/hfsplus/super.c
> +++ b/fs/hfsplus/super.c
> @@ -323,6 +323,7 @@ static int hfsplus_statfs(struct dentry *dentry, struct kstatfs *buf)
>
> static int hfsplus_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
> return 0;
> if (!(*flags & MS_RDONLY)) {
> diff --git a/fs/hpfs/super.c b/fs/hpfs/super.c
> index 4534ff6..fe3463a 100644
> --- a/fs/hpfs/super.c
> +++ b/fs/hpfs/super.c
> @@ -421,6 +421,8 @@ static int hpfs_remount_fs(struct super_block *s, int *flags, char *data)
> struct hpfs_sb_info *sbi = hpfs_sb(s);
> char *new_opts = kstrdup(data, GFP_KERNEL);
>
> + sync_filesystem(s);
> +
> *flags |= MS_NOATIME;
>
> hpfs_lock(s);
> diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
> index 4a9e10e..6af66ee 100644
> --- a/fs/isofs/inode.c
> +++ b/fs/isofs/inode.c
> @@ -117,6 +117,7 @@ static void destroy_inodecache(void)
>
> static int isofs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> if (!(*flags & MS_RDONLY))
> return -EROFS;
> return 0;
> diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
> index 0defb1c..0918f0e 100644
> --- a/fs/jffs2/super.c
> +++ b/fs/jffs2/super.c
> @@ -243,6 +243,7 @@ static int jffs2_remount_fs(struct super_block *sb, int *flags, char *data)
> struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);
> int err;
>
> + sync_filesystem(sb);
> err = jffs2_parse_options(c, data);
> if (err)
> return -EINVAL;
> diff --git a/fs/jfs/super.c b/fs/jfs/super.c
> index e2b7483..97f7fda 100644
> --- a/fs/jfs/super.c
> +++ b/fs/jfs/super.c
> @@ -418,6 +418,7 @@ static int jfs_remount(struct super_block *sb, int *flags, char *data)
> int flag = JFS_SBI(sb)->flag;
> int ret;
>
> + sync_filesystem(sb);
> if (!parse_options(data, sb, &newLVSize, &flag)) {
> return -EINVAL;
> }
> diff --git a/fs/minix/inode.c b/fs/minix/inode.c
> index 0332109..dcdc298 100644
> --- a/fs/minix/inode.c
> +++ b/fs/minix/inode.c
> @@ -123,6 +123,7 @@ static int minix_remount (struct super_block * sb, int * flags, char * data)
> struct minix_sb_info * sbi = minix_sb(sb);
> struct minix_super_block * ms;
>
> + sync_filesystem(sb);
> ms = sbi->s_ms;
> if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
> return 0;
> diff --git a/fs/ncpfs/inode.c b/fs/ncpfs/inode.c
> index 2cf2ebe..5f86e80 100644
> --- a/fs/ncpfs/inode.c
> +++ b/fs/ncpfs/inode.c
> @@ -99,6 +99,7 @@ static void destroy_inodecache(void)
>
> static int ncp_remount(struct super_block *sb, int *flags, char* data)
> {
> + sync_filesystem(sb);
> *flags |= MS_NODIRATIME;
> return 0;
> }
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 910ed90..2cb5694 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -2215,6 +2215,8 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data)
> struct nfs4_mount_data *options4 = (struct nfs4_mount_data *)raw_data;
> u32 nfsvers = nfss->nfs_client->rpc_ops->version;
>
> + sync_filesystem(sb);
> +
> /*
> * Userspace mount programs that send binary options generally send
> * them populated with default values. We have no way to know which
> diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
> index 7ac2a12..8c532b2 100644
> --- a/fs/nilfs2/super.c
> +++ b/fs/nilfs2/super.c
> @@ -1129,6 +1129,7 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
> unsigned long old_mount_opt;
> int err;
>
> + sync_filesystem(sb);
> old_sb_flags = sb->s_flags;
> old_mount_opt = nilfs->ns_mount_opt;
>
> diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c
> index 82650d5..bd5610d 100644
> --- a/fs/ntfs/super.c
> +++ b/fs/ntfs/super.c
> @@ -468,6 +468,8 @@ static int ntfs_remount(struct super_block *sb, int *flags, char *opt)
>
> ntfs_debug("Entering with remount options string: %s", opt);
>
> + sync_filesystem(sb);
> +
> #ifndef NTFS_RW
> /* For read-only compiled driver, enforce read-only flag. */
> *flags |= MS_RDONLY;
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 49d84f8..5f9bf8f 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -631,6 +631,8 @@ static int ocfs2_remount(struct super_block *sb, int *flags, char *data)
> struct ocfs2_super *osb = OCFS2_SB(sb);
> u32 tmp;
>
> + sync_filesystem(sb);
> +
> if (!ocfs2_parse_options(sb, data, &parsed_options, 1) ||
> !ocfs2_check_set_options(sb, &parsed_options)) {
> ret = -EINVAL;
> diff --git a/fs/openpromfs/inode.c b/fs/openpromfs/inode.c
> index 8c0ceb8..15e4500 100644
> --- a/fs/openpromfs/inode.c
> +++ b/fs/openpromfs/inode.c
> @@ -368,6 +368,7 @@ static struct inode *openprom_iget(struct super_block *sb, ino_t ino)
>
> static int openprom_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_NOATIME;
> return 0;
> }
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 87dbcbe..ac823a8 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -92,6 +92,8 @@ static int proc_parse_options(char *options, struct pid_namespace *pid)
> int proc_remount(struct super_block *sb, int *flags, char *data)
> {
> struct pid_namespace *pid = sb->s_fs_info;
> +
> + sync_filesystem(sb);
> return !proc_parse_options(data, pid);
> }
>
> diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
> index 1282384..192297b 100644
> --- a/fs/pstore/inode.c
> +++ b/fs/pstore/inode.c
> @@ -249,6 +249,7 @@ static void parse_options(char *options)
>
> static int pstore_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> parse_options(data);
>
> return 0;
> diff --git a/fs/qnx4/inode.c b/fs/qnx4/inode.c
> index 8955881..c4bcb77 100644
> --- a/fs/qnx4/inode.c
> +++ b/fs/qnx4/inode.c
> @@ -44,6 +44,7 @@ static int qnx4_remount(struct super_block *sb, int *flags, char *data)
> {
> struct qnx4_sb_info *qs;
>
> + sync_filesystem(sb);
> qs = qnx4_sb(sb);
> qs->Version = QNX4_VERSION;
> *flags |= MS_RDONLY;
> diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c
> index 8d941ed..65cdaab 100644
> --- a/fs/qnx6/inode.c
> +++ b/fs/qnx6/inode.c
> @@ -55,6 +55,7 @@ static int qnx6_show_options(struct seq_file *seq, struct dentry *root)
>
> static int qnx6_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_RDONLY;
> return 0;
> }
> diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
> index 2c80335..abf2b76 100644
> --- a/fs/reiserfs/super.c
> +++ b/fs/reiserfs/super.c
> @@ -1319,6 +1319,7 @@ static int reiserfs_remount(struct super_block *s, int *mount_flags, char *arg)
> int i;
> #endif
>
> + sync_filesystem(s);
> reiserfs_write_lock(s);
>
> #ifdef CONFIG_QUOTA
> diff --git a/fs/romfs/super.c b/fs/romfs/super.c
> index d841878..ef90e8b 100644
> --- a/fs/romfs/super.c
> +++ b/fs/romfs/super.c
> @@ -432,6 +432,7 @@ static int romfs_statfs(struct dentry *dentry, struct kstatfs *buf)
> */
> static int romfs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_RDONLY;
> return 0;
> }
> diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c
> index 202df63..031c8d67 100644
> --- a/fs/squashfs/super.c
> +++ b/fs/squashfs/super.c
> @@ -371,6 +371,7 @@ static int squashfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>
> static int squashfs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_RDONLY;
> return 0;
> }
> diff --git a/fs/super.c b/fs/super.c
> index 80d5cf2..e9dc3c3 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -719,8 +719,6 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
> }
> }
>
> - sync_filesystem(sb);
> -
> if (sb->s_op->remount_fs) {
> retval = sb->s_op->remount_fs(sb, &flags, data);
> if (retval) {
> diff --git a/fs/sysv/inode.c b/fs/sysv/inode.c
> index c327d4e..4742e58 100644
> --- a/fs/sysv/inode.c
> +++ b/fs/sysv/inode.c
> @@ -60,6 +60,7 @@ static int sysv_remount(struct super_block *sb, int *flags, char *data)
> {
> struct sysv_sb_info *sbi = SYSV_SB(sb);
>
> + sync_filesystem(sb);
> if (sbi->s_forced_ro)
> *flags |= MS_RDONLY;
> return 0;
> diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
> index 5ded849..e1598ab 100644
> --- a/fs/ubifs/super.c
> +++ b/fs/ubifs/super.c
> @@ -1827,6 +1827,7 @@ static int ubifs_remount_fs(struct super_block *sb, int *flags, char *data)
> int err;
> struct ubifs_info *c = sb->s_fs_info;
>
> + sync_filesystem(sb);
> dbg_gen("old flags %#lx, new flags %#x", sb->s_flags, *flags);
>
> err = ubifs_parse_options(c, data, 1);
> diff --git a/fs/udf/super.c b/fs/udf/super.c
> index 3306b9f..64f2b73 100644
> --- a/fs/udf/super.c
> +++ b/fs/udf/super.c
> @@ -646,6 +646,7 @@ static int udf_remount_fs(struct super_block *sb, int *flags, char *options)
> int error = 0;
> struct logicalVolIntegrityDescImpUse *lvidiu = udf_sb_lvidiu(sb);
>
> + sync_filesystem(sb);
> if (lvidiu) {
> int write_rev = le16_to_cpu(lvidiu->minUDFWriteRev);
> if (write_rev > UDF_MAX_WRITE_VERSION && !(*flags & MS_RDONLY))
> diff --git a/fs/ufs/super.c b/fs/ufs/super.c
> index 329f2f5..b8c6791 100644
> --- a/fs/ufs/super.c
> +++ b/fs/ufs/super.c
> @@ -1280,6 +1280,7 @@ static int ufs_remount (struct super_block *sb, int *mount_flags, char *data)
> unsigned new_mount_opt, ufstype;
> unsigned flags;
>
> + sync_filesystem(sb);
> lock_ufs(sb);
> mutex_lock(&UFS_SB(sb)->s_lock);
> uspi = UFS_SB(sb)->s_uspi;
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index f317488..aaa3eca 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1197,6 +1197,7 @@ xfs_fs_remount(
> char *p;
> int error;
>
> + sync_filesystem(sb);
> while ((p = strsep(&options, ",")) != NULL) {
> int token;
>
> --
> 1.9.0
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2014-03-13 16:28:23

by Steven Whitehouse

[permalink] [raw]
Subject: Re: [PATCH] fs: push sync_filesystem() down to the file system's remount_fs()

Hi,

On Thu, 2014-03-13 at 17:23 +0100, Jan Kara wrote:
> On Thu 13-03-14 10:20:56, Ted Tso wrote:
> > Previously, the no-op "mount -o mount /dev/xxx" operation when the
> ^^remount
>
> > file system is already mounted read-write causes an implied,
> > unconditional syncfs(). This seems pretty stupid, and it's certainly
> > documented or guaraunteed to do this, nor is it particularly useful,
> > except in the case where the file system was mounted rw and is getting
> > remounted read-only.
> >
> > However, it's possible that there might be some file systems that are
> > actually depending on this behavior. In most file systems, it's
> > probably fine to only call sync_filesystem() when transitioning from
> > read-write to read-only, and there are some file systems where this is
> > not needed at all (for example, for a pseudo-filesystem or something
> > like romfs).
> Hum, I'd avoid this excercise at least for filesystem where
> sync_filesystem() is obviously useless - proc, debugfs, pstore, devpts,
> also always read-only filesystems such as isofs, qnx4, qnx6, befs, cramfs,
> efs, freevxfs, romfs, squashfs. I think you can find a couple more which
> clearly don't care about sync_filesystem() if you look a bit closer.
>
>
> Honza

I guess the same is true for other file systems which are mounted ro
too. So maybe a check for MS_RDONLY before doing the sync in those
cases?

Steve.

2014-03-13 23:15:06

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [Cluster-devel] [PATCH] fs: push sync_filesystem() down to the file system's remount_fs()

On Thu, Mar 13, 2014 at 04:28:23PM +0000, Steven Whitehouse wrote:
>
> I guess the same is true for other file systems which are mounted ro
> too. So maybe a check for MS_RDONLY before doing the sync in those
> cases?

My original patch moved the sync_filesystem into the check for
MS_RDONLY in the core VFS code. The objection was raised that there
might be some file system out there that might depend on this
behaviour. I can't imagine why, but I suppose it's at least
theoretically possible.

So the idea is that this particular patch is *guaranteed* not to make
any difference. That way there can be no question about the patch'es
correctness.

I'm going to follow up with a patch for ext4 that does exactly that,
but the idea is to allow each file system maintainer to do that for
their own file system.

I could do that as well for file systems that are "obviously"
read-only, but then I'll find out that there's some wierd case where
the file system can be used in a read-write fashion. (Example: UDF is
normally used for DVD's, but at least in theory it can be used
read/write --- I'm told that Windows supports read-write UDF file
systems on USB sticks, and at least in theory it could be used as a
inter-OS exchange format in situations where VFAT and exFAT might not
be appropriate for various reasons.)

Cheers,

- Ted

------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech

2014-03-14 00:33:02

by Steve French

[permalink] [raw]
Subject: Re: [PATCH] fs: push sync_filesystem() down to the file system's remount_fs()

On Thu, Mar 13, 2014 at 9:20 AM, Theodore Ts'o <[email protected]> wrote:
> Previously, the no-op "mount -o mount /dev/xxx" operation when the
> file system is already mounted read-write causes an implied,
> unconditional syncfs(). This seems pretty stupid, and it's certainly
> documented or guaraunteed to do this, nor is it particularly useful,
> except in the case where the file system was mounted rw and is getting
> remounted read-only.

Is there a case where a file system, not mounted read-only,
would want to skip the syncfs on remount? I don't know
of any particular reason to do a syncfs on remount unless
caching behavior is changing (or moving to read-only mount),
but if as you say it is documented and guaranteed...


> However, it's possible that there might be some file systems that are
> actually depending on this behavior. In most file systems, it's
> probably fine to only call sync_filesystem() when transitioning from
> read-write to read-only, and there are some file systems where this is
> not needed at all (for example, for a pseudo-filesystem or something
> like romfs).
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> Cc: [email protected]
> Cc: Christoph Hellwig <[email protected]>
> Cc: Artem Bityutskiy <[email protected]>
> Cc: Adrian Hunter <[email protected]>
> Cc: Evgeniy Dushistov <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: OGAWA Hirofumi <[email protected]>
> Cc: Anders Larsen <[email protected]>
> Cc: Phillip Lougher <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Mikulas Patocka <[email protected]>
> Cc: Petr Vandrovec <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> fs/adfs/super.c | 1 +
> fs/affs/super.c | 1 +
> fs/befs/linuxvfs.c | 1 +
> fs/btrfs/super.c | 1 +
> fs/cifs/cifsfs.c | 1 +
> fs/coda/inode.c | 1 +
> fs/cramfs/inode.c | 1 +
> fs/debugfs/inode.c | 1 +
> fs/devpts/inode.c | 1 +
> fs/efs/super.c | 1 +
> fs/ext2/super.c | 1 +
> fs/ext3/super.c | 2 ++
> fs/ext4/super.c | 2 ++
> fs/f2fs/super.c | 2 ++
> fs/fat/inode.c | 2 ++
> fs/freevxfs/vxfs_super.c | 1 +
> fs/fuse/inode.c | 1 +
> fs/gfs2/super.c | 2 ++
> fs/hfs/super.c | 1 +
> fs/hfsplus/super.c | 1 +
> fs/hpfs/super.c | 2 ++
> fs/isofs/inode.c | 1 +
> fs/jffs2/super.c | 1 +
> fs/jfs/super.c | 1 +
> fs/minix/inode.c | 1 +
> fs/ncpfs/inode.c | 1 +
> fs/nfs/super.c | 2 ++
> fs/nilfs2/super.c | 1 +
> fs/ntfs/super.c | 2 ++
> fs/ocfs2/super.c | 2 ++
> fs/openpromfs/inode.c | 1 +
> fs/proc/root.c | 2 ++
> fs/pstore/inode.c | 1 +
> fs/qnx4/inode.c | 1 +
> fs/qnx6/inode.c | 1 +
> fs/reiserfs/super.c | 1 +
> fs/romfs/super.c | 1 +
> fs/squashfs/super.c | 1 +
> fs/super.c | 2 --
> fs/sysv/inode.c | 1 +
> fs/ubifs/super.c | 1 +
> fs/udf/super.c | 1 +
> fs/ufs/super.c | 1 +
> fs/xfs/xfs_super.c | 1 +
> 44 files changed, 53 insertions(+), 2 deletions(-)
>
> diff --git a/fs/adfs/super.c b/fs/adfs/super.c
> index 7b3003c..952aeb0 100644
> --- a/fs/adfs/super.c
> +++ b/fs/adfs/super.c
> @@ -212,6 +212,7 @@ static int parse_options(struct super_block *sb, char *options)
>
> static int adfs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_NODIRATIME;
> return parse_options(sb, data);
> }
> diff --git a/fs/affs/super.c b/fs/affs/super.c
> index d098731..3074530 100644
> --- a/fs/affs/super.c
> +++ b/fs/affs/super.c
> @@ -530,6 +530,7 @@ affs_remount(struct super_block *sb, int *flags, char *data)
>
> pr_debug("AFFS: remount(flags=0x%x,opts=\"%s\")\n",*flags,data);
>
> + sync_filesystem(sb);
> *flags |= MS_NODIRATIME;
>
> memcpy(volume, sbi->s_volume, 32);
> diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
> index 845d2d6..56d70c8 100644
> --- a/fs/befs/linuxvfs.c
> +++ b/fs/befs/linuxvfs.c
> @@ -913,6 +913,7 @@ befs_fill_super(struct super_block *sb, void *data, int silent)
> static int
> befs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> if (!(*flags & MS_RDONLY))
> return -EINVAL;
> return 0;
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 97cc241..00cd0c5 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1381,6 +1381,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
> unsigned int old_metadata_ratio = fs_info->metadata_ratio;
> int ret;
>
> + sync_filesystem(sb);
> btrfs_remount_prepare(fs_info);
>
> ret = btrfs_parse_options(root, data);
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 849f613..4942c94 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -541,6 +541,7 @@ static int cifs_show_stats(struct seq_file *s, struct dentry *root)
>
> static int cifs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_NODIRATIME;
> return 0;
> }
> diff --git a/fs/coda/inode.c b/fs/coda/inode.c
> index 506de34..3f48000 100644
> --- a/fs/coda/inode.c
> +++ b/fs/coda/inode.c
> @@ -96,6 +96,7 @@ void coda_destroy_inodecache(void)
>
> static int coda_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_NOATIME;
> return 0;
> }
> diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> index 06610cf..a275911 100644
> --- a/fs/cramfs/inode.c
> +++ b/fs/cramfs/inode.c
> @@ -244,6 +244,7 @@ static void cramfs_kill_sb(struct super_block *sb)
>
> static int cramfs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_RDONLY;
> return 0;
> }
> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
> index 9c0444c..02928a9 100644
> --- a/fs/debugfs/inode.c
> +++ b/fs/debugfs/inode.c
> @@ -218,6 +218,7 @@ static int debugfs_remount(struct super_block *sb, int *flags, char *data)
> int err;
> struct debugfs_fs_info *fsi = sb->s_fs_info;
>
> + sync_filesystem(sb);
> err = debugfs_parse_options(data, &fsi->mount_opts);
> if (err)
> goto fail;
> diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
> index a726b9f..c710380 100644
> --- a/fs/devpts/inode.c
> +++ b/fs/devpts/inode.c
> @@ -313,6 +313,7 @@ static int devpts_remount(struct super_block *sb, int *flags, char *data)
> struct pts_fs_info *fsi = DEVPTS_SB(sb);
> struct pts_mount_opts *opts = &fsi->mount_opts;
>
> + sync_filesystem(sb);
> err = parse_mount_options(data, PARSE_REMOUNT, opts);
>
> /*
> diff --git a/fs/efs/super.c b/fs/efs/super.c
> index 50215bb..103bbd8 100644
> --- a/fs/efs/super.c
> +++ b/fs/efs/super.c
> @@ -114,6 +114,7 @@ static void destroy_inodecache(void)
>
> static int efs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_RDONLY;
> return 0;
> }
> diff --git a/fs/ext2/super.c b/fs/ext2/super.c
> index 20d6697..d260115 100644
> --- a/fs/ext2/super.c
> +++ b/fs/ext2/super.c
> @@ -1254,6 +1254,7 @@ static int ext2_remount (struct super_block * sb, int * flags, char * data)
> unsigned long old_sb_flags;
> int err;
>
> + sync_filesystem(sb);
> spin_lock(&sbi->s_lock);
>
> /* Store the old options */
> diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> index 37fd31e..95c6c5a 100644
> --- a/fs/ext3/super.c
> +++ b/fs/ext3/super.c
> @@ -2649,6 +2649,8 @@ static int ext3_remount (struct super_block * sb, int * flags, char * data)
> int i;
> #endif
>
> + sync_filesystem(sb);
> +
> /* Store the original options */
> old_sb_flags = sb->s_flags;
> old_opts.s_mount_opt = sbi->s_mount_opt;
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index f5c13b8..aa3842f 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4767,6 +4767,8 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
> #endif
> char *orig_data = kstrdup(data, GFP_KERNEL);
>
> + sync_filesystem(sb);
> +
> /* Store the original options */
> old_sb_flags = sb->s_flags;
> old_opts.s_mount_opt = sbi->s_mount_opt;
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 1a85f83..856bdf9 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -568,6 +568,8 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
> struct f2fs_mount_info org_mount_opt;
> int err, active_logs;
>
> + sync_filesystem(sb);
> +
> /*
> * Save the old mount options in case we
> * need to restore them.
> diff --git a/fs/fat/inode.c b/fs/fat/inode.c
> index 854b578..343e477 100644
> --- a/fs/fat/inode.c
> +++ b/fs/fat/inode.c
> @@ -635,6 +635,8 @@ static int fat_remount(struct super_block *sb, int *flags, char *data)
> struct msdos_sb_info *sbi = MSDOS_SB(sb);
> *flags |= MS_NODIRATIME | (sbi->options.isvfat ? 0 : MS_NOATIME);
>
> + sync_filesystem(sb);
> +
> /* make sure we update state on remount. */
> new_rdonly = *flags & MS_RDONLY;
> if (new_rdonly != (sb->s_flags & MS_RDONLY)) {
> diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
> index e37eb27..7ca8c75 100644
> --- a/fs/freevxfs/vxfs_super.c
> +++ b/fs/freevxfs/vxfs_super.c
> @@ -124,6 +124,7 @@ vxfs_statfs(struct dentry *dentry, struct kstatfs *bufp)
>
> static int vxfs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_RDONLY;
> return 0;
> }
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index d468643..ecdb255d 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -135,6 +135,7 @@ static void fuse_evict_inode(struct inode *inode)
>
> static int fuse_remount_fs(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> if (*flags & MS_MANDLOCK)
> return -EINVAL;
>
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 60f60f6..4c6dd50 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1175,6 +1175,8 @@ static int gfs2_remount_fs(struct super_block *sb, int *flags, char *data)
> struct gfs2_tune *gt = &sdp->sd_tune;
> int error;
>
> + sync_filesystem(sb);
> +
> spin_lock(&gt->gt_spin);
> args.ar_commit = gt->gt_logd_secs;
> args.ar_quota_quantum = gt->gt_quota_quantum;
> diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> index 2d2039e..eee7206 100644
> --- a/fs/hfs/super.c
> +++ b/fs/hfs/super.c
> @@ -112,6 +112,7 @@ static int hfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>
> static int hfs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_NODIRATIME;
> if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
> return 0;
> diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
> index 80875aa..8eb787b 100644
> --- a/fs/hfsplus/super.c
> +++ b/fs/hfsplus/super.c
> @@ -323,6 +323,7 @@ static int hfsplus_statfs(struct dentry *dentry, struct kstatfs *buf)
>
> static int hfsplus_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
> return 0;
> if (!(*flags & MS_RDONLY)) {
> diff --git a/fs/hpfs/super.c b/fs/hpfs/super.c
> index 4534ff6..fe3463a 100644
> --- a/fs/hpfs/super.c
> +++ b/fs/hpfs/super.c
> @@ -421,6 +421,8 @@ static int hpfs_remount_fs(struct super_block *s, int *flags, char *data)
> struct hpfs_sb_info *sbi = hpfs_sb(s);
> char *new_opts = kstrdup(data, GFP_KERNEL);
>
> + sync_filesystem(s);
> +
> *flags |= MS_NOATIME;
>
> hpfs_lock(s);
> diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
> index 4a9e10e..6af66ee 100644
> --- a/fs/isofs/inode.c
> +++ b/fs/isofs/inode.c
> @@ -117,6 +117,7 @@ static void destroy_inodecache(void)
>
> static int isofs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> if (!(*flags & MS_RDONLY))
> return -EROFS;
> return 0;
> diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
> index 0defb1c..0918f0e 100644
> --- a/fs/jffs2/super.c
> +++ b/fs/jffs2/super.c
> @@ -243,6 +243,7 @@ static int jffs2_remount_fs(struct super_block *sb, int *flags, char *data)
> struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);
> int err;
>
> + sync_filesystem(sb);
> err = jffs2_parse_options(c, data);
> if (err)
> return -EINVAL;
> diff --git a/fs/jfs/super.c b/fs/jfs/super.c
> index e2b7483..97f7fda 100644
> --- a/fs/jfs/super.c
> +++ b/fs/jfs/super.c
> @@ -418,6 +418,7 @@ static int jfs_remount(struct super_block *sb, int *flags, char *data)
> int flag = JFS_SBI(sb)->flag;
> int ret;
>
> + sync_filesystem(sb);
> if (!parse_options(data, sb, &newLVSize, &flag)) {
> return -EINVAL;
> }
> diff --git a/fs/minix/inode.c b/fs/minix/inode.c
> index 0332109..dcdc298 100644
> --- a/fs/minix/inode.c
> +++ b/fs/minix/inode.c
> @@ -123,6 +123,7 @@ static int minix_remount (struct super_block * sb, int * flags, char * data)
> struct minix_sb_info * sbi = minix_sb(sb);
> struct minix_super_block * ms;
>
> + sync_filesystem(sb);
> ms = sbi->s_ms;
> if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
> return 0;
> diff --git a/fs/ncpfs/inode.c b/fs/ncpfs/inode.c
> index 2cf2ebe..5f86e80 100644
> --- a/fs/ncpfs/inode.c
> +++ b/fs/ncpfs/inode.c
> @@ -99,6 +99,7 @@ static void destroy_inodecache(void)
>
> static int ncp_remount(struct super_block *sb, int *flags, char* data)
> {
> + sync_filesystem(sb);
> *flags |= MS_NODIRATIME;
> return 0;
> }
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 910ed90..2cb5694 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -2215,6 +2215,8 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data)
> struct nfs4_mount_data *options4 = (struct nfs4_mount_data *)raw_data;
> u32 nfsvers = nfss->nfs_client->rpc_ops->version;
>
> + sync_filesystem(sb);
> +
> /*
> * Userspace mount programs that send binary options generally send
> * them populated with default values. We have no way to know which
> diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
> index 7ac2a12..8c532b2 100644
> --- a/fs/nilfs2/super.c
> +++ b/fs/nilfs2/super.c
> @@ -1129,6 +1129,7 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
> unsigned long old_mount_opt;
> int err;
>
> + sync_filesystem(sb);
> old_sb_flags = sb->s_flags;
> old_mount_opt = nilfs->ns_mount_opt;
>
> diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c
> index 82650d5..bd5610d 100644
> --- a/fs/ntfs/super.c
> +++ b/fs/ntfs/super.c
> @@ -468,6 +468,8 @@ static int ntfs_remount(struct super_block *sb, int *flags, char *opt)
>
> ntfs_debug("Entering with remount options string: %s", opt);
>
> + sync_filesystem(sb);
> +
> #ifndef NTFS_RW
> /* For read-only compiled driver, enforce read-only flag. */
> *flags |= MS_RDONLY;
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 49d84f8..5f9bf8f 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -631,6 +631,8 @@ static int ocfs2_remount(struct super_block *sb, int *flags, char *data)
> struct ocfs2_super *osb = OCFS2_SB(sb);
> u32 tmp;
>
> + sync_filesystem(sb);
> +
> if (!ocfs2_parse_options(sb, data, &parsed_options, 1) ||
> !ocfs2_check_set_options(sb, &parsed_options)) {
> ret = -EINVAL;
> diff --git a/fs/openpromfs/inode.c b/fs/openpromfs/inode.c
> index 8c0ceb8..15e4500 100644
> --- a/fs/openpromfs/inode.c
> +++ b/fs/openpromfs/inode.c
> @@ -368,6 +368,7 @@ static struct inode *openprom_iget(struct super_block *sb, ino_t ino)
>
> static int openprom_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_NOATIME;
> return 0;
> }
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 87dbcbe..ac823a8 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -92,6 +92,8 @@ static int proc_parse_options(char *options, struct pid_namespace *pid)
> int proc_remount(struct super_block *sb, int *flags, char *data)
> {
> struct pid_namespace *pid = sb->s_fs_info;
> +
> + sync_filesystem(sb);
> return !proc_parse_options(data, pid);
> }
>
> diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
> index 1282384..192297b 100644
> --- a/fs/pstore/inode.c
> +++ b/fs/pstore/inode.c
> @@ -249,6 +249,7 @@ static void parse_options(char *options)
>
> static int pstore_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> parse_options(data);
>
> return 0;
> diff --git a/fs/qnx4/inode.c b/fs/qnx4/inode.c
> index 8955881..c4bcb77 100644
> --- a/fs/qnx4/inode.c
> +++ b/fs/qnx4/inode.c
> @@ -44,6 +44,7 @@ static int qnx4_remount(struct super_block *sb, int *flags, char *data)
> {
> struct qnx4_sb_info *qs;
>
> + sync_filesystem(sb);
> qs = qnx4_sb(sb);
> qs->Version = QNX4_VERSION;
> *flags |= MS_RDONLY;
> diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c
> index 8d941ed..65cdaab 100644
> --- a/fs/qnx6/inode.c
> +++ b/fs/qnx6/inode.c
> @@ -55,6 +55,7 @@ static int qnx6_show_options(struct seq_file *seq, struct dentry *root)
>
> static int qnx6_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_RDONLY;
> return 0;
> }
> diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
> index 2c80335..abf2b76 100644
> --- a/fs/reiserfs/super.c
> +++ b/fs/reiserfs/super.c
> @@ -1319,6 +1319,7 @@ static int reiserfs_remount(struct super_block *s, int *mount_flags, char *arg)
> int i;
> #endif
>
> + sync_filesystem(s);
> reiserfs_write_lock(s);
>
> #ifdef CONFIG_QUOTA
> diff --git a/fs/romfs/super.c b/fs/romfs/super.c
> index d841878..ef90e8b 100644
> --- a/fs/romfs/super.c
> +++ b/fs/romfs/super.c
> @@ -432,6 +432,7 @@ static int romfs_statfs(struct dentry *dentry, struct kstatfs *buf)
> */
> static int romfs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_RDONLY;
> return 0;
> }
> diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c
> index 202df63..031c8d67 100644
> --- a/fs/squashfs/super.c
> +++ b/fs/squashfs/super.c
> @@ -371,6 +371,7 @@ static int squashfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>
> static int squashfs_remount(struct super_block *sb, int *flags, char *data)
> {
> + sync_filesystem(sb);
> *flags |= MS_RDONLY;
> return 0;
> }
> diff --git a/fs/super.c b/fs/super.c
> index 80d5cf2..e9dc3c3 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -719,8 +719,6 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
> }
> }
>
> - sync_filesystem(sb);
> -
> if (sb->s_op->remount_fs) {
> retval = sb->s_op->remount_fs(sb, &flags, data);
> if (retval) {
> diff --git a/fs/sysv/inode.c b/fs/sysv/inode.c
> index c327d4e..4742e58 100644
> --- a/fs/sysv/inode.c
> +++ b/fs/sysv/inode.c
> @@ -60,6 +60,7 @@ static int sysv_remount(struct super_block *sb, int *flags, char *data)
> {
> struct sysv_sb_info *sbi = SYSV_SB(sb);
>
> + sync_filesystem(sb);
> if (sbi->s_forced_ro)
> *flags |= MS_RDONLY;
> return 0;
> diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
> index 5ded849..e1598ab 100644
> --- a/fs/ubifs/super.c
> +++ b/fs/ubifs/super.c
> @@ -1827,6 +1827,7 @@ static int ubifs_remount_fs(struct super_block *sb, int *flags, char *data)
> int err;
> struct ubifs_info *c = sb->s_fs_info;
>
> + sync_filesystem(sb);
> dbg_gen("old flags %#lx, new flags %#x", sb->s_flags, *flags);
>
> err = ubifs_parse_options(c, data, 1);
> diff --git a/fs/udf/super.c b/fs/udf/super.c
> index 3306b9f..64f2b73 100644
> --- a/fs/udf/super.c
> +++ b/fs/udf/super.c
> @@ -646,6 +646,7 @@ static int udf_remount_fs(struct super_block *sb, int *flags, char *options)
> int error = 0;
> struct logicalVolIntegrityDescImpUse *lvidiu = udf_sb_lvidiu(sb);
>
> + sync_filesystem(sb);
> if (lvidiu) {
> int write_rev = le16_to_cpu(lvidiu->minUDFWriteRev);
> if (write_rev > UDF_MAX_WRITE_VERSION && !(*flags & MS_RDONLY))
> diff --git a/fs/ufs/super.c b/fs/ufs/super.c
> index 329f2f5..b8c6791 100644
> --- a/fs/ufs/super.c
> +++ b/fs/ufs/super.c
> @@ -1280,6 +1280,7 @@ static int ufs_remount (struct super_block *sb, int *mount_flags, char *data)
> unsigned new_mount_opt, ufstype;
> unsigned flags;
>
> + sync_filesystem(sb);
> lock_ufs(sb);
> mutex_lock(&UFS_SB(sb)->s_lock);
> uspi = UFS_SB(sb)->s_uspi;
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index f317488..aaa3eca 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1197,6 +1197,7 @@ xfs_fs_remount(
> char *p;
> int error;
>
> + sync_filesystem(sb);
> while ((p = strsep(&options, ",")) != NULL) {
> int token;
>
> --
> 1.9.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html



--
Thanks,

Steve

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

2014-03-14 01:23:00

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] fs: push sync_filesystem() down to the file system's remount_fs()

On Thu, Mar 13, 2014 at 07:33:02PM -0500, Steve French wrote:
> On Thu, Mar 13, 2014 at 9:20 AM, Theodore Ts'o <[email protected]> wrote:
> > Previously, the no-op "mount -o mount /dev/xxx" operation when the
> > file system is already mounted read-write causes an implied,
> > unconditional syncfs(). This seems pretty stupid, and it's certainly
> > documented or guaraunteed to do this, nor is it particularly useful,
> > except in the case where the file system was mounted rw and is getting
> > remounted read-only.
>
> Is there a case where a file system, not mounted read-only,
> would want to skip the syncfs on remount? I don't know
> of any particular reason to do a syncfs on remount unless
> caching behavior is changing (or moving to read-only mount),
> but if as you say it is documented and guaranteed...

If the file system is mounted read-write, and it is transitioning to
read-only, i.e. "mount -o remount,ro /" then you do want to write out
all dirty data before you transition it to be read-only (otherwise you
would lose data).

It is my belief that this is the _only_ data integrity issue which is
implied by remount (and this is more about not losing work done by
previous system calls).

The background reason for this commit is that a user complained on the
ext4 list that he is using containers in a virtualization environment,
and due to the init scripts which the user doesn't want to change, it
is causing gazillions of no-op remounts, and this is causing ext4 (and
xfs) to do send CACHE FLUSH commands because it is required by the
syncfs(2) system call, which also calls sync_filesystem. These CACHE
FLUSH commands are unneeded for anything, especially for these no-op
remounts, and so I want them to go away for remounts, but they should
still be there in response to syncfs(2) requests.

Cheers,

- Ted

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/