Hi Ted, Andreas,
Do you think this mount option "fsync_mode=nobarrier"
can be added for EXT4 as well like in F2FS? Please
share your thoughts on this.
https://lore.kernel.org/patchwork/patch/908934/
Thanks,
Sahitya.
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
On Tue, May 28, 2019 at 08:52:57AM +0530, Sahitya Tummala wrote:
> Hi Ted, Andreas,
>
> Do you think this mount option "fsync_mode=nobarrier"
> can be added for EXT4 as well like in F2FS? Please
> share your thoughts on this.
>
> https://lore.kernel.org/patchwork/patch/908934/
Ext4 already has the nobarrier mount option.
Cheers,
- Ted
Hi Ted,
On Mon, May 27, 2019 at 11:40:07PM -0400, Theodore Ts'o wrote:
> On Tue, May 28, 2019 at 08:52:57AM +0530, Sahitya Tummala wrote:
> > Hi Ted, Andreas,
> >
> > Do you think this mount option "fsync_mode=nobarrier"
> > can be added for EXT4 as well like in F2FS? Please
> > share your thoughts on this.
> >
> > https://lore.kernel.org/patchwork/patch/908934/
>
> Ext4 already has the nobarrier mount option.
>
Yes, but fsync_mode=nobarrier is a little different from
the general nobarrier option. fsync_mode=nobarrier only
controls the flush policy for the fsync() path, unlike the
nobarrier mount option, which applies everywhere in the
filesystem.
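To make the distinction concrete, here is a rough sketch (the sketch_*
helpers are made up for illustration; this is not the actual f2fs or
ext4 code):

	static int sketch_fsync(struct file *file, loff_t start, loff_t end,
				int datasync)
	{
		int ret;

		/* Write back and wait for the file's dirty data pages. */
		ret = file_write_and_wait_range(file, start, end);
		if (ret)
			return ret;

		/* Commit this inode's metadata through the journal with the
		 * usual ordering; fsync_mode=nobarrier does not change this. */
		ret = sketch_commit_inode_metadata(file);
		if (ret)
			return ret;

		/* fsync_mode=nobarrier would skip only this final flush of the
		 * device's volatile write cache, whereas the existing nobarrier
		 * option drops flushes/FUA everywhere, journal commits included. */
		if (!sketch_fsync_nobarrier(file))
			ret = sketch_flush_device_cache(file);

		return ret;
	}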
Thanks,
> Cheers,
>
> - Ted
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
On Tue, May 28, 2019 at 09:18:30AM +0530, Sahitya Tummala wrote:
>
> Yes, but fsync_mode=nobarrier is a little different from
> the general nobarrier option. fsync_mode=nobarrier only
> controls the flush policy for the fsync() path, unlike the
> nobarrier mount option, which applies everywhere in the
> filesystem.
What are you really trying to accomplish with fsync_mode=nobarrier?
And when does that distinction have a difference?
What sort of guarantees are you trying to offer, given a particular
hardware and software design?
I gather that fsync_mode=nobarrier means one of two things:
* "screw you, application writer; your data consistency means nothing to me",
OR
* "we have sufficient guarantees --- e.g., UPS/battery protection to
guarantee that even if we lose AC mains, or the user presses and holds
the power button for eight seconds, we will give storage devices a
sufficient grace period to write everything to persistent storage. We
also have the appropriate hardware to warn of an impending low-battery
shutdown and software to perform a graceful shutdown in that eventuality."
If it's the latter, then nobarrier works just as well --- even better.
If it's the former, *why* is it considered a good thing to ignore the
requests of userspace? And without any hardware assurances to provide
a backstop against power drop, do you care or not care about file
system consistency?
Why do you want the distinction between fsync_mode=nobarrier and
nobarrier? When would this distinction be considered a good thing?
- Ted
Hi Ted,
On Tue, May 28, 2019 at 09:13:56AM -0400, Theodore Ts'o wrote:
> On Tue, May 28, 2019 at 09:18:30AM +0530, Sahitya Tummala wrote:
> >
> > Yes, but fsync_mode=nobarrier is a little different from
> > the general nobarrier option. fsync_mode=nobarrier only
> > controls the flush policy for the fsync() path, unlike the
> > nobarrier mount option, which applies everywhere in the
> > filesystem.
>
> What are you really trying to accomplish with fsync_mode=nobarrier?
> And when does that distinction have a difference?
>
Thanks for your time and reply on this.
Here is what I think on these mount options. Please correct me if my
understanding is wrong.
The nobarrier mount option poses a risk even if there is battery
protection against sudden power down, as it doesn't guarantee the ordering
of important data, such as journal writes, on the disk. On storage
devices with an internal cache, if the cache flush policy is out-of-order,
then the places where the FS is trying to enforce barriers will be at risk,
causing the FS to become inconsistent.
Whereas with fsync_mode=nobarrier, the FS is not trying to enforce
any ordering of data on the disk beyond ensuring the data is flushed
from the internal cache to non-volatile memory. Thus, I see
fsync_mode=nobarrier as much better than a general nobarrier. It also
provides better performance, comparable to nobarrier, but without
compromising much on FS consistency.
I do agree with all your points below on sudden power down scenarios,
but if someone wants to take that risk, then I think fsync_mode=nobarrier
may be the better option to enable, based on their needs/performance
requirements.
Thanks,
> What sort of guarantees are you trying to offer, given a particular
> hardware and software design?
>
> I gather that fsync_mode=nobarrier means one of two things:
>
> * "screw you, application writer; your data consistency means nothing to me",
>
> OR
>
> * "we have sufficient guarantees --- e.g., UPS/battery protection to
> guarantee that even if we lose AC mains, or the user presses and holds
> the power button for eight seconds, we will give storage devices a
> sufficient grace period to write everything to persistent storage. We
> also have the appropriate hardware to warn of an impending low-battery
> shutdown and software to perform a graceful shutdown in that eventuality."
>
> If it's the latter, then nobarrier works just as well --- even better.
>
> If it's the former, *why* is it considered a good thing to ignore the
> requests of userspace? And without any hardware assurances to provide
> a backstop against power drop, do you care or not care about file
> system consistency?
>
> Why do you want the distinction between fsync_mode=nobarrier and
> nobarrier? When would this distinction be considered a good thing?
>
> - Ted
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
On Wed, May 29, 2019 at 09:37:58AM +0530, Sahitya Tummala wrote:
>
> Here is what I think on these mount options. Please correct me if my
> understanding is wrong.
>
> The nobarrier mount option poses a risk even if there is battery
> protection against sudden power down, as it doesn't guarantee the ordering
> of important data, such as journal writes, on the disk. On storage
> devices with an internal cache, if the cache flush policy is out-of-order,
> then the places where the FS is trying to enforce barriers will be at risk,
> causing the FS to become inconsistent.
If you have protection against sudden shutdown, then nobarrier is
perfectly safe --- which is to say, if it is guaranteed that any
writes sent to the device will be persisted after a crash, then nobarrier
is perfectly safe. So for example, if you are using ext4 connected to
a million dollar EMC Storage Array, which has battery backup, using
nobarrier is perfectly safe.
That's because we still send writes to the device in an appropriate
order in nobarrier mode --- in particular, we send the journal updates
to the device in order. The cache flush policy on the HDD is
out-of-order, but so long as they all make it out to persistent store
in the end, it'll be fine.
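Roughly, the commit path looks like this (a simplified sketch with
made-up sketch_* helpers, not the actual jbd2 code):

	static int sketch_commit_transaction(struct sketch_journal *journal)
	{
		int ret;

		/* 1. Submit the descriptor and journaled metadata blocks. */
		ret = sketch_submit_journal_blocks(journal);
		if (ret)
			return ret;

		/* 2. Wait for those submissions to complete.  This ordering is
		 *    preserved even in nobarrier mode; we just don't force the
		 *    blocks out of the device's volatile cache. */
		ret = sketch_wait_on_journal_blocks(journal);
		if (ret)
			return ret;

		/* 3. Only then submit the commit record.  With barriers it goes
		 *    out with REQ_PREFLUSH | REQ_FUA so the commit cannot become
		 *    durable before the blocks it describes; with nobarrier those
		 *    flags are dropped and we rely on the cache draining to
		 *    persistent storage eventually. */
		return sketch_submit_commit_record(journal,
				journal->use_barriers ? REQ_PREFLUSH | REQ_FUA : 0);
	}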
> Whereas with fsync_mode=nobarrier, the FS is not trying to enforce
> any ordering of data on the disk beyond ensuring the data is flushed
> from the internal cache to non-volatile memory. Thus, I see
> fsync_mode=nobarrier as much better than a general nobarrier. It also
> provides better performance, comparable to nobarrier, but without
> compromising much on FS consistency.
"without compomising much on FS consistency" doesn't have any meaning.
If you care about FS consistency, and you don't have power fail
protection, then at least for ext4, you *must* send a CACHE FLUSH
after any time that you modify any file system metadata block --- and
that's true for 99% of all fsync(2)'s.
I suppose you could do something where, if there are times when no
metadata updates are necessary but just data block writes, the CACHE
FLUSH could be suppressed. But (a) this won't actually provide much
performance improvement for the vast majority of workloads,
especially on an Android system, and (b) you're making a value
judgement that FS consistency is more important than application data
consistency.
You didn't answer my question directly --- exactly what goal are you
trying to achieve, and what assumptions are you willing
to make? If you have power fail protection (this might require making
some adjustments to the EC), then you can use nobarrier and just not
worry about it.
If you don't have power fail protection, and you care about FS
consistency, then you pretty much have to leave the CACHE FLUSH
commands in.
If the problem is that some applications are fsync-happy, then I'd
suggest fixing the applications. Or if you really don't care about
the applications working correctly or users suffering application data
loss after a crash, you could hack in a mode, so that for non-root
users, or maybe certain specific users, fsync is turned into a no-op,
or a background, asynchronous (non-integrity) writeback.
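Purely as an illustration of that kind of hack (made-up sketch_* names,
not an existing ext4 feature), it could look something like:

	static int sketch_fsync_with_downgrade(struct file *file, loff_t start,
					       loff_t end, int datasync)
	{
		/* For selected non-root users, downgrade fsync() to a
		 * non-integrity writeback: kick off the I/O but don't wait
		 * for it and don't flush the device cache. */
		if (!uid_eq(current_fsuid(), GLOBAL_ROOT_UID) &&
		    sketch_fsync_downgrade_enabled(file_inode(file)->i_sb)) {
			filemap_fdatawrite_range(file->f_mapping, start, end);
			return 0;
		}

		/* Everyone else gets the real, integrity-preserving fsync(). */
		return sketch_real_fsync(file, start, end, datasync);
	}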
Are you trying to hit some benchmark target? I'm really confused why
you would want to be so cavalier with application data safety.
- Ted
On Wed, May 29, 2019 at 01:23:32AM -0400, Theodore Ts'o wrote:
> If you have protection against sudden shutdown, then nobarrier is
> perfectly safe --- which is to say, if it is guaranteed that any
> writes sent to the device will be persisted after a crash, then nobarrier
> is perfectly safe. So for example, if you are using ext4 connected to
> a million dollar EMC Storage Array, which has battery backup, using
> nobarrier is perfectly safe.
And while we had a few oddities in the past, in general any such device
will obviously not even claim to have a volatile write cache, so
nobarrier or this broken proposed mount option won't actually make any
difference.
Hi Ted,
On Wed, May 29, 2019 at 01:23:32AM -0400, Theodore Ts'o wrote:
> On Wed, May 29, 2019 at 09:37:58AM +0530, Sahitya Tummala wrote:
> >
> > Here is what I think on these mount options. Please correct me if my
> > understanding is wrong.
> >
> > The nobarrier mount option poses a risk even if there is battery
> > protection against sudden power down, as it doesn't guarantee the ordering
> > of important data, such as journal writes, on the disk. On storage
> > devices with an internal cache, if the cache flush policy is out-of-order,
> > then the places where the FS is trying to enforce barriers will be at risk,
> > causing the FS to become inconsistent.
>
> If you have protection against sudden shutdown, then nobarrier is
> perfectly safe --- which is to say, if it is guaranteed that any
> writes sent to the device will be persisted after a crash, then nobarrier
> is perfectly safe. So for example, if you are using ext4 connected to
> a million dollar EMC Storage Array, which has battery backup, using
> nobarrier is perfectly safe.
>
> That's because we still send writes to the device in an appropriate
> order in nobarrier mode --- in particular, we send the journal updates
> to the device in order. The cache flush policy on the HDD is
> out-of-order, but so long as they all make it out to persistent store
> in the end, it'll be fine.
>
Got it.
> > Whereas with fsync_mode=nobarrier, the FS is not trying to enforce
> > any ordering of data on the disk beyond ensuring the data is flushed
> > from the internal cache to non-volatile memory. Thus, I see
> > fsync_mode=nobarrier as much better than a general nobarrier. It also
> > provides better performance, comparable to nobarrier, but without
> > compromising much on FS consistency.
>
> "without compomising much on FS consistency" doesn't have any meaning.
> If you care about FS consistency, and you don't have power fail
> protection, then at least for ext4, you *must* send a CACHE FLUSH
> after any time that you modify any file system metadata block --- and
> that's true for 99% of all fsync(2)'s.
>
> I suppose you could do something where, if there are times when no
> metadata updates are necessary but just data block writes, the CACHE
> FLUSH could be suppressed. But (a) this won't actually provide much
> performance improvement for the vast majority of workloads,
> especially on an Android system, and (b) you're making a value
> judgement that FS consistency is more important than application data
> consistency.
>
>
> You didn't answer my question directly --- exactly what goal are you
> trying to achieve, and what assumptions are you willing
> to make? If you have power fail protection (this might require making
> some adjustments to the EC), then you can use nobarrier and just not
> worry about it.
>
> If you don't have power fail protection, and you care about FS
> consistency, then you pretty much have to leave the CACHE FLUSH
> commands in.
>
> If the problem is that some applications are fsync-happy, then I'd
> suggest fixing the applications. Or if you really don't care about
> the applications working correctly or users suffering application data
> loss after a crash, you could hack in a mode, so that for non-root
> users, or maybe certain specific users, fsync is turned into a no-op,
> or a background, asynchronous (non-integrity) writeback.
>
> Are you trying to hit some benchmark target? I'm really confused why
> you would want to be so cavalier with application data safety.
>
Yes, benchmarks for random write/fsync show a huge improvement.
For example, without issuing a flush in the ext4 fsync(), the
random write score improves from 13MB/s to 62MB/s on eMMC,
using Androbench.
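For reference, the measured workload is essentially a loop like the one
below (a minimal standalone sketch, not the actual Androbench code; the
file name and sizes are arbitrary):

	#include <fcntl.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/types.h>
	#include <unistd.h>

	int main(void)
	{
		const off_t file_size = 64 << 20;	/* 64 MB test file */
		const size_t io_size = 4096;		/* 4 KB random writes */
		char buf[4096];
		int fd, i;

		memset(buf, 0xa5, sizeof(buf));
		fd = open("testfile", O_CREAT | O_RDWR, 0644);
		if (fd < 0 || ftruncate(fd, file_size) < 0)
			return 1;

		for (i = 0; i < 1000; i++) {
			off_t off = (rand() % (file_size / io_size)) * io_size;

			if (pwrite(fd, buf, io_size, off) != (ssize_t)io_size)
				return 1;
			/* Every write is followed by fsync(); with barriers on,
			 * the device CACHE FLUSH cost dominates the result. */
			if (fsync(fd) < 0)
				return 1;
		}
		return close(fd);
	}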
And fsync_mode=nobarrier is enabled by default on Pixel phones
where f2fs is used.
https://android.googlesource.com/device/google/crosshatch/+/e02e4813256e51bacdecb93ffd8340f6efbe68e0
We have been getting requests to evaluate the same for EXT4 and
hence, I was checking with the community on its feasibility.
Thanks,
Sahitya.
> - Ted
On Wed, May 29, 2019 at 04:18:09PM +0530, Sahitya Tummala wrote:
> Yes, benchmarks for random write/fsync show a huge improvement.
> For example, without issuing a flush in the ext4 fsync(), the
> random write score improves from 13MB/s to 62MB/s on eMMC,
> using Androbench.
>
> And fsync_mode=nobarrier is enabled by default on Pixel phones
> where f2fs is used.
>
> https://android.googlesource.com/device/google/crosshatch/+/e02e4813256e51bacdecb93ffd8340f6efbe68e0
>
> We have been getting requests to evaluate the same for EXT4 and
> hence, I was checking with the community on its feasibility.
Have you run some tests to see how much power fail robustness is
impacted by f2fs's fsync_mode=nobarrier? Say, run fsstress on real
hardware, then yank the power 100 times; how many times is the file
system corrupted? And of those corruptions, how many result in:
* Unrecoverable failures --- e.g., ones that require a factory reset, losing
all user data? (Possibly because f2fs's fsck crashes or refuses to fix things?)
* Failures which corrupt the data, but can be fixed by fsck? (And in
how many of those cases is there data loss?)
I'll note that for a long time in the early days of Linux, we ran with
ext2 without a journal and without CACHE FLUSH, and it was very surprising
how often the corruption could be fixed with fsck. (Back in those very
early days, we did a lot of work to make e2fsck do as good a job as
possible at not losing data, and if you run with -y, it will try to
recover automatically, even at the cost of some data loss.)
So if your goal is "some file system corruption and some complete
user data loss is OK", feel free to use nobarrier. After all, all the
user's data that we should care about is sync'ed to the cloud, right? :-)
And winning the benchmarketing game can mean millions and millions of dollars
to companies, and that's _obviously_ more important than user data.... :-/
Also, for people who are wondering how reliable/robust f2fs is in the
face of corruption / SSD failures, I call your attention to this
Usenix paper, which will be presented at the upcoming Usenix ATC
conference in July:
https://www.usenix.org/conference/atc19/presentation/jaffer
It's not available yet, but in a week or two, it should be available
to people who have registered for Usenix ATC 2019, and if you care
about user data, and you are using f2fs, it's worth the price of
admission all by itself IMHO.
- Ted
P.S. I have considered adding tuning knobs to make fsync/fdatasync
tunable, perhaps on a per-uid basis, maybe on a root vs. non-root basis,
mostly to protect hostile, mutually suspicious Docker users from each
other. The problem is that it can also be used for
benchmarketing wars, which I really dislike, and I know there are
enterprise distros who hate these features because clueless sysadmins
turn them on, and then they lose data, and then they turn up at the
Distribution's Help Desk asking for help / complaining.
So if you really want a patch which does something like
fsync_mode=nobarrier, it's really not hard. To quote Shakespeare
(when Hamlet was pondering how easy it would be to commit suicide), it
can be done "with a bare bodkin". The question is whether it is a
*good* thing to do, not whether it can be done. And a lot of this
depends on your morals --- after all, companies have been known to
disable the CPU thermal safeties in order to win benchmarketing
wars....