2023-07-18 13:08:07

by Alan C. Assis

Subject: Re: File system robustness

Hi Bjørn,

On 7/18/23, Bjørn Forsman <[email protected]> wrote:
> On Tue, 18 Jul 2023 at 08:03, Kai Tomerius <[email protected]> wrote:
>> I should have mentioned that I'll have a large NAND flash, so ext4
>> might still be the file system of choice. The other ones you mentioned
>> are interesting to consider, but seem to be more fitting for a smaller
>> NOR flash.
>
> If you mean raw NAND flash I would think UBIFS is still the way to go?
> (It's been several years since I was into embedded Linux systems.)
>
> https://elinux.org/images/0/02/Filesystem_Considerations_for_Embedded_Devices.pdf
> is focused on eMMC/SD Cards, which have built-in controllers that
> enable them to present a block device interface, which is very unlike
> what raw NAND devices have.
>
> Please see https://www.kernel.org/doc/html/latest/filesystems/ubifs.html
> for more info.
>

You are right, for NAND there is an old (but gold) presentation here:

https://elinux.org/images/7/7e/ELC2009-FlashFS-Toshiba.pdf

UBIFS and YAFFS2 are the way to go.

But please note that YAFFS2 requires a paid license for commercial
use (something that I only discovered recently when Xiaomi
integrated it into NuttX mainline, a bad surprise).

BR,

Alan


2023-07-18 21:54:00

by Theodore Ts'o

Subject: Re: File system robustness

On Tue, Jul 18, 2023 at 10:04:55AM -0300, Alan C. Assis wrote:
>
> You are right, for NAND there is an old (but gold) presentation here:
>
> https://elinux.org/images/7/7e/ELC2009-FlashFS-Toshiba.pdf
>
> UBIFS and YAFFS2 are the way to go.

This presentation is specifically talking about flash devices that do
not have a flash translation layer (that is, they are using the MTD
interface).

There are multiple kinds of flash devices, which can be exported via
different interfaces: MTD, USB Storage, eMMC, UFS, SATA, SCSI, NVMe,
etc. They also differ in the sophistication of the Flash Translation
Layer (FTL): how powerful the microcontroller is, how much memory and
persistent storage for flash metadata is available to the FTL, etc.
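
As a rough illustration of the MTD case: on Linux, raw flash that is
exposed through MTD is listed in /proc/mtd, while eMMC and SD cards show
up as block devices instead. The sketch below parses text in the usual
/proc/mtd layout. The function name and the sample partition names and
sizes are made up for illustration.

```python
def parse_proc_mtd(text):
    """Parse text in the /proc/mtd layout into a list of dicts."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "dev: size erasesize name" header row.
        if not line or line.startswith("dev:"):
            continue
        dev, rest = line.split(":", 1)
        # Size and erase size are hexadecimal; the name is quoted.
        size_hex, erase_hex, name = rest.split(None, 2)
        entries.append({
            "dev": dev,
            "size": int(size_hex, 16),
            "erasesize": int(erase_hex, 16),
            "name": name.strip('"'),
        })
    return entries


# Hypothetical sample; on a real system, read /proc/mtd instead.
SAMPLE_MTD = '''dev:    size   erasesize  name
mtd0: 00100000 00020000 "bootloader"
mtd1: 1ff00000 00020000 "rootfs"
'''

for entry in parse_proc_mtd(SAMPLE_MTD):
    print(entry["dev"], entry["name"], entry["size"], entry["erasesize"])
```

An empty result (or a missing /proc/mtd) suggests the flash sits behind
an FTL and is presented as a block device, which is the situation the
rest of this message is about.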

F2FS is a good choice for "low end flash", especially those flash
devices that use a very simplistic mapping between LBA (block/sector
numbers) and the physical flash to be used, and may have a very
limited number of flash blocks that can be open for modification at a
time. For more sophisticated flash storage devices (e.g., SSDs and
higher end flash devices), this consideration won't matter, and then
the best file system to use will be very dependent on your workload.

In answer to Kai's original question, the setup that was described
should be fine --- assuming high quality hardware. There are some
flash devices that are not designed to handle power failures correctly; which
is to say, if power is cut suddenly, the data used by the Flash
Translation Layer can be corrupted, in which case data written months
or years ago (not just recent data) could be lost. There have been
horror stories about wedding photographers who dropped their camera,
and the SD Card came shooting out, and *all* of the data that was shot
on some couple's special day was completely *gone*.

Assuming that you have valid, power drop safe hardware, running fsck
after a power cut is not necessary, at least as far as file system
consistency is concerned. If you have badly written userspace
application code, then all bets can be off. For example, consider the
following sequence of events:

1) An application like Tuxracer truncates the top-ten score file
2) It then writes a new top-ten score file
3) <Fail to call fsync, or write the file to a foo.new and then
rename on top of the old version of the file>
4) It then closes the OpenGL library, triggering a bug in the cruddy
proprietary binary-only kernel module video driver,
leading to an immediate system crash.
5) Complain to the file system developers that users' top-ten score
file was lost, and what are the file system developers going to
do about it?
6) File system developers start creating T-shirts saying that what userspace
applications really are asking for is a new open(2) flag, O_PONIES[1]

[1] https://blahg.josefsipek.net/?p=364
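
For contrast, here is a minimal sketch of the safe pattern that step 3
skips: write the new contents to foo.new, fsync it, then rename it on
top of the old file. It assumes Linux or another POSIX system, where
rename replaces the target atomically and a directory can be opened and
fsynced. The function name and the ".new" suffix are illustrative only.

```python
import os


def replace_file_durably(path, data):
    """Replace the file at path with data (bytes) via write + fsync + rename."""
    tmp_path = path + ".new"
    # Write the new contents to a temporary file in the same directory,
    # so that the later rename stays within one file system.
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to push it to stable storage
    # Atomically replace the old file; readers see the old or the new
    # version, never a truncated one.
    os.replace(tmp_path, path)
    # fsync the containing directory so the rename itself is persisted.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling replace_file_durably("scores.txt", b"1. tux 9000\n") in place of
the truncate-and-rewrite in steps 1 and 2 means a crash at step 4 leaves
either the old score file or the new one on disk.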

So when you talk about overall system robustness, you need robust
hardware, you need a robust file system, you need to use the file
system correctly, and you need robust userspace applications.

If you get it all right, you'll be fine. On the other hand, if you
have crappy hardware (such as might be found for cheap in the checkout
counter of the local Micro Center, or in a back alley vendor in
Shenzhen, China), or if you do something like misconfigure the file
system such as using the "nobarrier" mount option "to speed things
up", or if you have applications that update files in an unsafe
manner, then you will have problems.
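
One quick way to check for the misconfiguration mentioned above is to
scan the mount table. The sketch below takes text in the /proc/mounts
layout and reports mounts whose options disable barriers. It assumes the
option shows up as "nobarrier" or "barrier=0"; the exact spelling may
vary by file system and kernel version. The function name and sample
mount table are made up for illustration.

```python
def mounts_without_barriers(mounts_text):
    """Return (mountpoint, fstype) pairs whose options disable barriers."""
    risky = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected layout: device mountpoint fstype options dump pass
        if len(fields) < 4:
            continue
        _device, mountpoint, fstype, options = fields[:4]
        opts = options.split(",")
        # Assumed spellings of the option; adjust for your file system.
        if "nobarrier" in opts or "barrier=0" in opts:
            risky.append((mountpoint, fstype))
    return risky


# Hypothetical sample; on a real system, read /proc/mounts instead.
SAMPLE_MOUNTS = """\
/dev/mmcblk0p2 / ext4 rw,relatime 0 0
/dev/mmcblk0p3 /data ext4 rw,relatime,nobarrier 0 0
"""

print(mounts_without_barriers(SAMPLE_MOUNTS))
```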

Welcome to systems engineering. :-)

- Ted

2023-07-19 06:44:46

by Martin Steigerwald

Subject: Re: File system robustness

Theodore Ts'o - 18.07.23, 23:32:12 CEST:
> If you get it all right, you'll be fine. On the other hand, if you
> have crappy hardware (such as might be found for cheap in the checkout
> counter of the local Micro Center, or in a back alley vendor in
> Shenzhen, China), or if you do something like misconfigure the file
> system such as using the "nobarrier" mount option "to speed things
> up", or if you have applications that update files in an unsafe
> manner, then you will have problems.

Is the "nobarrier" mount option still a thing? I thought those mount options
had been deprecated or even removed with the introduction of cache flush
handling in kernel 2.6.37?

Hmm, the mount option was removed from XFS in kernel 4.19
according to the manpage; however, there is no mention of any deprecation
or removal in the ext4 manpage. It also does not seem to have been removed
in BTRFS, at least according to the manpage btrfs(5).

--
Martin



2023-07-19 11:08:01

by Kai Tomerius

Subject: Re: File system robustness

> In answer to Kai's original question, the setup that was described
> should be fine --- assuming high quality hardware.

I wonder how to judge that ... it's an eMMC supposedly complying with
some JEDEC standard, so it *should* be ok.

> ... if power is cut suddenly, the data used by the Flash
> Translation Layer can be corrupted, in which case data written months
> or years ago (not just recent data) could be lost.

At least I haven't observed anything like that up to now.

But on another aspect: how about the interaction between dm-integrity
and ext4? Sure, they each have their own journal, and they're
independent layers. Is there anything that could go wrong, say a block
that can't be recovered in the dm-integrity layer, causing ext4 to run
into trouble, e.g., an I/O error that prevents ext4 from mounting?

I assume the answer is "No", but can I be sure?

Thx
regards
Kai