2023-07-17 08:04:09

by Kai Tomerius

[permalink] [raw]
Subject: File system robustness

Hi,

let's suppose an embedded system with a read-only squashfs root file
system, and a writable ext4 data partition with data=journal.
Furthermore, the data partition shall be protected with dm-integrity.
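
Concretely, I'm thinking of a stack along these lines (just a sketch;
the device and mapper names are placeholders):

    # dm-integrity underneath, ext4 with a full data journal on top
    integritysetup format /dev/mmcblk0p3
    integritysetup open /dev/mmcblk0p3 data-integrity
    mkfs.ext4 /dev/mapper/data-integrity
    mount -t ext4 -o data=journal /dev/mapper/data-integrity /data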

Normally, I'd umount the data partition while shutting down the
system. There might be cases though where power is cut. In such a
case, there'll be ext4 recoveries, which is ok.
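
I.e., on the next mount I'd expect the journal to be replayed, with
dmesg showing something roughly like this (exact wording varies by
kernel version):

    EXT4-fs (dm-1): recovery complete
    EXT4-fs (dm-1): mounted filesystem with journalled data mode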

How robust would such a setup be? Are there cases where the ext4 file
system would require an fsck? What might happen if fsck is never run?
Is there a chance that the data partition can't be mounted at all? How
often might that happen?

Thx
regards
Kai


2023-07-17 14:07:25

by Alan C. Assis

[permalink] [raw]
Subject: Re: File system robustness

Hi Kai,

On 7/17/23, Kai Tomerius <[email protected]> wrote:
> Hi,
>
> let's suppose an embedded system with a read-only squashfs root file
> system, and a writable ext4 data partition with data=journal.
> Furthermore, the data partition shall be protected with dm-integrity.
>
> Normally, I'd umount the data partition while shutting down the
> system. There might be cases though where power is cut. In such a
> case, there'll be ext4 recoveries, which is ok.
>
> How robust would such a setup be? Are there cases where the ext4 file
> system would require an fsck? What might happen if fsck is never run?
> Is there a chance that the data partition can't be mounted at all? How
> often might that happen?
>

Please take a look at this document:

https://elinux.org/images/0/02/Filesystem_Considerations_for_Embedded_Devices.pdf

In general EXT4 is fine, but it has some limitations; more info here:
https://opensource.com/article/18/4/ext4-filesystem

I think Linux users suffer from the same problem we have with NuttX (a
Linux-like RTOS): which FS to use?

So for deep embedded systems running NuttX I follow this logic:

I need better performance and wear leveling, but I don't need to worry
about power loss: I choose SmartFS

I need good performance, wear leveling and some power loss protection: SPIFFS

I need good performance, wear leveling and good protection for
frequent power loss: LittleFS

In a NuttShell: There is no FS that 100% meets all user needs; select
the FS that meets your core needs and do lots of field testing to
confirm it works as expected.

BR,

Alan

2023-07-18 05:35:01

by Kai Tomerius

[permalink] [raw]
Subject: Re: File system robustness

Hi Alan,

thx a lot.

I should have mentioned that I'll have a large NAND flash, so ext4
might still be the file system of choice. The other ones you mentioned
are interesting to consider, but seem better suited to a smaller NOR
flash.

Regards
Kai



On Mon, Jul 17, 2023 at 10:50:50AM -0300, Alan C. Assis wrote:
> Hi Kai,
>
> On 7/17/23, Kai Tomerius <[email protected]> wrote:
> > Hi,
> >
> > let's suppose an embedded system with a read-only squashfs root file
> > system, and a writable ext4 data partition with data=journal.
> > Furthermore, the data partition shall be protected with dm-integrity.
> >
> > Normally, I'd umount the data partition while shutting down the
> > system. There might be cases though where power is cut. In such a
> > case, there'll be ext4 recoveries, which is ok.
> >
> > How robust would such a setup be? Are there cases where the ext4 file
> > system would require an fsck? What might happen if fsck is never run?
> > Is there a chance that the data partition can't be mounted at all? How
> > often might that happen?
> >
>
> Please take a look at this document:
>
> https://elinux.org/images/0/02/Filesystem_Considerations_for_Embedded_Devices.pdf
>
> In general EXT4 is fine, but it has some limitations; more info here:
> https://opensource.com/article/18/4/ext4-filesystem
>
> I think Linux users suffer from the same problem we have with NuttX (a
> Linux-like RTOS): which FS to use?
>
> So for deep embedded systems running NuttX I follow this logic:
>
> I need better performance and wear leveling, but I don't need to worry
> about power loss: I choose SmartFS
>
> I need good performance, wear leveling and some power loss protection: SPIFFS
>
> I need good performance, wear leveling and good protection for
> frequent power loss: LittleFS
>
> In a NuttShell: There is no FS that 100% meets all user needs; select
> the FS that meets your core needs and do lots of field testing to
> confirm it works as expected.
>
> BR,
>
> Alan

2023-07-18 12:03:59

by Bjørn Forsman

[permalink] [raw]
Subject: Re: File system robustness

On Tue, 18 Jul 2023 at 08:03, Kai Tomerius <[email protected]> wrote:
> I should have mentioned that I'll have a large NAND flash, so ext4
> might still be the file system of choice. The other ones you mentioned
> are interesting to consider, but seem better suited to a smaller NOR
> flash.

If you mean raw NAND flash, I would think UBIFS is still the way to go?
(It's been several years since I was into embedded Linux systems.)

https://elinux.org/images/0/02/Filesystem_Considerations_for_Embedded_Devices.pdf
is focused on eMMC/SD cards, which have built-in controllers that
enable them to present a block device interface, quite unlike what raw
NAND devices offer.

Please see https://www.kernel.org/doc/html/latest/filesystems/ubifs.html
for more info.
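
In case it helps, the rough shape of a UBIFS setup is something like
this (a sketch from memory; the MTD partition number, volume name and
mount point are placeholders):

    # attach the raw NAND MTD partition to UBI, create a volume, mount it
    ubiattach /dev/ubi_ctrl -m 4
    ubimkvol /dev/ubi0 -N data -m
    mount -t ubifs ubi0:data /mnt/data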

Regards,
Bjørn

2023-07-20 04:29:24

by Theodore Ts'o

[permalink] [raw]
Subject: Re: File system robustness

On Wed, Jul 19, 2023 at 08:22:43AM +0200, Martin Steigerwald wrote:
>
> Is "nobarrier" mount option still a thing? I thought those mount options
> have been deprecated or even removed with the introduction of cache flush
> handling in kernel 2.6.37?

Yes, it's a thing, and if your server has a UPS with reliable power
failure / low battery feedback, it's *possible* to engineer a reliable
system. Or, for example, if you have a phone with an integrated
battery, so when you drop it the battery compartment won't open and
the battery won't go flying out, *and* the baseboard management
controller (BMC) will halt the CPU before the battery completely dies,
giving the flash storage device a chance to commit everything before
shutdown, *and* the BMC arranges to make sure the same thing happens
when the user pushes and holds the power button for 30 seconds, then
it could be safe.

We also use nobarrier for scratch file systems which by definition
go away when the borg/kubernetes job dies, and which will *never*
survive a reboot, let alone a power failure. In such a situation,
there's no point sending the cache flush, because the partition will
be mkfs'ed on reboot. The same goes if the iSCSI or Cloud Persistent
Disk will *always* go away when the VM dies, because any persistent
state is saved to some cluster or distributed file store (e.g., to the
MySQL server, or Big Table, or Spanner, etc.). In these cases, you
don't *want* the Cache Flush operation, since skipping it reduces I/O
overhead.
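
For a scratch partition that gets rebuilt on every boot, that amounts
to something like this at boot time (the device and mount point are
just examples):

    # contents are disposable, so skip cache flushes entirely
    mkfs.ext4 -F /dev/nvme0n1p2
    mount -t ext4 -o nobarrier /dev/nvme0n1p2 /scratch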

So if you know what you are doing, in certain specialized use cases,
nobarrier can make sense, and it is used today at my $WORK's data
center for production jobs *all* the time. So we won't be making
ext4's nobarrier mount option go away; it has users. :-)

Cheers,

- Ted

2023-07-20 05:11:36

by Theodore Ts'o

[permalink] [raw]
Subject: Re: File system robustness

On Wed, Jul 19, 2023 at 12:51:39PM +0200, Kai Tomerius wrote:
> > In answer to Kai's original question, the setup that was described
> > should be fine --- assuming high quality hardware.
>
> I wonder how to judge that ... it's an eMMC supposedly complying to
> some JEDEC standard, so it *should* be ok.

JEDEC promulgates the eMMC interface specification. That's the
interface used to talk to the device, much like SATA and SCSI and
NVMe. The JEDEC eMMC specification says nothing about the quality of
the implementation of the FTL, or whether it is safe from power drops,
or how many write cycles are supported before the eMMC soldered on the
$2000 MCU would expire.

If you're a cell phone manufacturer, the way you judge it is *before*
you buy a few million of the eMMC devices, you subject the samples to
a huge number of power drops and other torture tests (including
verifying the claimed number of write cycles in the spec sheet),
before the device is qualified for use in your product.

> But on another aspect: how about the interaction between dm-integrity
> and ext4? Sure, they each have their own journal, and they're
> independent layers. Is there anything that could go wrong, say a block
> that can't be recovered in the dm-integrity layer, causing ext4 to run
> into trouble, e.g., an I/O error that prevents ext4 from mounting?
>
> I assume the answer is "No", but can I be sure?

If there are I/O errors, with or without dm-integrity, you can have
problems. dm-integrity will turn bit-flips into hard I/O errors, but
a bit-flip might cause silent file system corruption (at least at
first), such that by the time you finally notice there's a problem,
days or weeks or months may have passed, and the data loss might be
far worse. So turning an innocuous bit flip into a hard I/O error
can be a feature, assuming that you've allowed for it in your system
architecture.
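
If you do allow for it, dm-integrity keeps a running count of detected
mismatches that you can monitor; per the dm-integrity kernel
documentation, the first field of the integrity target's status line
is that count. A sketch, with a placeholder mapper name:

    # the integrity target's status begins with the mismatch count
    dmsetup status data-integrity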

If you assume that the hardware doesn't introduce I/O errors or bit
flips, and if you assume you don't have any attackers trying to
corrupt the block device with bit flips, then sure, nothing will go
wrong. You can buy perfect hardware from the same supply store where
high school physics teachers buy frictionless pulleys and massless
ropes. :-)

Cheers,

- Ted