Hi,
(+CC the original author, Darrick)
I've been investigating (in the context of my EFI ext4 driver) why all
ext4 checksums appear inverted. After making sure my CRC32c
implementation was correct and on par with other implementations, I looked at
the fs/ext4 checksumming code, which led me to the implementation of
ext4_chksum in ext4.h (excuse the gmail whitespace damage):
>static inline u32 ext4_chksum(struct ext4_sb_info *sbi, u32 crc,
>			      const void *address, unsigned int length)
>{
>	struct {
>		struct shash_desc shash;
>		char ctx[4];
>	} desc;
Open-coding the crc32c crypto driver's internal state, seemingly to save a call?
>
>	BUG_ON(crypto_shash_descsize(sbi->s_chksum_driver)!=sizeof(desc.ctx));
>
>	desc.shash.tfm = sbi->s_chksum_driver;
>	*(u32 *)desc.ctx = crc;
...we set the starting CRC,
>
>	BUG_ON(crypto_shash_update(&desc.shash, address, length));
then call update, which keeps the current internal state in ctx[4],
>
>	return *(u32 *)desc.ctx;
and then we never call ->final() (nor ->finup()), which for crc32c would do:
>	put_unaligned_le32(~ctx->crc, out);
and which would thus give me the properly "inverted" CRC32c I expected.
FreeBSD never found this issue as their calculate_crc32c seems borked
too, and never inverts the result.
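To make the observation concrete, here is a minimal userspace sketch (not kernel code) of a plain bitwise CRC32c using the reflected Castagnoli polynomial 0x82F63B78. The helper names are hypothetical: crc32c_raw() stands in for what ext4_chksum() returns (the register after update, with no ->final()), and crc32c_standard() adds the ~0 seed and the final inversion of the textbook CRC-32C definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bitwise CRC32c register update (reflected
 * polynomial 0x82F63B78). Like ext4_chksum(), it only advances the
 * register and applies no inversion on entry or exit. */
static uint32_t crc32c_raw(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

/* Textbook CRC-32C: seed with ~0 and invert the result. The final
 * inversion is what ->final() adds via put_unaligned_le32(~ctx->crc, out). */
static uint32_t crc32c_standard(const void *buf, size_t len)
{
	return ~crc32c_raw(~0u, buf, len);
}
```

For "123456789", crc32c_standard() yields the published CRC-32C check value 0xE3069283, while crc32c_raw(~0u, ...) yields 0x1CF96D7C, its bitwise complement. So whenever the seed is ~0, an ext4-style raw value is exactly the complement of the textbook CRC32c.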
Is my assessment correct? Was ->final() never called on purpose, or is
it an accident? Or is this merely a CRC32c variation I'm unaware of?
I'd like to make sure I get all the context on this, before sending
any kind of documentation patch :)
Thanks,
Pedro
Hi Pedro,
On Mon, Jun 26, 2023 at 09:17:10PM +0100, Pedro Falcato wrote:
> [...]
> Is my assessment correct? Was ->final() never called on purpose, or is
> it an accident? Or is this merely a CRC32c variation I'm unaware of?
As far as I can tell, you are correct that ext4's CRC32C is just a raw CRC. It
doesn't do the bitwise inversion at either the beginning or end.
IMO, this is a mistake. In the design of CRCs, doing these inversions is
recommended to strengthen the CRC slightly.
However, it's also a common "mistake" to leave them out, and not too important,
especially if many of the messages checksummed are fixed-length structures.
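One concrete weakness that the initial inversion removes: if the register starts at 0, leading zero bytes keep it at 0, so prepending zeros to a message leaves its raw CRC unchanged. The sketch below (hypothetical helper names, same bitwise CRC32c as a raw, ext4-style update) checks whether one prepended 0x00 byte is detected. For fixed-length structures a prepended byte would also change the length, which fits the remark above about such structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bitwise CRC32c register update, no inversions. */
static uint32_t crc32c_raw(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

/* Hypothetical helper: returns nonzero if prepending a single 0x00 byte
 * to msg changes its raw CRC when the register starts at `seed`. */
static int zero_prefix_detected(uint32_t seed, const void *msg, size_t len)
{
	const uint8_t zero = 0;
	uint32_t plain = crc32c_raw(seed, msg, len);
	/* Chain two updates: the zero byte first, then the message. */
	uint32_t padded = crc32c_raw(crc32c_raw(seed, &zero, 1), msg, len);

	return plain != padded;
}
```

With a seed of 0 the prepended byte goes unnoticed (zero_prefix_detected(0, "ext4", 4) returns 0); with a seed of ~0 it is detected.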
Yes, if ext4 had used the kernel crypto API "properly", with crypto_shash_init()
+ crypto_shash_update() + crypto_shash_final(), it would have gotten the
inversion at the beginning and end. (Note, this is true for "crc32c" but not
"crc32". The crypto API isn't consistent about its CRC conventions.)
But I'd also think of ext4's direct use of crypto_shash_update() as less of ext4
taking a shortcut or hack, and more of ext4 just having to work around the
kernel crypto API being very clunky and inefficient for use cases like this...
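The init/update/final convention for "crc32c" can be modelled in userspace. This is a sketch under stated assumptions: the struct and function names are hypothetical, and it mirrors only the convention (init loads ~0, update advances the register, final inverts), not the kernel's shash code, whose final writes little-endian bytes rather than returning a u32. It also shows that splitting the data across updates does not change the digest, and that ext4_chksum() amounts to skipping init (loading a caller-supplied seed) and skipping final.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bitwise CRC32c register update, no inversions. */
static uint32_t crc32c_raw(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

/* Hypothetical model of the 4-byte crc32c descriptor context. */
struct crc32c_model {
	uint32_t crc;
};

static void model_init(struct crc32c_model *m)
{
	m->crc = ~0u;			/* pre-inversion */
}

static void model_update(struct crc32c_model *m, const void *buf, size_t len)
{
	m->crc = crc32c_raw(m->crc, buf, len);
}

static uint32_t model_final(const struct crc32c_model *m)
{
	return ~m->crc;			/* post-inversion */
}

/* init, two updates split at `cut`, then final. */
static uint32_t model_digest(const void *buf, size_t len, size_t cut)
{
	const uint8_t *p = buf;
	struct crc32c_model m;

	model_init(&m);
	model_update(&m, p, cut);
	model_update(&m, p + cut, len - cut);
	return model_final(&m);
}

/* The ext4_chksum() pattern: load the seed directly, update, no final. */
static uint32_t model_ext4_style(uint32_t seed, const void *buf, size_t len)
{
	struct crc32c_model m = { .crc = seed };

	model_update(&m, buf, len);
	return m.crc;
}
```

model_digest("123456789", 9, cut) returns 0xE3069283 for any cut, while model_ext4_style(~0u, "123456789", 9) returns the un-inverted 0x1CF96D7C.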
- Eric
On Tue, Jun 27, 2023 at 09:52:06PM -0700, Eric Biggers wrote:
> As far as I can tell, you are correct that ext4's CRC32C is just a raw CRC. It
> doesn't do the bitwise inversion at either the beginning or end.
Yep.
> IMO, this is a mistake. In the design of CRCs, doing these inversions is
> recommended to strengthen the CRC slightly.
Yep. I wondered about that too back in the day (see below).
> However, it's also a common "mistake" to leave them out, and not too important,
> especially if many of the messages checksummed are fixed-length structures.
>
> Yes, if ext4 had used the kernel crypto API "properly", with crypto_shash_init()
> + crypto_shash_update() + crypto_shash_final(), it would have gotten the
> inversion at the beginning and end. (Note, this is true for "crc32c" but not
> "crc32". The crypto API isn't consistent about its CRC conventions.)
15 years ago when Ted and I first started talking about adding checksums
to metadata blocks, we looked at what other parts of the kernel did, and
stumbled upon lib/libcrc32c.c:
u32 crc32c(u32 crc, const void *address, unsigned int length)
{
	/* tfm: file-scope crypto_shash handle allocated at module init */
	SHASH_DESC_ON_STACK(shash, tfm);
	u32 ret, *ctx = (u32 *)shash_desc_ctx(shash);
	int err;

	shash->tfm = tfm;
	*ctx = crc;

	err = crypto_shash_update(shash, address, length);
	BUG_ON(err);

	/* raw register value: no final inversion is applied */
	ret = *ctx;
	barrier_data(ctx);
	return ret;
}
EXPORT_SYMBOL(crc32c);
This looked like a handy crc32c library function that we could use to
avoid dealing with the crypto api. I noticed way back then that it
didn't invert the outcome, but Ted and I decided it wasn't a big deal.
btrfs and XFS both used this library function in the same way.
Eventually someone else (Andreas, maybe?) piped up to suggest that
ext4/jbd2 should load the crc32{,c} driver dynamically to avoid a hard
dependency on crc32 if the user is only running old filesystems, so we
did end up using the crypto api directly. Unfortunately, ext4 can't
call the shash finalizer to invert the crc because that'll break the
on-disk format.
> But I'd also think of ext4's direct use of crypto_shash_update() as less of ext4
> taking a shortcut or hack, and more of ext4 just having to work around the
> kernel crypto API being very clunky and inefficient for use cases like this...
At the time I thought that libcrc32c.c was a convenient shim for anyone
who didn't want to deal with the clunky crypto api. It would have
really helped me to have had documentation of the preconditions (start
with ~0) and postconditions (invert the return value of the last call)
to nudge me into using this function correctly, because expecting
callers also to be really smart about crc32c as an alternative to
written guidelines is ... idiotic^WLKML.
An example of how to do a buffer would have helped:
static inline u32 crc32c_buffer(const void *address, unsigned int length)
{
	return ~crc32c(~0U, address, length);
}
This misuse could be fixed, but you'd have to burn an incompat flag to
do it. I'm less smart about crc32* than I was back in 2008, so I also
don't have the skills to figure out if the correction is worth the cost.
--D
> - Eric
On Wed, Jun 28, 2023 at 11:58:32AM -0700, Darrick J. Wong wrote:
> At the time I thought that libcrc32c.c was a convenient shim for anyone
> who didn't want to deal with the clunky crypto api. It would have
> really helped me to have had documentation of the preconditions (start
> with ~0) and postconditions (invert the return value of the last call)
> to nudge me into using this function correctly, because expecting
> callers also to be really smart about crc32c as an alternative to
> written guidelines is ... idiotic^WLKML.
>
> An example of how to do a buffer would have helped:
>
> static inline u32 crc32c_buffer(const void *address, unsigned int length)
> {
> return ~crc32c(~0U, address, length);
> }
IMO the best API for CRCs is like zlib's, where you pass in 0 to start the CRC
and it does both the pre and post inversions for you. Note, "updates" still
work as expected, since two inversions cancel each other out.
Unfortunately, many but not all of the CRC APIs in Linux decided to go with the
other convention, which is to leave the inversions entirely to the caller.
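A zlib-style CRC32c wrapper in the shape described above might look like this. The name crc32c_zlib_style() is hypothetical; the inversion on entry undoes the inversion applied on the previous call's exit, so chained calls behave like one call over the concatenated data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bitwise CRC32c register update, no inversions. */
static uint32_t crc32c_raw(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

/* Hypothetical zlib-style API: pass 0 to start, then pass the previous
 * return value to continue. Both inversions live inside the function. */
static uint32_t crc32c_zlib_style(uint32_t crc, const void *buf, size_t len)
{
	return ~crc32c_raw(~crc, buf, len);
}
```

crc32c_zlib_style(0, "123456789", 9) gives the standard check value 0xE3069283, and so does crc32c_zlib_style(crc32c_zlib_style(0, "1234", 4), "56789", 5). As in zlib, an empty buffer with a starting value of 0 returns 0.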
I think the kernel should also make the architecture-specific CRC
implementations accessible directly via a library API, similar to what's done
for Blake2s and ChaCha20. There should be no need to go through shash at all...
>
> This misuse could be fixed, but you'd have to burn an incompat flag to
> do it. I'm less smart about crc32* than I was back in 2008, so I also
> don't have the skills to figure out if the correction is worth the cost.
>
> --D
No, it's not worth changing the ext4 on-disk format for this.
- Eric
On Mon, Jul 3, 2023 at 8:48 PM Eric Biggers wrote:
>
Hi folks, really sorry for the big delay, this thread really slipped my mind :)
> IMO the best API for CRC's is like zlib's where you pass in 0 to start the CRC
> and it does both the pre and post inversions for you. Note, "updates" still
> work as expected, since two inversions cancel each other out.
I agree; that's what I did when adding CRC32c to EFI. A
u32 calculate_crc32c(const void *buf, size_t len, u32 initial) that
inverts both the initial value and the result is simple and effective.
> Unfortunately, many but not all of the CRC APIs in Linux decided to go with the
> other convention, which is to leave the inversions entirely to the caller.
>
> I think the kernel should also make the architecture-specific CRC
> implementations accessible directly via a library API, similar to what's done
> for Blake2s and ChaCha20. There should be no need to go through shash at all...
>
> >
> > This misuse could be fixed, but you'd have to burn an incompat flag to
> > do it. I'm less smart about crc32* than I was back in 2008, so I also
> > don't have the skills to figure out if the correction is worth the cost.
> >
> > --D
>
> No, it's not worth changing the ext4 on-disk format for this.
I don't think we'd need to change the on-disk format for this, or for
any other checksum algorithm change (as long as the resulting digest is
32 bits), given that we have s_checksum_type, right?
Or do existing tools dangerously assume CRC32c at the moment?
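On the tooling question, a defensive reader would check s_checksum_type before assuming CRC32c. The sketch below assumes, from my reading of the ext4 on-disk layout, that in the 1024-byte superblock s_checksum_type is a __u8 at offset 0x175 whose only defined value is 1 (crc32c), and that s_checksum is a __le32 at offset 0x3FC computed ext4-style (seed ~0, no final inversion) over the bytes before it. The helper names are hypothetical, and checking that the metadata_csum feature is enabled is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define SB_CHECKSUM_TYPE_OFF	0x175	/* __u8 s_checksum_type (assumed) */
#define SB_CHECKSUM_OFF		0x3FC	/* __le32 s_checksum (assumed) */
#define CSUM_TYPE_CRC32C	1	/* EXT4_CRC32C_CHKSUM (assumed) */

/* Hypothetical helper: bitwise CRC32c register update, no inversions. */
static uint32_t crc32c_raw(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/* Hypothetical helper: store the checksum of a 1024-byte superblock,
 * ext4-style (seed ~0, no final inversion). */
static void sb_csum_set(uint8_t *sb)
{
	put_le32(sb + SB_CHECKSUM_OFF, crc32c_raw(~0u, sb, SB_CHECKSUM_OFF));
}

/* Hypothetical helper: 1 = checksum OK, 0 = mismatch,
 * -1 = unknown algorithm (refuse to guess rather than assume CRC32c). */
static int sb_csum_check(const uint8_t *sb)
{
	if (sb[SB_CHECKSUM_TYPE_OFF] != CSUM_TYPE_CRC32C)
		return -1;
	return crc32c_raw(~0u, sb, SB_CHECKSUM_OFF) ==
	       get_le32(sb + SB_CHECKSUM_OFF);
}
```

Returning -1 for an unknown s_checksum_type, rather than falling back to CRC32c, keeps a reader from misreporting corruption if a new algorithm type were ever defined.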
In any case, thank you both for the background on this, I'll try to
submit a patch to the docs to clarify this point.
--
Pedro