2021-05-07 16:59:04

by Daniel Kestrel

Subject: Fwd: xts.c and block size inconsistency? cannot pass generic driver comparison tests

Hi Eric,

I agree that it can't be built on top of the kernel's CBC. But with the
hardware CBC, e.g. for encryption, I set the IV (the encrypted tweak),
set the hardware's AES mode to CBC and start the encryption of a single
16-byte block, then do an additional XOR with the tweak afterwards; the
result for that full block is the same as XTS. Then I gfmul the tweak
and repeat the previous steps, again starting with setting the tweak as
the IV.
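A minimal userspace sketch of the per-block trick, under stated assumptions: the hardware CBC engine is modeled by a hypothetical hw_cbc_encrypt_one() that returns E_k(in ^ iv) for exactly one block, and a toy 16-byte cipher (toy_enc(), not secure, hypothetical) stands in for AES. With IV = tweak and one extra XOR of the tweak, each block becomes C_j = E_k(P_j ^ T_j) ^ T_j, the same XEX value the xts template computes on top of ecb(aes). The tweak is then advanced by multiplying by alpha in GF(2^128), little-endian as in IEEE 1619. tweak0 is assumed to already be the encrypted tweak.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BS 16 /* AES block size in bytes */

/* Toy 16-byte block cipher standing in for AES. NOT secure; it only
 * gives the example a deterministic E_k() to call. */
static void toy_enc(const uint8_t k[BS], uint8_t s[BS])
{
    for (int r = 0; r < 4; r++) {
        uint8_t t[BS];
        for (int i = 0; i < BS; i++)
            t[(i + 1) % BS] = (uint8_t)((s[i] ^ k[i]) * 3 + r);
        memcpy(s, t, BS);
    }
}

/* Hypothetical model of the hardware CBC engine fed exactly one block:
 * out = E_k(in ^ iv). Safe to call with in == out. */
static void hw_cbc_encrypt_one(const uint8_t k[BS], const uint8_t iv[BS],
                               const uint8_t in[BS], uint8_t out[BS])
{
    uint8_t x[BS];
    for (int i = 0; i < BS; i++)
        x[i] = in[i] ^ iv[i];
    toy_enc(k, x);
    memcpy(out, x, BS);
}

/* Multiply the tweak by alpha in GF(2^128), little-endian byte order as
 * in IEEE 1619; a carry out of the top bit folds back in as 0x87. */
static void xts_mul_alpha(uint8_t t[BS])
{
    uint8_t carry = 0;
    for (int i = 0; i < BS; i++) {
        uint8_t next = t[i] >> 7;
        t[i] = (uint8_t)((t[i] << 1) | carry);
        carry = next;
    }
    if (carry)
        t[0] ^= 0x87;
}

/* XEX over full blocks via the CBC engine: one single-block CBC call
 * with IV = tweak, XOR the tweak into the output, advance the tweak. */
static void xex_encrypt_via_cbc(const uint8_t k[BS], const uint8_t tweak0[BS],
                                const uint8_t *src, uint8_t *dst, size_t nblocks)
{
    uint8_t t[BS];
    assert(nblocks == 0 || (src != NULL && dst != NULL));
    memcpy(t, tweak0, BS);
    for (size_t j = 0; j < nblocks; j++) {
        hw_cbc_encrypt_one(k, t, src + j * BS, dst + j * BS);
        for (int i = 0; i < BS; i++)
            dst[j * BS + i] ^= t[i];
        xts_mul_alpha(t);
    }
}

/* Reference: the same XEX computed the "xts over ecb" way,
 * C_j = E_k(P_j ^ T_j) ^ T_j. */
static void xex_encrypt_via_ecb(const uint8_t k[BS], const uint8_t tweak0[BS],
                                const uint8_t *src, uint8_t *dst, size_t nblocks)
{
    uint8_t t[BS];
    memcpy(t, tweak0, BS);
    for (size_t j = 0; j < nblocks; j++) {
        uint8_t b[BS];
        for (int i = 0; i < BS; i++)
            b[i] = src[j * BS + i] ^ t[i];
        toy_enc(k, b);
        for (int i = 0; i < BS; i++)
            dst[j * BS + i] = b[i] ^ t[i];
        xts_mul_alpha(t);
    }
}
```

Both helpers should produce identical output for the same key, tweak and input. This also shows why the trick only works one block per CBC call: with several blocks per call, the engine would chain in the previous raw ciphertext block instead of the next tweak.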
Doing that is much faster and much more efficient than using the
kernel's xts on top of ecb(aes). But it leaves me with the problem of
handling the ciphertext stealing (CTS) after my walk loop, which only
processes full blocks or multiples of them, and I am trying to figure
out the best way to do that with the least amount of code in my driver.
I cannot set blocksize to 1, because then the block size comparison
against the generic xts fails, and if I set walksize to 1, I get
alignment and split errors and would have to handle the splits and
misalignments manually.
So what I actually need is a combination of what the walk does (handle
alignment and splits) plus a way to get the last complete block and the
incomplete block after skcipher_walk_done() returns -EINVAL. At least
that's my current idea. I could just copy most of the code from xts.c,
but much of it is not needed if I combine the hardware CBC and the XOR
into XEX (XTS without the ciphertext stealing).
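A sketch of how the tail could be finished after a block-multiple loop, using IEEE 1619 ciphertext stealing. Assumptions: the data sits in one contiguous buffer (a real driver would have to gather the last full block and the partial block from the scatterlists), xex() is a hypothetical helper for the per-block XEX step described above, and toy_enc()/toy_dec() are a toy invertible cipher standing in for AES. When a partial tail exists, the bulk loop stops one block early. The held-back full block is processed with T_{m-1}, its leading bytes become the short final output block, and the partial block padded with the stolen bytes is processed with T_m. Decryption uses the two tweaks in the opposite order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BS 16 /* AES block size in bytes */

/* Toy invertible 16-byte cipher standing in for AES (NOT secure): each
 * round XORs the key, multiplies by 3 (invertible mod 256), rotates bytes. */
static void toy_enc(const uint8_t k[BS], uint8_t s[BS])
{
    for (int r = 0; r < 4; r++) {
        uint8_t t[BS];
        for (int i = 0; i < BS; i++)
            t[(i + 1) % BS] = (uint8_t)((s[i] ^ k[i]) * 3 + r);
        memcpy(s, t, BS);
    }
}

static void toy_dec(const uint8_t k[BS], uint8_t s[BS])
{
    for (int r = 3; r >= 0; r--) {
        uint8_t t[BS];
        for (int i = 0; i < BS; i++) /* 171 * 3 == 1 (mod 256) */
            t[i] = (uint8_t)(((uint8_t)(s[(i + 1) % BS] - r) * 171) ^ k[i]);
        memcpy(s, t, BS);
    }
}

/* Multiply the tweak by alpha in GF(2^128), little-endian as in IEEE 1619. */
static void xts_mul_alpha(uint8_t t[BS])
{
    uint8_t carry = 0;
    for (int i = 0; i < BS; i++) {
        uint8_t next = t[i] >> 7;
        t[i] = (uint8_t)((t[i] << 1) | carry);
        carry = next;
    }
    if (carry)
        t[0] ^= 0x87;
}

/* One XEX block in place: b = E(b ^ t) ^ t (enc) or D(b ^ t) ^ t (dec). */
static void xex(const uint8_t k[BS], const uint8_t t[BS], uint8_t b[BS], int enc)
{
    for (int i = 0; i < BS; i++)
        b[i] ^= t[i];
    if (enc)
        toy_enc(k, b);
    else
        toy_dec(k, b);
    for (int i = 0; i < BS; i++)
        b[i] ^= t[i];
}

/* XTS over len >= BS bytes in place (enc = 1 encrypts, 0 decrypts). */
static void xts_crypt(const uint8_t k[BS], const uint8_t tweak0[BS],
                      uint8_t *buf, size_t len, int enc)
{
    uint8_t t[BS], t_next[BS], x[BS], y[BS];

    assert(len >= BS);
    size_t tail = len % BS;
    /* With a partial tail, hold back the last full block for the CTS step. */
    size_t bulk = len / BS - (tail ? 1 : 0);

    memcpy(t, tweak0, BS);
    for (size_t j = 0; j < bulk; j++) {   /* the block-multiple part */
        xex(k, t, buf + j * BS, enc);
        xts_mul_alpha(t);
    }
    if (tail == 0)
        return;

    uint8_t *last = buf + bulk * BS;      /* held-back full block */
    uint8_t *part = last + BS;            /* partial block, tail bytes */
    memcpy(t_next, t, BS);
    xts_mul_alpha(t_next);                /* t = T_{m-1}, t_next = T_m */
    /* Encryption uses T_{m-1} then T_m; decryption uses them swapped. */
    const uint8_t *t1 = enc ? t : t_next;
    const uint8_t *t2 = enc ? t_next : t;

    memcpy(x, last, BS);
    xex(k, t1, x, enc);                   /* x = XEX(held-back block, t1) */
    memcpy(y, part, tail);                /* y = partial || stolen bytes of x */
    memcpy(y + tail, x + tail, BS - tail);
    memcpy(part, x, tail);                /* short output block = head of x */
    xex(k, t2, y, enc);
    memcpy(last, y, BS);                  /* full output block */
}
```

Round-tripping xts_crypt(..., 1) and xts_crypt(..., 0) over every length from 16 to 40 bytes exercises both the block-multiple path and the stealing path. The kernel's own crypto/xts.c appears to follow the same split, finishing the stolen block in a separate final step after the bulk pass, so it may be a better guide than this sketch for the scatterlist handling.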

Thanks.

On Fri, May 7, 2021 at 08:56, Eric Biggers <[email protected]> wrote:
>
> On Fri, May 07, 2021 at 07:57:01AM +0200, Kestrel seventyfour wrote:
> > Hi,
> >
> > I have also added xts(aes) by combining the old hardware CBC algorithm
> > with an additional XOR and the gfmul tweak handling. However, I
> > am struggling to pass the comparison tests against the generic xts
> > implementation.
>
> XTS can't be built on top of CBC, unless you only do 1 block at a time.
>
> It can be built on top of ECB, which is what the template already does.
>
> Before getting too far into your questions, are you sure that what you're trying
> to do actually makes sense?
>
> - Eric


2021-05-07 18:59:13

by Eric Biggers

Subject: Re: Fwd: xts.c and block size inconsistency? cannot pass generic driver comparison tests

On Fri, May 07, 2021 at 03:02:11PM +0200, Kestrel seventyfour wrote:
> Hi Eric,
>
> I agree that it can't be built on top of the kernel's CBC. But with the
> hardware CBC, e.g. for encryption, I set the IV (the encrypted tweak),
> set the hardware's AES mode to CBC and start the encryption of a single
> 16-byte block, then do an additional XOR with the tweak afterwards; the
> result for that full block is the same as XTS. Then I gfmul the tweak
> and repeat the previous steps, again starting with setting the tweak as
> the IV.
> Doing that is much faster and much more efficient than using the
> kernel's xts on top of ecb(aes). But it leaves me with the problem of
> handling the ciphertext stealing (CTS) after my walk loop, which only
> processes full blocks or multiples of them, and I am trying to figure
> out the best way to do that with the least amount of code in my driver.
> I cannot set blocksize to 1, because then the block size comparison
> against the generic xts fails, and if I set walksize to 1, I get
> alignment and split errors and would have to handle the splits and
> misalignments manually.
> So what I actually need is a combination of what the walk does (handle
> alignment and splits) plus a way to get the last complete block and the
> incomplete block after skcipher_walk_done() returns -EINVAL. At least
> that's my current idea. I could just copy most of the code from xts.c,
> but much of it is not needed if I combine the hardware CBC and the XOR
> into XEX (XTS without the ciphertext stealing).
>

Wouldn't it be easier to just implement ecb(aes) in your driver (using your
workaround to do it 1 block at a time using the hardware CBC engine)? If you
implement ecb(aes), then the xts template can use it, so you wouldn't need to
implement xts(aes) directly. And this would still avoid all the individual
calls to crypto_cipher_{encrypt,decrypt}, which I expect is the performance
bottleneck that you were seeing.

- Eric