2018-08-30 12:22:22

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Locking for HW crypto accelerators

Hi,

I am trying to figure out necessary locking on the driver side of
crypto HW accelerator for symmetric hash (actually: CRC). I
implemented quite simple driver for shash_alg.

I looked at the docs, I looked at the crypto kcapi core code... and
there is nothing about necessary locking. kcapi does not perform it.

My HW is quite similar to drivers/crypto/stm32/stm32_crc32.c so it has
only one HW set of registers for dealing with CRC. Or in other words,
only one queue of one element. :) I implemented all shash_alg
callbacks - init(), update(), final()... and also finup() (manually
calling update+final) and digest() (init+update+final).

Now imagine multiple user-space users of this crypto alg where all of
them call kcapi_md_digest() (so essentially init() -> update() ->
final()). It seems that kcapi does not perform any locking here so at
some point updates from different processes might be mixed with
finals:

Process A: Process B:
init()
init()
update()
update()
final()
final()

My findings show that the requests are indeed being mixed with each other...

Should driver perform any weird locking here? Or maybe that is the
case of using ONLY the digest() callback (so no update, no final)
because my HW cannot track different kcapi requests?

Best regards,
Krzysztof


2018-08-30 12:56:01

by Stephan Müller

[permalink] [raw]
Subject: Re: Locking for HW crypto accelerators

Am Donnerstag, 30. August 2018, 14:22:22 CEST schrieb Krzysztof Kozlowski:

Hi Krzysztof,

> Hi,
>
> I am trying to figure out necessary locking on the driver side of
> crypto HW accelerator for symmetric hash (actually: CRC). I
> implemented quite simple driver for shash_alg.
>
> I looked at the docs, I looked at the crypto kcapi core code... and
> there is nothing about necessary locking. kcapi does not perform it.
>
> My HW is quite similar to drivers/crypto/stm32/stm32_crc32.c so it has
> only one HW set of registers for dealing with CRC. Or in other words,
> only one queue of one element. :) I implemented all shash_alg
> callbacks - init(), update(), final()... and also finup() (manually
> calling update+final) and digest() (init+update+final).
>
> Now imagine multiple user-space users of this crypto alg where all of
> them call kcapi_md_digest() (so essentially init() -> update() ->
> final()). It seems that kcapi does not perform any locking here so at
> some point updates from different processes might be mixed with
> finals:
>
> Process A: Process B:
> init()
> init()
> update()
> update()
> final()
> final()
>
> My findings show that the requests are indeed being mixed with each other...
>
> Should driver perform any weird locking here? Or maybe that is the
> case of using ONLY the digest() callback (so no update, no final)
> because my HW cannot track different kcapi requests?

The hashing is performed on a buffer provided by the caller. E.g. it is the
buffer pointed to by the ahash request or the shash desc structure. All
operations of init/update/final operate on that memory.

If you have parallel requests, each caller has a private buffer that it
provides to the kernel crypto API. This applies also to AF_ALG.

Thus, as long as the individual operations of init/update and final are atomic
operations, there should be no locking necessary.

Thus, all your driver needs to guarantee is the atomicitcy of the init/update/
final operation in respect to your hardware state.

Ciao
Stephan

2018-08-30 13:09:05

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: Locking for HW crypto accelerators

On Thu, 30 Aug 2018 at 14:59, Stephan Mueller <[email protected]> wrote:
>
> Am Donnerstag, 30. August 2018, 14:22:22 CEST schrieb Krzysztof Kozlowski:
>
> Hi Krzysztof,
>
> > Hi,
> >
> > I am trying to figure out necessary locking on the driver side of
> > crypto HW accelerator for symmetric hash (actually: CRC). I
> > implemented quite simple driver for shash_alg.
> >
> > I looked at the docs, I looked at the crypto kcapi core code... and
> > there is nothing about necessary locking. kcapi does not perform it.
> >
> > My HW is quite similar to drivers/crypto/stm32/stm32_crc32.c so it has
> > only one HW set of registers for dealing with CRC. Or in other words,
> > only one queue of one element. :) I implemented all shash_alg
> > callbacks - init(), update(), final()... and also finup() (manually
> > calling update+final) and digest() (init+update+final).
> >
> > Now imagine multiple user-space users of this crypto alg where all of
> > them call kcapi_md_digest() (so essentially init() -> update() ->
> > final()). It seems that kcapi does not perform any locking here so at
> > some point updates from different processes might be mixed with
> > finals:
> >
> > Process A: Process B:
> > init()
> > init()
> > update()
> > update()
> > final()
> > final()
> >
> > My findings show that the requests are indeed being mixed with each other...
> >
> > Should driver perform any weird locking here? Or maybe that is the
> > case of using ONLY the digest() callback (so no update, no final)
> > because my HW cannot track different kcapi requests?
>
> The hashing is performed on a buffer provided by the caller. E.g. it is the
> buffer pointed to by the ahash request or the shash desc structure. All
> operations of init/update/final operate on that memory.
>
> If you have parallel requests, each caller has a private buffer that it
> provides to the kernel crypto API. This applies also to AF_ALG.
>
> Thus, as long as the individual operations of init/update and final are atomic
> operations, there should be no locking necessary.
>
> Thus, all your driver needs to guarantee is the atomicitcy of the init/update/
> final operation in respect to your hardware state.

Thanks Stephan for hints. Let's assume the each of init, update and
final are atomic... but how about the relation between update and
final? I have two concurrent users in user-space but only one HW:

Process A: Process B:
init() and set_key()
init() and different key
update(some_data)
update(different_data)
final()
final()

The final() from process A will now produce the result of hashing/CRC
of some_data and different_data (and even maybe mixed with init() for
different key). All because in the meantime process B added its own
data to the HW.

Best regards,
Krzysztof

2018-08-30 13:19:09

by Stephan Müller

[permalink] [raw]
Subject: Re: Locking for HW crypto accelerators

Am Donnerstag, 30. August 2018, 15:09:05 CEST schrieb Krzysztof Kozlowski:

Hi Krzysztof,

> Thanks Stephan for hints. Let's assume the each of init, update and
> final are atomic... but how about the relation between update and
> final? I have two concurrent users in user-space but only one HW:
>
> Process A: Process B:
> init() and set_key()
> init() and different key
> update(some_data)
> update(different_data)
> final()
> final()
>
> The final() from process A will now produce the result of hashing/CRC
> of some_data and different_data (and even maybe mixed with init() for
> different key). All because in the meantime process B added its own
> data to the HW.

The question is where is the state of the cipher operation kept that is
produced with each of the init/update/final calls. Your answer implies that
this state is kept in hardware.

But commonly the state is kept in software. Look at ahash_request for example,
the __ctx pointer is intended to point to the memory the driver needs to store
its state.

Pick a random driver implementation and search for ahash_request_ctx, for
example.
>
> Best regards,
> Krzysztof



Ciao
Stephan

2018-08-30 13:27:31

by Kamil Konieczny

[permalink] [raw]
Subject: Re: Locking for HW crypto accelerators

On 30.08.2018 15:09, Krzysztof Kozlowski wrote:
> [...]
> Thanks Stephan for hints. Let's assume the each of init, update and
> final are atomic... but how about the relation between update and
> final? I have two concurrent users in user-space but only one HW:
>
> Process A: Process B:
> init() and set_key()
> init() and different key
> update(some_data)
> update(different_data)
> final()
> final()
>
> The final() from process A will now produce the result of hashing/CRC
> of some_data and different_data (and even maybe mixed with init() for
> different key). All because in the meantime process B added its own
> data to the HW.
>
> Best regards,
> Krzysztof
Can your hardware do export/import ?

If yes, you can use workqueue and guard HW with spinlock,
as in exynos hash in s5p-sss.c (or see other drivers).

--
Best regards,
Kamil Konieczny
Samsung R&D Institute Poland

2018-08-30 13:39:31

by Herbert Xu

[permalink] [raw]
Subject: Re: Locking for HW crypto accelerators

On Thu, Aug 30, 2018 at 02:22:22PM +0200, Krzysztof Kozlowski wrote:
> Hi,
>
> I am trying to figure out necessary locking on the driver side of
> crypto HW accelerator for symmetric hash (actually: CRC). I
> implemented quite simple driver for shash_alg.
>
> I looked at the docs, I looked at the crypto kcapi core code... and
> there is nothing about necessary locking. kcapi does not perform it.
>
> My HW is quite similar to drivers/crypto/stm32/stm32_crc32.c so it has
> only one HW set of registers for dealing with CRC. Or in other words,
> only one queue of one element. :) I implemented all shash_alg
> callbacks - init(), update(), final()... and also finup() (manually
> calling update+final) and digest() (init+update+final).
>
> Now imagine multiple user-space users of this crypto alg where all of
> them call kcapi_md_digest() (so essentially init() -> update() ->
> final()). It seems that kcapi does not perform any locking here so at
> some point updates from different processes might be mixed with
> finals:
>
> Process A: Process B:
> init()
> init()
> update()
> update()
> final()
> final()
>
> My findings show that the requests are indeed being mixed with each other...

After each operation all state must be stored in the ahash_request
object. The next operation should then load the state from the
request object into the hardware.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2018-08-30 13:54:19

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: Locking for HW crypto accelerators

On Thu, 30 Aug 2018 at 15:19, Stephan Mueller <[email protected]> wrote:
>
> Am Donnerstag, 30. August 2018, 15:09:05 CEST schrieb Krzysztof Kozlowski:
>
> Hi Krzysztof,
>
> > Thanks Stephan for hints. Let's assume the each of init, update and
> > final are atomic... but how about the relation between update and
> > final? I have two concurrent users in user-space but only one HW:
> >
> > Process A: Process B:
> > init() and set_key()
> > init() and different key
> > update(some_data)
> > update(different_data)
> > final()
> > final()
> >
> > The final() from process A will now produce the result of hashing/CRC
> > of some_data and different_data (and even maybe mixed with init() for
> > different key). All because in the meantime process B added its own
> > data to the HW.
>
> The question is where is the state of the cipher operation kept that is
> produced with each of the init/update/final calls. Your answer implies that
> this state is kept in hardware.

Yes, that's correct. The HW once initialized to specific CRC
parameters (size, polynomial, initial value), will operate on this
till another init happens.

> But commonly the state is kept in software. Look at ahash_request for example,
> the __ctx pointer is intended to point to the memory the driver needs to store
> its state.
>
> Pick a random driver implementation and search for ahash_request_ctx, for
> example.

I see now some drivers are indeed saving and restoring state.

Thanks!

Best regards,
Krzysztof

2018-08-30 13:59:51

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: Locking for HW crypto accelerators

On Thu, 30 Aug 2018 at 15:27, Kamil Konieczny
<[email protected]> wrote:
>
> On 30.08.2018 15:09, Krzysztof Kozlowski wrote:
> > [...]
> > Thanks Stephan for hints. Let's assume the each of init, update and
> > final are atomic... but how about the relation between update and
> > final? I have two concurrent users in user-space but only one HW:
> >
> > Process A: Process B:
> > init() and set_key()
> > init() and different key
> > update(some_data)
> > update(different_data)
> > final()
> > final()
> >
> > The final() from process A will now produce the result of hashing/CRC
> > of some_data and different_data (and even maybe mixed with init() for
> > different key). All because in the meantime process B added its own
> > data to the HW.
> >
> > Best regards,
> > Krzysztof
> Can your hardware do export/import ?
>
> If yes, you can use workqueue and guard HW with spinlock,
> as in exynos hash in s5p-sss.c (or see other drivers).

Yes, workqueue doing all necessary HW initialization would be the
solution here as well. The point is what later Herbert wrote - I need
to take care about the HW state because the requests even for shash
will be coming in random order.

Best regards,
Krzysztof

2018-08-30 14:00:45

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: Locking for HW crypto accelerators

On Thu, 30 Aug 2018 at 15:39, Herbert Xu <[email protected]> wrote:
>
> On Thu, Aug 30, 2018 at 02:22:22PM +0200, Krzysztof Kozlowski wrote:
> > Hi,
> >
> > I am trying to figure out necessary locking on the driver side of
> > crypto HW accelerator for symmetric hash (actually: CRC). I
> > implemented quite simple driver for shash_alg.
> >
> > I looked at the docs, I looked at the crypto kcapi core code... and
> > there is nothing about necessary locking. kcapi does not perform it.
> >
> > My HW is quite similar to drivers/crypto/stm32/stm32_crc32.c so it has
> > only one HW set of registers for dealing with CRC. Or in other words,
> > only one queue of one element. :) I implemented all shash_alg
> > callbacks - init(), update(), final()... and also finup() (manually
> > calling update+final) and digest() (init+update+final).
> >
> > Now imagine multiple user-space users of this crypto alg where all of
> > them call kcapi_md_digest() (so essentially init() -> update() ->
> > final()). It seems that kcapi does not perform any locking here so at
> > some point updates from different processes might be mixed with
> > finals:
> >
> > Process A: Process B:
> > init()
> > init()
> > update()
> > update()
> > final()
> > final()
> >
> > My findings show that the requests are indeed being mixed with each other...
>
> After each operation all state must be stored in the ahash_request
> object. The next operation should then load the state from the
> request object into the hardware.

Thanks, that's the solution I was missing.

Best regards,
Krzysztof